Similar Literature
20 similar records retrieved (search time: 62 ms)
1.
A heuristic algorithm for finding gene transmission patterns on large and complex pedigrees with partially observed genotype data is proposed. The method can be used to generate an initial point for a Markov chain Monte Carlo simulation or to check that the given pedigree and the genotype data are consistent. In small pedigrees the algorithm is exact, exhaustively enumerating all possibilities, but in large pedigrees with a considerable amount of unknown data only a subset of promising configurations can actually be checked. For that purpose, the configurations are ordered by combining the approximate conditional probability distribution of the unknown genotypes with information on the relationships between individuals. We also introduce a way to divide the task into subparts, which has proved useful in large pedigrees. The algorithm has been implemented in a program called APE (Allelic Path Explorer) and tested in three different settings with good results.
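A minimal sketch (not the APE implementation) of the consistency check described above: for a small pedigree, exhaustively enumerate genotype configurations for untyped individuals and test whether any configuration is Mendelian-consistent with the observed genotypes. The pedigree encoding, function names and two-allele setting are illustrative assumptions.

```python
from itertools import combinations_with_replacement, product

def mendelian_ok(child, mother, father):
    # The child must receive one allele from each parent.
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)

def consistent_configuration(pedigree, observed, alleles):
    """Exhaustively search for a genotype configuration that matches the
    observed genotypes and respects Mendelian transmission (feasible only
    for small pedigrees, as the abstract notes).

    pedigree: dict id -> (mother_id, father_id), or None for founders
    observed: dict id -> genotype tuple, e.g. ('A', 'a'), for typed animals
    alleles:  allele symbols segregating at the locus
    """
    ids = list(pedigree)
    genotypes = list(combinations_with_replacement(sorted(alleles), 2))
    choices = [[observed[i]] if i in observed else genotypes for i in ids]
    for config in product(*choices):
        assign = dict(zip(ids, config))
        if all(par is None or mendelian_ok(assign[i], assign[par[0]], assign[par[1]])
               for i, par in pedigree.items()):
            return assign          # one consistent configuration found
    return None                    # pedigree and genotype data are inconsistent

# Toy check: an 'aa' calf with an 'AA' sire has no consistent configuration.
ped = {"dam": None, "sire": None, "calf": ("dam", "sire")}
obs = {"sire": ("A", "A"), "calf": ("a", "a")}
print(consistent_configuration(ped, obs, {"A", "a"}))   # -> None
```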

2.
Alun Thomas. Zoo Biology, 1990, 9(4): 259-274
A comparison is made between a much used simulation method, commonly called gene dropping, and the exact computational technique of peeling. These methods are illustrated using the problem of finding the distribution of the number of distinct ancestral genes surviving at an autosomal locus. Each method is used on several real zoo pedigrees, of varying size and complexity, and the results are compared. Gene dropping is found to be a good approximation to peeling, but for all but the most complex pedigrees surveyed, peeling is preferable. The relationship between heterozygosity and allelic variability is investigated.
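For readers unfamiliar with gene dropping, the following is a hedged sketch of the simulation idea compared here (not Thomas's code): founder alleles are given unique labels and transmitted down the pedigree by simulated Mendelian segregation, and the number of distinct founder alleles surviving in the living animals is tallied over replicates. The pedigree encoding and the toy example are assumptions for illustration.

```python
import random
from collections import Counter

def depth(pedigree, ind):
    # Generation depth: founders are 0, offspring are one more than their deepest parent.
    parents = pedigree[ind]
    return 0 if parents is None else 1 + max(depth(pedigree, p) for p in parents)

def gene_drop(pedigree, living, n_rep=10000, seed=1):
    """Estimate the distribution of the number of distinct founder alleles
    surviving at an autosomal locus in the 'living' individuals.

    pedigree: dict id -> (dam_id, sire_id); founders map to None
    living:   ids of currently living individuals
    """
    rng = random.Random(seed)
    order = sorted(pedigree, key=lambda i: depth(pedigree, i))  # parents before offspring
    counts = Counter()
    for _ in range(n_rep):
        alleles = {}
        for ind in order:
            parents = pedigree[ind]
            if parents is None:                     # founder: two uniquely labelled alleles
                alleles[ind] = (f"{ind}.1", f"{ind}.2")
            else:
                dam, sire = parents
                alleles[ind] = (rng.choice(alleles[dam]),
                                rng.choice(alleles[sire]))
        surviving = {a for ind in living for a in alleles[ind]}
        counts[len(surviving)] += 1
    return {k: v / n_rep for k, v in sorted(counts.items())}

# Toy pedigree: two founders, two offspring, one grand-offspring.
ped = {"F1": None, "F2": None, "O1": ("F1", "F2"),
       "O2": ("F1", "F2"), "G1": ("O1", "O2")}
print(gene_drop(ped, living=["G1", "O2"]))
```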

3.
The objective of this study was to estimate the effect of the myostatin (mh) gene on milk, protein and fat yield in a large heterogeneous cow population, of which only a small portion was genotyped. For this purpose, a total of 13 992 889 test-day records from 799 778 cows were available. The mh gene effect was estimated via BLUP using a multi-lactation, multi-trait random regression test-day model with an additional fixed regression on mh gene content. As only 1416 animals (of which 1183 cows had test-day records) were genotyped, more animals of additional breeds with assumed known genotype were added to estimate the genotype (gene content) of the remaining cows more reliably, using the conventional pedigree relationships between genotyped animals and their non-genotyped relatives. With this approach, the mean estimated gene content over all cows with test-day records was 0.104, showing that most cows were homozygous +/+. In contrast, when gene content estimation was based only on genotyped animals, the mean estimated gene content over all cows with test-day records was overestimated at 1.349. The applied method for gene content estimation in large populations therefore requires additional genotype assumptions for animals representing the genetic diversity when the breed composition of the complete population is heterogeneous and only a few animals, predominantly from one breed, are genotyped. For one copy of the 'mh' gene variant, significant allele substitution effects of -76.1 kg milk, -3.6 kg fat and -2.8 kg protein per lactation were obtained on average when gene content estimation was additionally based on animals with assumed known genotype. Based on this result, knowledge of the mh genotypes and their effects has the potential to improve milk performance traits in cattle.
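As a rough illustration of how pedigree relationships let gene content be inferred for ungenotyped relatives, the sketch below simply averages parental expected gene contents and uses 2p for unknown founders; it is not the BLUP random-regression model used in the study, and the identifiers and allele frequency are assumed for the example.

```python
def expected_gene_content(pedigree, genotyped, allele_freq, n_iter=50):
    """Rough sketch: propagate expected gene content (0-2 copies of the
    'mh' allele) from genotyped animals to ungenotyped relatives by
    repeatedly averaging parental expectations.  Ungenotyped founders
    start at the population expectation 2*p.  Information flowing from
    offspring back to parents is ignored, unlike in the study's model.

    pedigree:    dict id -> (dam, sire), or None for founders
    genotyped:   dict id -> observed copies of the mh allele (0, 1 or 2)
    allele_freq: assumed population frequency p of the mh allele
    """
    content = {i: genotyped.get(i, 2 * allele_freq) for i in pedigree}
    for _ in range(n_iter):                 # iterate until values settle
        for ind, parents in pedigree.items():
            if ind in genotyped or parents is None:
                continue
            dam, sire = parents
            content[ind] = 0.5 * (content[dam] + content[sire])
    return content

# Toy example: only the sire is genotyped (2 copies); daughters are inferred.
ped = {"sire": None, "dam": None, "cow1": ("dam", "sire"),
       "cow2": ("dam", "sire")}
print(expected_gene_content(ped, genotyped={"sire": 2}, allele_freq=0.05))
```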

4.
Constraint-based structure learning algorithms generally perform well on sparse graphs. Although sparsity is not uncommon, there are some domains where the underlying graph can have dense regions; one of these domains is gene regulatory networks, which is the main motivation for the study described in this paper. We propose a new constraint-based algorithm that can both increase the quality of the output and decrease the computational requirements for learning the structure of gene regulatory networks. The algorithm is based on and extends the PC algorithm. Two different types of information are derived from the prior knowledge: one is the probability of existence of edges, and the other is the set of nodes that appear to depend on a large number of other nodes in the graph. A new method based on Gene Ontology for validating gene regulatory networks is also proposed. We demonstrate the applicability and effectiveness of the proposed algorithms on both synthetic and real data sets.

5.
Genomics, 2020, 112(1): 114-126
Gene expression data are expected to make a great contribution to efficient cancer diagnosis and prognosis. Such data comprise a large number of measured genes, of which only a few carry valuable information for discriminating between classes of samples. Recently, several researchers have proposed gene selection methods based on metaheuristic algorithms for analysing and interpreting gene expression data. However, owing to the large number of selected genes relative to the limited number of patient samples, and the complex interactions between genes, many gene selection methods struggle to identify the most relevant and reliable genes. Hence, in this paper a hybrid filter/wrapper method, called rMRMR-MBA, is proposed for the gene selection problem. In this method, robust Minimum Redundancy Maximum Relevancy (rMRMR) serves as a filter to select the most promising genes, and a modified bat algorithm (MBA) serves as the search engine of the wrapper stage to identify a small set of informative genes. The performance of the proposed method has been evaluated on ten gene expression datasets. The convergence behaviour of MBA was studied with and without TRIZ optimisation operators, and the results of rMRMR-MBA were compared against ten state-of-the-art methods on the same datasets. The comparative study demonstrates that the proposed method produces better results in terms of classification accuracy and number of selected genes on two of the ten datasets and competitive results on the remaining datasets. In a nutshell, the proposed method produces very promising results with high classification accuracy, which can be considered a useful contribution to the gene selection domain.
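A hedged sketch of the filter stage only: a generic mRMR-style greedy selection (not the paper's rMRMR variant, and with mean absolute correlation standing in for redundancy), using scikit-learn's mutual information estimator. The dataset and parameter choices are illustrative.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def mrmr_filter(X, y, n_genes=10):
    """Greedy mRMR-style filter: pick genes maximizing mutual information
    with the class label (relevance) minus the mean absolute correlation
    with genes already selected (a cheap stand-in for redundancy)."""
    relevance = mutual_info_classif(X, y, random_state=0)
    selected = [int(np.argmax(relevance))]          # most relevant gene first
    while len(selected) < n_genes:
        best_gene, best_score = None, -np.inf
        for g in range(X.shape[1]):
            if g in selected:
                continue
            redundancy = np.mean([abs(np.corrcoef(X[:, g], X[:, s])[0, 1])
                                  for s in selected])
            score = relevance[g] - redundancy
            if score > best_score:
                best_gene, best_score = g, score
        selected.append(best_gene)
    return selected

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 25)
X = rng.normal(size=(50, 100))
X[:, :5] += y[:, None] * 1.5          # five informative (and correlated) genes
print(mrmr_filter(X, y, n_genes=5))
```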

6.
Candidate gene identification is typically labour intensive, involving laboratory experiments to corroborate or disprove the hypothesis that a nominated candidate gene is the causative gene. The traditional approach to reducing the number of candidate genes entails fine-mapping studies using markers and pedigrees. Gene prioritization establishes a ranking of candidate genes based on their relevance to the biological process of interest, from which the most promising genes can be selected for further analysis. To date, many computational methods have focused on predicting candidate genes by analysing their inherent sequence characteristics, their similarity to known disease genes and their functional annotation. In the last decade, several computational tools for prioritizing candidate genes have been proposed; many are web-based tools, while others are standalone applications that are installed and run locally. This review takes a close look at gene prioritization criteria as well as candidate gene prioritization algorithms, and thus provides a comprehensive synopsis of the subject matter.

7.
Fang Z, Du R, Cui X. PLoS ONE, 2012, 7(2): e31505
Gene set analysis is widely used to facilitate biological interpretation in analyses of differential expression from high-throughput profiling data. The Wilcoxon rank-sum (WRS) test is one of the commonly used methods in gene set enrichment analysis. It compares the ranks of genes in a gene set against those of genes outside the gene set. This method is easy to implement and it eliminates the dichotomization of genes into significant and non-significant in a competitive hypothesis test. Because of the large number of genes being examined, it is impractical to calculate the exact null distribution for the WRS test, so the normal distribution is commonly used as an approximation. However, as we demonstrate in this paper, the normal approximation is problematic when a gene set with a relatively small number of genes is tested against the large number of genes in the complementary set. In this situation, a uniform approximation is substantially more powerful, more accurate and less computationally intensive. We demonstrate the advantage of the uniform approximation in Gene Ontology (GO) term analysis using simulations and real data sets.
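The following sketch shows the competitive WRS gene-set test with the standard normal approximation criticized above; the paper's uniform approximation is not reproduced here. Variable names and the simulated data are illustrative.

```python
import numpy as np
from scipy import stats

def wrs_gene_set_test(scores, in_set):
    """Wilcoxon rank-sum test of a gene set against its complement using the
    standard normal approximation (the approximation this paper shows can be
    poor for small sets).  'scores' are per-gene statistics, e.g. moderated
    t-statistics; 'in_set' is a boolean mask marking the gene set."""
    scores = np.asarray(scores, dtype=float)
    in_set = np.asarray(in_set, dtype=bool)
    n1, n2 = in_set.sum(), (~in_set).sum()
    ranks = stats.rankdata(scores)            # average ranks for ties
    w = ranks[in_set].sum()                   # rank sum of the gene set
    mu = n1 * (n1 + n2 + 1) / 2.0             # null mean of the rank sum
    sigma = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)   # null SD (ties ignored)
    z = (w - mu) / sigma
    p = 2 * stats.norm.sf(abs(z))             # two-sided normal p-value
    return z, p

rng = np.random.default_rng(0)
gene_scores = rng.normal(size=10000)
set_mask = np.zeros(10000, dtype=bool)
set_mask[:15] = True                          # a small set of 15 genes
gene_scores[:15] += 1.0                       # shift the set upward
print(wrs_gene_set_test(gene_scores, set_mask))
```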

8.
This paper introduces a likelihood method of estimating ethnic admixture that uses individuals, pedigrees, or a combination of individuals and pedigrees. For each founder of a pedigree, admixture proportions are calculated by conditioning on the pedigree-wide genotypes at all ancestry-informative markers. These estimates are then propagated down the pedigree to the nonfounders by a simple averaging process. The large-sample standard errors of the founders' proportions can be similarly transformed into standard errors for the admixture proportions of the descendants. These standard errors are smaller than the corresponding standard errors when each individual is treated independently. Both hard and soft information on a founder's ancestry can be accommodated in this scheme, which has been implemented in the genetic software package Mendel. The utility of the method is demonstrated on simulated data and a real data example involving Mexican families of mixed Amerindian and Spanish ancestry.
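A minimal sketch of the propagation step described above, assuming the founders' admixture proportions have already been estimated from ancestry-informative markers: each non-founder's proportions are the average of its parents'. The pedigree encoding and numbers are illustrative; the likelihood estimation and the standard-error calculations are not reproduced.

```python
import numpy as np

def propagate_admixture(pedigree, founder_props):
    """Propagate founder admixture proportions down a pedigree by simple
    averaging of the two parents.

    pedigree:      dict id -> (mother, father); founders map to None
    founder_props: dict founder_id -> array of ancestry proportions
    """
    props = {i: np.asarray(p, dtype=float) for i, p in founder_props.items()}
    remaining = [i for i, par in pedigree.items() if par is not None]
    while remaining:
        still = []
        for ind in remaining:
            mom, dad = pedigree[ind]
            if mom in props and dad in props:
                props[ind] = 0.5 * (props[mom] + props[dad])
            else:
                still.append(ind)      # wait until both parents are resolved
        remaining = still
    return props

# Toy pedigree with two ancestry components per individual.
ped = {"gm": None, "gf": None, "f": None,
       "m": ("gm", "gf"), "child": ("m", "f")}
founders = {"gm": [1.0, 0.0], "gf": [0.4, 0.6], "f": [0.0, 1.0]}
print(propagate_admixture(ped, founders)["child"])   # -> [0.35 0.65]
```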

9.
It has been increasingly recognized that incorporating prior knowledge into cluster analysis can result in more reliable and meaningful clusters. In contrast to standard model-based clustering with a global mixture model, which does not use any prior information, a stratified mixture model was recently proposed to incorporate gene functions or biological pathways as priors in model-based clustering of gene expression profiles: the various gene functional groups form the strata of the stratified mixture model. Albeit useful, the stratified method may be less efficient than the global analysis if the strata are uninformative for clustering. We propose a weighted method that aims to strike a balance between a stratified analysis and a global analysis: it weights between the clustering results of the stratified analysis and those of the global analysis, with the weight determined by the data. More generally, the weighted method can take advantage of the hierarchical structure of most existing gene functional annotation systems, such as MIPS and Gene Ontology (GO), and facilitates choosing appropriate gene functional groups as priors. We use simulated and real data to demonstrate the feasibility and advantages of the proposed method.

10.
Gao G, Hoeschele I. Genetics, 2005, 171(1): 365-376
Identity-by-descent (IBD) matrix calculation is an important step in quantitative trait loci (QTL) analysis using variance component models. To calculate IBD matrices efficiently for large pedigrees with large numbers of loci, an approximation method based on the reconstruction of haplotype configurations for the pedigrees is proposed. The method uses a subset of haplotype configurations with high likelihoods identified by a haplotyping method. The new method is compared with a Markov chain Monte Carlo (MCMC) method (Loki) in terms of QTL mapping performance on simulated pedigrees. Both methods yield almost identical results for the estimation of QTL positions and variance parameters, while the new method is much more computationally efficient than the MCMC approach for large pedigrees and large numbers of loci. The proposed method is also compared with an exact method (Merlin) in small simulated pedigrees, where both methods produce nearly identical estimates of position-specific kinship coefficients. The new method can be used for fine mapping with joint linkage disequilibrium and linkage analysis, which improves the power and accuracy of QTL mapping.

11.
MOTIVATION: Gene set analysis allows formal testing of subtle but coordinated changes in a group of genes, such as those defined by the Gene Ontology (GO) or KEGG Pathway databases. We propose a new method for gene set analysis that is based on principal component analysis (PCA) of the expression values of the genes in the gene set. PCA is an effective method for reducing high dimensionality and capturing variation in gene expression values. However, one limitation of PCA is that the latent variable identified by the first PC may be unrelated to the outcome. RESULTS: In the proposed supervised PCA (SPCA) model for gene set analysis, the PCs are estimated from a selected subset of genes that are associated with the outcome. Because outcome information is used in the gene selection step, the method is supervised, hence the name Supervised PCA model. Owing to the gene selection step, the test statistic in the SPCA model can no longer be well approximated by a t-distribution. We propose a two-component mixture distribution based on Gumbel extreme value distributions to account for the gene selection step. We show that the proposed method compares favourably with currently available gene set analysis methods on simulated and real microarray data. SOFTWARE: The R code for the analyses in this article is available upon request; an R package implementing the proposed method is in preparation.
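A hedged sketch of the supervised-PCA idea (not the authors' implementation, and without the Gumbel mixture null): genes in the set are first screened for association with the outcome, and the first principal component of the selected genes serves as the gene-set score. The correlation-based screening rule and the top_k threshold are assumptions for illustration.

```python
import numpy as np

def supervised_pc1(expr, outcome, top_k=20):
    """Supervised-PCA style gene-set score: keep the genes most correlated
    with the outcome, then return the per-sample scores on the first
    principal component of those genes.  The selection step is what makes
    the usual t-approximation of the downstream statistic invalid.

    expr:    samples x genes expression matrix for one gene set
    outcome: per-sample outcome (e.g. 0/1 class labels)
    """
    expr = np.asarray(expr, dtype=float)
    y = np.asarray(outcome, dtype=float)
    yc = y - y.mean()
    xc = expr - expr.mean(axis=0)
    # Correlation of each gene with the outcome.
    corr = (xc * yc[:, None]).sum(axis=0) / (
        np.linalg.norm(xc, axis=0) * np.linalg.norm(yc) + 1e-12)
    keep = np.argsort(-np.abs(corr))[:top_k]
    # First principal component of the selected genes (via SVD).
    u, s, vt = np.linalg.svd(xc[:, keep], full_matrices=False)
    return u[:, 0] * s[0]

rng = np.random.default_rng(1)
y = np.repeat([0, 1], 25)
X = rng.normal(size=(50, 200))
X[:, :10] += y[:, None] * 0.8                  # 10 informative genes
print(supervised_pc1(X, y, top_k=20).shape)    # (50,)
```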

12.
13.
Summary: A Bayesian method to estimate genotype probabilities at a single locus using information on an individual and all its relatives and their mates has been developed. The method uses data over several generations, can deal with large numbers of individuals in large livestock families and allows for missing information. It can be extended to multiple alleles and can be used for autosomal or sex-linked loci. The allele frequencies and the form of expression (dominance, penetrance) must be specified. An algorithm implementing the method through an iterative procedure has been developed to calculate the genotype probabilities for practical use in livestock breeding. The method and algorithm were used to determine the accuracy of estimating genotype probabilities of sires for a female sex-limited trait, such as genetic variants of milk proteins. Data were simulated and genotype probabilities estimated for 100 sires (20 replicates) with 3, 6 and 12 female offspring per sire, for different population frequencies, for additive and dominance gene action and for variable genotypic expression. Such simulation is useful in the design of testing systems for the use of information on specific genetic loci in selection. (Prepared during a leave at the Centre for Genetic Improvement of Livestock, Guelph, Canada.)
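A single-locus, single-generation sketch of the Bayesian idea (not the iterative multi-generation algorithm of the paper): the posterior probability of a sire's genotype given his daughters' observed codominant genotypes, with a Hardy-Weinberg prior and dams assumed to be drawn at random from the population. Allele labels and frequencies are illustrative.

```python
from itertools import product

def sire_genotype_posterior(daughters, p):
    """Posterior probabilities of a sire's genotype at a biallelic locus
    given his daughters' unordered genotypes.  p is the assumed frequency
    of allele 'A'; the other allele is 'B'.  Dams are assumed unrelated
    and sampled from a Hardy-Weinberg population."""
    freq = {"A": p, "B": 1.0 - p}
    prior = {("A", "A"): p * p, ("A", "B"): 2 * p * (1 - p),
             ("B", "B"): (1 - p) ** 2}

    def lik_one(daughter, sire):
        # Daughter receives one allele from the sire (prob 0.5 for each of
        # his two alleles) and one from a random dam (population frequencies).
        total = 0.0
        for s_allele, d_allele in product(sire, freq):
            if sorted((s_allele, d_allele)) == sorted(daughter):
                total += 0.5 * freq[d_allele]
        return total

    post = {}
    for sire in prior:
        like = 1.0
        for d in daughters:
            like *= lik_one(d, sire)
        post[sire] = prior[sire] * like
    z = sum(post.values())
    return {g: v / z for g, v in post.items()}

# Three daughters, all carrying at least one 'B' allele.
print(sire_genotype_posterior([("A", "B"), ("B", "B"), ("A", "B")], p=0.3))
```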

14.
Toward a theory of marker-assisted gene pyramiding (cited 9 times in total: 0 self-citations, 9 citations by others)
Servin B, Martin OC, Mézard M, Hospital F. Genetics, 2004, 168(1): 513-523
We investigate the best way to combine into a single genotype a series of target genes identified in different parents (gene pyramiding). Assuming that individuals can be selected and mated according to their genotype, the best method corresponds to an optimal succession of crosses over several generations (pedigree). For each pedigree, we compute the probability of success from the known recombination fractions between the target loci, as well as the number of individuals (population sizes) that should be genotyped over successive generations until the desired genotype is obtained. We provide an algorithm that generates and compares pedigrees on the basis of the population sizes they require and on their total duration (in number of generations) and finds the best gene-pyramiding scheme. Examples are given for eight target genes and are compared to a reference genotype selection method with random mating. The best gene-pyramiding method combines the eight targets in three generations less than the reference method while requiring fewer genotypings.
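A back-of-the-envelope sketch of the quantities such an algorithm works with, under simple assumptions (all target alleles in coupling on one chromosome, no interference): the probability of an ideal gamete and the number of offspring to genotype so that at least one carries it with a chosen confidence. This is textbook-style reasoning, not the authors' pedigree-comparison algorithm.

```python
import math

def ideal_gamete_prob(rec_fractions):
    """Probability that a gamete from a parent carrying all target alleles
    in coupling transmits the full series: pick the right allele at the
    first locus (1/2), then avoid recombination in each adjacent interval
    (no interference assumed)."""
    p = 0.5
    for r in rec_fractions:
        p *= (1.0 - r)
    return p

def population_size(p_success, confidence=0.99):
    """Smallest number of offspring to genotype so that at least one
    carries the desired gamete with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_success))

# Four linked targets with recombination fraction 0.10 between adjacent loci.
p = ideal_gamete_prob([0.10, 0.10, 0.10])
print(p, population_size(p, 0.99))   # ~0.36 and the required sample size
```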

15.
A method was derived to estimate effects of quantitative trait loci (QTL) using incomplete genotype information in large outbreeding populations with complex pedigrees. The method accounts for background genes by estimating polygenic effects. The basic equations used are very similar to the usual linear mixed model equations for polygenic models, and segregation analysis was used to estimate the probabilities of the QTL genotypes for each animal. Method R was used to estimate the polygenic heritability simultaneously with the QTL effects, and initial allele frequencies were also estimated. The method was tested on a simulated data set of 10,000 animals evenly distributed over 10 generations, in which 0, 400 or 10,000 animals were genotyped for a candidate gene. In the absence of selection, the bias of the QTL estimates was <2%. Selection biased the estimate of the Aa genotype slightly when no animals were genotyped. Estimates of the polygenic heritability were 0.251 and 0.257 in the absence and presence of selection, respectively, while the simulated value was 0.25. Although not tested in this study, marker information could be accommodated by adjusting the transmission probabilities of genotypes from parent to offspring according to the marker information. This makes QTL mapping studies in large multi-generation pedigrees feasible.

16.
Linkage mapping has been extensively applied in the murine and human genomes. It remains a powerful approach to mapping genes and identifying genetic variants. As genome efforts identify large numbers of single-nucleotide polymorphisms, it will be critical to validate these polymorphisms and confirm their gene assignment and chromosomal location. The presence of pseudogenes can confuse such efforts. We have used denaturing HPLC to identify polymorphisms in human genes and to genotype individuals in selected CEPH pedigrees. The same approach has been applied to the mapping of murine genes in interspecies backcross animals. This strategy is rapid, accurate and superior in several respects to other technologies.

17.
Baruch E, Weller JI, Cohen-Zinder M, Ron M, Seroussi E. Genetics, 2006, 172(3): 1757-1765
We present a simple algorithm for the reconstruction of haplotypes from a sample of multilocus genotypes. The algorithm is aimed specifically at the analysis of very large pedigrees for small chromosomal segments, where the recombination frequency within the segment can be assumed to be zero. The algorithm was tested both on simulated pedigrees of 155 individuals in a three-generation family structure and on real data of 1149 animals from the Israeli Holstein dairy cattle population, including 406 genotyped bulls but no genotyped females. The rate of haplotype resolution for the simulated data was >91% with a standard deviation of 2%. With 20% missing data, the rate of haplotype resolution was 67.5% with a standard deviation of 1.3%. In both cases all recovered haplotypes were correct. In the real data, allele origin was resolved for 22% of the heterozygous genotypes, even though 70% of the genotypes were missing, and haplotypes were resolved for 36% of the males. Computing time was negligible for both data sets. Despite the intricacy of large-scale real pedigree genotypes, the proposed algorithm provides a practical rule-based solution for resolving haplotypes for small chromosomal segments in commercial animal populations.
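One of the simple rules such an algorithm can exploit, sketched below under the zero-recombination assumption: wherever the sire is homozygous, the offspring's paternal allele is known, so the other allele must be maternal. This is a single illustrative rule, not the full published algorithm, and the genotype encodings are assumptions.

```python
def resolve_origins(offspring, sire):
    """One illustrative phasing rule from sire-offspring pairs: at any locus
    where the sire is homozygous, the offspring's paternal allele is known,
    so the remaining allele must be maternal.  Genotypes are lists of
    unordered allele pairs over the loci of a short segment (recombination
    within the segment assumed absent).  Returns per-locus
    (paternal, maternal) pairs, or None where the origin stays ambiguous."""
    phased = []
    for (o1, o2), (s1, s2) in zip(offspring, sire):
        if s1 == s2 and s1 in (o1, o2):
            pat = s1
            mat = o2 if o1 == pat else o1
            phased.append((pat, mat))
        elif o1 == o2:                      # homozygous offspring: trivially phased
            phased.append((o1, o2))
        else:
            phased.append(None)             # cannot resolve from this pair alone
    return phased

offspring_gt = [("1", "2"), ("3", "3"), ("2", "4")]
sire_gt      = [("1", "1"), ("3", "4"), ("2", "4")]
print(resolve_origins(offspring_gt, sire_gt))
# -> [('1', '2'), ('3', '3'), None]
```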

18.
An increased availability of genotypes at marker loci has prompted the development of models that include the effects of individual genes. Selection based on these models is known as marker-assisted selection (MAS). MAS is known to be efficient especially for traits that have low heritability and non-additive gene action. BLUP methodology under non-additive gene action is not feasible for large inbred or crossbred pedigrees, but it is easy to incorporate non-additive gene action in a finite locus model. Under such a model, the unobservable genotypic values can be predicted using the conditional mean of the genotypic values given the data. To compute this conditional mean, conditional genotype probabilities must be computed. In this study these probabilities were computed using iterative peeling and three Markov chain Monte Carlo (MCMC) methods: scalar Gibbs, blocking Gibbs, and a sampler that combines the Elston-Stewart algorithm with iterative peeling (ESIP). The performance of these four methods was assessed using simulated data. For pedigrees with loops, iterative peeling fails to provide accurate genotype probability estimates for some pedigree members, and its computing time is exponentially related to the number of loci in the model. For the MCMC methods, a linear relationship can be maintained by sampling genotypes one locus at a time. Of the three MCMC methods considered, ESIP performed best, while scalar Gibbs performed worst.

19.
Enrichment analysis of gene sets is a popular approach that provides a functional interpretation of genome-wide expression data. Existing tests are affected by inter-gene correlations, resulting in a high Type I error. The most widely used test, Gene Set Enrichment Analysis, relies on computationally intensive permutations of sample labels to generate a null distribution that preserves gene–gene correlations. A more recent approach, CAMERA, attempts to correct for these correlations by estimating a variance inflation factor directly from the data. Although these methods generate P-values for detecting gene set activity, they are unable to produce confidence intervals or allow for post hoc comparisons. We have developed a new computational framework for Quantitative Set Analysis of Gene Expression (QuSAGE). QuSAGE accounts for inter-gene correlations, improves the estimation of the variance inflation factor and, rather than evaluating the deviation from a null hypothesis with a P-value, it quantifies gene-set activity with a complete probability density function. From this probability density function, P-values and confidence intervals can be extracted and post hoc analysis can be carried out while maintaining statistical traceability. Compared with Gene Set Enrichment Analysis and CAMERA, QuSAGE exhibits better sensitivity and specificity on real data profiling the response to interferon therapy (in chronic Hepatitis C virus patients) and Influenza A virus infection. QuSAGE is available as an R package, which includes the core functions for the method as well as functions to plot and visualize the results.

20.
Microarray data contain a large number of genes (usually more than 1000) and a relatively small number of samples (usually fewer than 100). This poses problems for discriminant analysis of microarray data. One way to alleviate the problem is to reduce the dimensionality of the data by selecting genes that are important for the discrimination problem. Gene selection can be cast as a feature selection problem in the context of pattern classification. Feature selection approaches are broadly grouped into filter methods and wrapper methods. The wrapper method outperforms the filter method but at the cost of more intensive computation. In the present study, we propose a wrapper-like gene selection algorithm based on the Regularization Network. Compared with the classical wrapper method, the computational cost of our gene selection algorithm is significantly reduced, because the evaluation criterion we propose does not require repeated training in the leave-one-out procedure.
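To make the cost contrast concrete, here is a hedged sketch of a classical wrapper: each candidate gene subset is scored by leave-one-out accuracy of a classifier, which requires one retraining per left-out sample. The regularization-network criterion proposed in the paper, which avoids this repeated training, is not reproduced; the classifier choice, dataset and parameters are illustrative (scikit-learn assumed available).

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.linear_model import LogisticRegression

def loo_score(X, y, gene_idx):
    """Leave-one-out accuracy of a simple classifier restricted to the chosen
    genes -- the costly evaluation step of a classical wrapper."""
    clf = LogisticRegression(max_iter=1000)
    scores = cross_val_score(clf, X[:, gene_idx], y, cv=LeaveOneOut())
    return scores.mean()

def greedy_wrapper(X, y, n_genes=3):
    """Toy forward-selection wrapper: repeatedly add the gene that most
    improves leave-one-out accuracy (illustrative only; real microarray
    wrappers need far more care with overfitting)."""
    selected = []
    for _ in range(n_genes):
        candidates = [g for g in range(X.shape[1]) if g not in selected]
        best = max(candidates, key=lambda g: loo_score(X, y, selected + [g]))
        selected.append(best)
    return selected

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 20))
X[:, :3] += y[:, None]                 # three informative genes out of 20
print(greedy_wrapper(X, y, n_genes=3))
```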

