Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
A method for estimating major gene effects, which uses Gibbs sampling to infer the genotypes of individuals whose genotypes are unknown, was compared with a standard mixed-model analysis. The purpose of this study was to evaluate the effect of including information from individuals with unknown genotypes on the estimates of the single-gene effects and on their error variances (Ve). When genotypes were known for all individuals, results using the Gibbs method (GS) were similar to those obtained with the mixed model (MM). In the absence of selection, when information from individuals with unknown genotypes was included, GS yielded unbiased estimates of the major gene effects while reducing the Ve associated with them. This reduction in Ve depended on the gene frequency and mode of action of the major locus. For the additive effect, the reduction in Ve ranged from 29 to 69% of the total reduction that would have been obtained if all individuals had had a known genotype. Similarly, the reduction in Ve found for the dominance effect ranged from 12 to 58%. Estimates using GS generally had small detectable biases when the polygenic heritability used in the analysis was inflated or estimated simultaneously. However, the benefit of using information from individuals with unknown genotypes was still maintained when comparing the mean square error of the estimates from GS and MM when genotypes were known for only a subset of the population. When the population had been under selection, the use of Gibbs sampling to incorporate information from individuals without genotypes substantially reduced the bias and mean square error found for MM analysis on partial data. Nevertheless, some bias was detected using Gibbs sampling. The gene frequency of the major gene in the base population was also well estimated despite its change over generations due to selection.

2.
The Gibbs sampling method has been widely used for sequence analysis since it was successfully applied to the problem of identifying regulatory motif sequences upstream of genes. Numerous variants of the original idea have since emerged; however, in all cases the application has been to finding short motifs in collections of short sequences (typically less than 100 nucleotides long). In this paper, we introduce a Gibbs sampling approach for identifying genes in multiple large genomic sequences up to hundreds of kilobases long. This approach leverages the evolutionary relationships between the sequences to improve the gene predictions, without explicitly aligning the sequences. We have applied our method to the analysis of genomic sequence from 14 genomic regions, totaling roughly 1.8 Mb of sequence in each organism. We show that our approach compares favorably with existing ab initio approaches to gene finding, including pairwise comparison-based gene prediction methods that make explicit use of alignments. Furthermore, excellent performance can be obtained with as few as four organisms, and the method overcomes a number of difficulties of previous comparison-based gene finding approaches: it is robust with respect to genomic rearrangements, can work with draft sequence, and is fast (linear in the number and length of the sequences). It can also be seamlessly integrated with Gibbs sampling motif detection methods.

3.
Biclustering microarray data by Gibbs sampling   Total citations: 1, self-citations: 0, citations by others: 1
MOTIVATION: Gibbs sampling has become a method of choice for the discovery of noisy patterns, known as motifs, in DNA and protein sequences. Because handling noise in microarray data presents similar challenges, we have adapted this strategy to the biclustering of discretized microarray data. RESULTS: In contrast with standard clustering, which reveals genes that behave similarly over all the conditions, biclustering groups genes over only a subset of conditions for which those genes have a sharp probability distribution. We have opted for a simple probabilistic model of the biclusters because it has the key advantage of providing a transparent probabilistic interpretation of the biclusters in the form of an easily interpretable fingerprint. Furthermore, Gibbs sampling does not suffer from the problem of local minima that often characterizes Expectation-Maximization. We demonstrate the effectiveness of our approach on two synthetic data sets as well as a data set from leukemia patients.

4.
5.
6.
Livestock, particularly ruminants, can eat a wider range of biomass than humans. In the drive for greater efficiency, intensive systems of livestock production have evolved to compete with humans for high-energy crops such as cereals. Feeds consumed by livestock were analysed in terms of the quantities used and efficiency of conversion of grassland, human-edible ('edible') crops and crop by-products into milk, meat and eggs, using the United Kingdom as an example of a developed livestock industry. Some 42 million tonnes of forage dry matter were consumed from 2008 to 2009 by the UK ruminant livestock population of which 0.7 was grazed pasture and 0.3 million tonnes was conserved forage. In addition, almost 13 million tonnes of raw material concentrate feeds were used in the UK animal feed industry from 2008 to 2009 of which cereal grains comprised 5.3 and soyabean meal 1.9 million tonnes. The proportion of edible feed in typical UK concentrate formulations ranged from 0.36 for milk production to 0.75 for poultry meat production. Example systems of livestock production were used to calculate feed conversion ratios (FCR - feed input per unit of fresh product). FCR for concentrate feeds was lowest for milk at 0.27 and for the meat systems ranged from 2.3 for poultry meat to 8.8 for cereal beef. Differences in FCR between systems of meat production were smaller when efficiency was calculated on an edible input/output basis, where spring-calving/grass finishing upland suckler beef and lowland lamb production were more efficient than pig and poultry meat production. With the exception of milk and upland suckler beef, FCR for edible feed protein into edible animal protein were >1.0. Edible protein/animal protein FCR of 1.0 may be possible by replacing cereal grain and soyabean meal with cereal by-products in concentrate formulations. 
It is concluded that by accounting for the proportions of human-edible and inedible feeds used in typical livestock production systems, a more realistic estimate of efficiency can be made for comparisons between systems.
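
The feed conversion ratio defined above (feed input per unit of fresh product) can be sketched in a few lines of Python. The function names and the quantities in the usage lines are hypothetical, chosen only so that the result matches the 0.27 quoted for milk. Scaling the feed input by the edible proportion is an assumption about how an edible-feed FCR might be approximated, not the calculation used in the study.

```python
def feed_conversion_ratio(feed_input_kg: float, product_output_kg: float) -> float:
    """FCR = feed input per unit of fresh product (definition from the abstract)."""
    if product_output_kg <= 0:
        raise ValueError("product output must be positive")
    return feed_input_kg / product_output_kg


def edible_feed_conversion_ratio(
    feed_input_kg: float, edible_fraction: float, product_output_kg: float
) -> float:
    """Assumed approximation: count only the human-edible share of the feed."""
    if not 0.0 <= edible_fraction <= 1.0:
        raise ValueError("edible fraction must lie in [0, 1]")
    return feed_conversion_ratio(feed_input_kg * edible_fraction, product_output_kg)


# Hypothetical quantities: 270 kg of concentrate per 1000 kg of milk.
milk_fcr = feed_conversion_ratio(270.0, 1000.0)
# Edible proportion of 0.36 is the figure quoted for milk formulations.
milk_edible_fcr = edible_feed_conversion_ratio(270.0, 0.36, 1000.0)
```

Keeping the edible fraction as a separate argument makes it easy to compare systems on both a total-feed and an edible-feed basis.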

7.
8.
The use of Gibbs sampling in making decisions about the optimal selection environment was demonstrated. Marginal posterior distributions of the efficiency of selection across sites were obtained using the Gibbs sampler, a Bayesian method. From these distributions, the probability that the efficiency of selection lay between specified values and the variance of the distribution were computed, providing substantial information on which to base decisions regarding the location of genetic tests. The heritability, genetic correlations and efficiencies of selection estimated using REML and Gibbs sampling were similar. However, the latter approach showed that the point estimates of the efficiencies of selection were subject to substantial error. The decision regarding selection at maturity was consistent with that obtained using point estimates from REML, but Gibbs sampling allowed the efficiencies of selection to be interpreted with more confidence. The decision regarding early selection differed from that based on REML point estimates. Generally, the decisions to make early selections at site B for planting at both site B and A, and to make selections at maturity at each individual site, were robust to different priors in the Gibbs sampling. Received: 19 June 2000 / Accepted: 18 October 2000
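
The two posterior summaries mentioned above (the probability that a quantity lies between specified values, and the variance of its distribution) are straightforward to estimate once Gibbs draws are available. The following is a minimal sketch, assuming the draws are held in a plain list; the function names and the five sample values are invented for illustration and are not taken from the study.

```python
from statistics import variance


def interval_probability(samples, lower, upper):
    """Monte Carlo estimate of P(lower <= value <= upper) from posterior draws."""
    if not samples:
        raise ValueError("need at least one posterior draw")
    inside = sum(1 for s in samples if lower <= s <= upper)
    return inside / len(samples)


def posterior_variance(samples):
    """Sample variance of the posterior draws (n - 1 denominator)."""
    return variance(samples)


# Invented draws standing in for a marginal posterior of selection efficiency.
draws = [0.5, 0.8, 0.9, 1.1, 1.2]
p_between = interval_probability(draws, 0.8, 1.1)  # share of draws in [0.8, 1.1]
spread = posterior_variance(draws)
```

In practice the list would hold thousands of draws taken after burn-in, and the interval bounds would be chosen to match the decision being made.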

9.
Markov chain Monte Carlo (MCMC) methods have been widely used to overcome computational problems in linkage and segregation analyses. Many variants of this approach exist and are practiced; among the most popular is the Gibbs sampler. The Gibbs sampler is simple to implement but has (in its simplest form) mixing and reducibility problems; furthermore, in order to initiate a Gibbs sampling chain we need a starting genotypic or allelic configuration that is consistent with the marker data in the pedigree and has suitable weight in the joint distribution. We outline a procedure for finding such a configuration in pedigrees that have too many loci to allow for exact peeling. We also explain how this technique could be used to implement a blocking Gibbs sampler.
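
For readers unfamiliar with the mechanics, the sketch below shows a Gibbs sampler in its simplest form on a toy target: a standard bivariate normal with correlation `rho`. It is not the pedigree sampler discussed above. The target, the function name and the default arguments are all assumptions made for illustration. The `start` argument plays the role of the starting configuration, which here can be any pair of real numbers, whereas in a pedigree it must be consistent with the marker data.

```python
import random


def gibbs_bivariate_normal(rho, n_draws, start=(0.0, 0.0), burn_in=500, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Each sweep draws x from p(x | y) and then y from p(y | x); both
    conditionals are normal with mean rho * other and variance 1 - rho**2.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    rng = random.Random(seed)
    x, y = start
    cond_sd = (1.0 - rho * rho) ** 0.5
    draws = []
    for sweep in range(burn_in + n_draws):
        x = rng.gauss(rho * y, cond_sd)  # update x given the current y
        y = rng.gauss(rho * x, cond_sd)  # update y given the new x
        if sweep >= burn_in:  # discard early sweeps that depend on the start
            draws.append((x, y))
    return draws


draws = gibbs_bivariate_normal(rho=0.5, n_draws=20000, seed=1)
mean_x = sum(d[0] for d in draws) / len(draws)
mean_xy = sum(d[0] * d[1] for d in draws) / len(draws)  # estimates rho
```

As `rho` approaches 1 the conditional variance shrinks and successive draws become highly correlated, which is a small-scale version of the mixing problem mentioned in the abstract.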

10.
11.
12.
Analysis of gene expression in single cells   Total citations: 4, self-citations: 0, citations by others: 4
A cell's structural and functional characteristics are dependent on the specific complement of genes it expresses. The ability to study and compare gene usage at the cellular level will therefore provide valuable insights into cell physiology. Such analyses are complicated by problems associated with sample collection, sample size and the limited sensitivity of expression assays. Advances have been made in approaches to the collection of cellular material and the performance of single-cell gene expression analysis. Recent developments in global amplification of mRNA may soon permit expression analyses of single cells to be performed on DNA microarrays.

13.
14.
Segregation analyses with Gibbs sampling were applied to investigate the mode of inheritance and to estimate the genetic parameters of milk flow of Swiss dairy cattle. The data consisted of 204 397, 655 989 and 40 242 lactation records of milk flow in Brown Swiss, Simmental and Holstein cattle, respectively (4 to 22 years). Separate genetic analyses of first and multiple lactations were carried out for each breed. The results show that genetic parameters, especially polygenic variance and heritability, of milk flow in the first lactation were very similar under both mixed inheritance (polygenes + major gene) and polygenic models. Segregation analyses yielded very low major gene variances, which favour the polygenic determinism of milk flow. Heritabilities and repeatabilities of milk flow in both Brown Swiss and Simmental were high (0.44 to 0.48 and 0.54 to 0.59, respectively). The heritability of milk flow based on scores of milking ability in Holstein was intermediate (0.25). Variance components and heritabilities in the first lactation were slightly larger than the estimates for multiple lactations. The results suggest that milk flow (the quantity of milk per minute of milking) is a relevant measurement to characterise the cow's milking ability, which is a good candidate trait to be evaluated for possible inclusion in the selection objectives in dairy cattle.

15.
Estimating single gene effects on quantitative traits   Total citations: 1, self-citations: 0, citations by others: 1
Summary: Experimental designs for measuring the effects of single loci on quantitative traits are compared for statistical properties. The designs tested are single population, combined strains, multiple strains, diallel of strains, and co-isogenic strains. Testing was done by simulating population genotypic and phenotypic arrays. Statistical properties measured are type I error, power, bias and efficiency. The relative ranking of designs is consistent for all properties and over eight conditions examined. The co-isogenic design is superior, followed closely by the single population method. The other three designs are similar in ability, with the diallel design somewhat superior. Based on its good statistical performance and wide feasibility, the single population method is recommended. The diallel method provides the most information on genetic components of variation.

16.
17.
Kim S, Wang Z, Dalkilic M. Proteins, 2007, 66(3): 671-681
The motif prediction problem is to predict short, conserved subsequences that are part of a family of sequences, and it is a very important biological problem. Gibbs is one of the first successful motif algorithms; it runs very fast compared with other algorithms, and its search behavior is based on the well-studied Gibbs random sampling. However, motif prediction is a very difficult problem, and Gibbs may not predict true motifs in some cases. Thus, the authors explored the possibility of improving the prediction accuracy of Gibbs while retaining its fast runtime performance. In this paper, the authors considered Gibbs only for proteins, not for DNA binding sites. The authors have developed iGibbs, an integrated motif search framework for proteins that employs two previous techniques of their own: one for guiding motif search by clustering sequences and another by pattern refinement. These two techniques are combined into a new double-clustering approach to guiding motif search. The unique feature of their framework is that users do not have to specify the number of motifs to be predicted when motifs occur in different subsets of the input sequences, since it automatically groups input sequences into clusters and predicts motifs from the clusters. Tests on the PROSITE database show that their framework improved the prediction accuracy of Gibbs significantly. Compared with more exhaustive search methods like MEME, iGibbs predicted motifs more accurately and ran one order of magnitude faster.

18.
MOTIVATION: Over the last decade, a large variety of clustering algorithms have been developed to detect coregulatory relationships among genes from microarray gene expression data. Model-based clustering approaches have emerged as statistically well-grounded methods, but the properties of these algorithms when applied to large-scale data sets are not always well understood. An in-depth analysis can reveal important insights about the performance of the algorithm, the expected quality of the output clusters, and the possibilities for extracting more relevant information out of a particular data set. RESULTS: We have extended an existing algorithm for model-based clustering of genes to simultaneously cluster genes and conditions, and used three large compendia of gene expression data for Saccharomyces cerevisiae to analyze its properties. The algorithm uses a Bayesian approach and a Gibbs sampling procedure to iteratively update the cluster assignment of each gene and condition. For large-scale data sets, the posterior distribution is strongly peaked on a limited number of equiprobable clusterings. A GO annotation analysis shows that these local maxima are all biologically equally significant, and that simultaneously clustering genes and conditions performs better than only clustering genes and assuming independent conditions. A collection of distinct equivalent clusterings can be summarized as a weighted graph on the set of genes, from which we extract fuzzy, overlapping clusters using a graph spectral method. The cores of these fuzzy clusters contain tight sets of strongly coexpressed genes, while the overlaps exhibit relations between genes showing only partial coexpression. AVAILABILITY: GaneSh, a Java package for coclustering, is available under the terms of the GNU General Public License from our website at http://bioinformatics.psb.ugent.be/software

19.
Multi-trait (co)variance estimation is an important topic in plant and animal breeding. In this study we compare estimates obtained with restricted maximum likelihood (REML) and Bayesian Gibbs sampling of simulated data and of three traits (diameter, height and branch angle) from a 26-year-old partial diallel progeny test of Scots pine (Pinus sylvestris L.). Based on the results from the simulated data, we can conclude that the REML estimates are accurate but the mode of the posterior distributions from the Gibbs sampling can be overestimated depending on the level of the heritability. The mean and median of the posteriors were considerably higher than the expected values of the heritabilities. The confidence intervals calculated with the delta method were biased downward. The highest probability density (HPD) interval provides a better interval estimate, but could be slightly biased at the lower level. Similar differences between REML and Gibbs sampling estimates were found for the Scots pine data. We conclude that further simulation studies are needed in order to evaluate the effect of different priors on (co)variance components in the genetic individual model.
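
An HPD interval of the kind mentioned above is commonly approximated from posterior draws by taking the shortest window that contains the requested share of the sorted samples. The sketch below follows that approach; it assumes a unimodal posterior, and the function name and sample values are invented for illustration rather than taken from the study.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing at least `mass` of the posterior draws.

    Assumes a unimodal posterior, so the HPD region is a single interval.
    """
    if not samples:
        raise ValueError("need at least one posterior draw")
    if not 0.0 < mass <= 1.0:
        raise ValueError("mass must lie in (0, 1]")
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, math.ceil(mass * n))  # number of draws the interval must cover
    # Slide a window of k consecutive draws and keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[best], ordered[best + k - 1]


# Invented heritability draws, skewed to the right.
draws = [0.1, 0.2, 0.21, 0.22, 0.23, 0.24, 0.6, 0.7, 0.8, 0.9]
low, high = hpd_interval(draws, mass=0.5)
```

Unlike an equal-tailed interval, the shortest-window interval follows the bulk of a skewed posterior, which is relevant for heritabilities bounded between 0 and 1.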

20.
The detection and alignment of locally conserved regions (motifs) in multiple sequences can provide insight into protein structure, function, and evolution. A new Gibbs sampling algorithm is described that detects motif-encoding regions in sequences and optimally partitions them into distinct motif models; this is illustrated using a set of immunoglobulin fold proteins. When applied to sequences sharing a single motif, the sampler can be used to classify motif regions into related submodels, as is illustrated using helix-turn-helix DNA-binding proteins. Other statistically based procedures are described for searching a database for sequences matching motifs found by the sampler. When applied to a set of 32 very distantly related bacterial integral outer membrane proteins, the sampler revealed that they share a subtle, repetitive motif. Although BLAST (Altschul SF et al., 1990, J Mol Biol 215:403-410) fails to detect significant pairwise similarity between any of the sequences, the repeats present in these outer membrane proteins, taken as a whole, are highly significant (based on a generally applicable statistical test for motifs described here). Analysis of bacterial porins with known trimeric beta-barrel structure and related proteins reveals a similar repetitive motif corresponding to alternating membrane-spanning beta-strands. These beta-strands occur on the membrane interface (as opposed to the trimeric interface) of the beta-barrel. The broad conservation and structural location of these repeats suggests that they play important functional roles.
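
As background, the basic Gibbs site sampler for motifs can be sketched as below. This is a deliberately simplified version and not the algorithm described in the abstract: it assumes exactly one motif occurrence per sequence, a single motif model, a fixed width and a uniform background, and it does no partitioning into submodels. The function name, parameters and toy sequences are hypothetical.

```python
import random


def gibbs_motif_sites(seqs, width, n_iter=200, pseudo=1.0, seed=0):
    """Basic site sampler: resample one motif start per sequence in turn."""
    if any(len(s) < width for s in seqs):
        raise ValueError("every sequence must be at least `width` long")
    rng = random.Random(seed)
    alphabet = sorted(set("".join(seqs)))
    # Start from a random site in each sequence.
    pos = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(n_iter):
        for i, seq in enumerate(seqs):
            # Position-specific counts built from all OTHER sequences' sites.
            counts = [{a: pseudo for a in alphabet} for _ in range(width)]
            for j, other in enumerate(seqs):
                if j != i:
                    for k in range(width):
                        counts[k][other[pos[j] + k]] += 1
            total = len(seqs) - 1 + pseudo * len(alphabet)
            # Weight each candidate start by its probability under the motif
            # model; with a uniform background the ratio is proportional to it.
            weights = []
            for start in range(len(seq) - width + 1):
                w = 1.0
                for k in range(width):
                    w *= counts[k][seq[start + k]] / total
                weights.append(w)
            # Sample (not maximize) the new start for sequence i.
            pos[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return pos


# Toy DNA sequences with the made-up pattern "TTGACA" planted in each.
toy = ["ACGTTGACAGG", "CCTTGACATCA", "GGATTGACACT", "TTGACAGGCCA"]
starts = gibbs_motif_sites(toy, width=6, seed=1)
```

Because each start is sampled rather than chosen greedily, the chain can leave a poor alignment, though on real data several restarts are usually advisable.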
