Similar Documents
20 similar documents found.
1.
A distance function between any two genotypes, based on their interaction with environments, is derived for the case in which error variances are heterogeneous. A dissimilarity index for a group of genotypes, which measures their interaction with environments as the average of the pairwise distance functions, is also proposed. Using this newly defined dissimilarity index, an iterative relocation algorithm is applied to obtain two different types of clustering based on the genotypes' interaction with environments.
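The abstract does not give the distance formula, so the following is only a minimal sketch under stated assumptions: GE interaction effects are estimated from a table of genotype × environment means, and the hypothetical `ge_distance` scales each squared difference by the sum of the two cells' error variances. The group dissimilarity index is the average pairwise distance, and `relocate` is a simple two-group iterative relocation that moves one genotype at a time while the total within-group dissimilarity decreases.

```python
from itertools import combinations

def ge_effects(y):
    """GE interaction effects: cell mean minus genotype mean minus environment mean plus grand mean."""
    g, e = len(y), len(y[0])
    row = [sum(r) / e for r in y]
    col = [sum(y[i][j] for i in range(g)) / g for j in range(e)]
    grand = sum(row) / g
    return [[y[i][j] - row[i] - col[j] + grand for j in range(e)] for i in range(g)]

def ge_distance(ge, var, i, k):
    """Hypothetical distance: squared GE differences, each scaled by the two cells' error variances."""
    return sum((ge[i][j] - ge[k][j]) ** 2 / (var[i][j] + var[k][j]) for j in range(len(ge[i])))

def group_dissimilarity(members, dist):
    """Dissimilarity index of a group: average of its pairwise distances (0 for a singleton)."""
    pairs = list(combinations(sorted(members), 2))
    return sum(dist[a][b] for a, b in pairs) / len(pairs) if pairs else 0.0

def relocate(dist, groups):
    """Iterative relocation into two groups: move one genotype at a time while total dissimilarity drops."""
    groups = [set(g) for g in groups]
    cost = lambda gs: sum(group_dissimilarity(g, dist) for g in gs)
    improved = True
    while improved:
        improved = False
        for x in range(len(dist)):
            src = 0 if x in groups[0] else 1
            if len(groups[src]) == 1:
                continue                      # keep both groups non-empty
            trial = [set(g) for g in groups]
            trial[src].remove(x)
            trial[1 - src].add(x)
            if cost(trial) < cost(groups) - 1e-12:
                groups, improved = trial, True
    return groups

y = [[10, 12, 14], [11, 13, 15], [14, 12, 10], [15, 13, 11]]   # toy genotype x environment means
var = [[1.0] * 3 for _ in y]                                   # toy error variances per cell
ge = ge_effects(y)
dist = [[ge_distance(ge, var, i, k) for k in range(len(y))] for i in range(len(y))]
groups = relocate(dist, [{0, 2}, {1, 3}])
print(sorted(sorted(g) for g in groups))   # [[0, 1], [2, 3]]
```

Each accepted move strictly lowers the total, so the loop terminates, but like any relocation heuristic it can stop at a local optimum that depends on the starting split.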

2.
Selection of test locations for regional trials of barley   Total citations: 3; self-citations: 0; other citations: 3
Three sets of regional six-row barley (Hordeum vulgare L.) trial data, each with a cultivar × location × year structure, were used to group locations by the similarity of their genotype × environment (GE) interaction. Locations were selected from each group (cluster) so that the structure of the GE interaction generated by the subset of locations would approximate that of the whole set (all locations). The purpose of this paper is to determine the number of locations for which the GE interaction structure generated by the selected locations is fairly consistent over years. Two statistics were used to measure the success of the selected locations: (1) the ratio of the GE mean square (MS) associated with the selected location set to that associated with the best set (the set giving the highest GE interaction MS), and (2) the rank correlation between the cultivar means averaged over the selected locations and those based on the entire data set. The results show that, for eastern Canada, 10–13 locations selected by the cluster method can achieve a fairly consistent GE interaction structure over years. Contribution no. R-078 from Research Program Service, Agriculture Canada, and contribution no. 1352 from Plant Research Centre, Central Experimental Farm.
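Both success statistics can be sketched for a single year's cultivar × location table of means. This assumes an unreplicated two-way table, so the GE mean square is the interaction sum of squares divided by (g − 1)(l − 1) degrees of freedom; the toy data and the selected location subset are hypothetical, and the "best set" is found by exhaustive search over subsets of the same size.

```python
from itertools import combinations

def ge_mean_square(y):
    """GE interaction mean square of a cultivar x location table of means (no replication)."""
    g, l = len(y), len(y[0])
    row = [sum(r) / l for r in y]
    col = [sum(y[i][j] for i in range(g)) / g for j in range(l)]
    grand = sum(row) / g
    ss = sum((y[i][j] - row[i] - col[j] + grand) ** 2 for i in range(g) for j in range(l))
    return ss / ((g - 1) * (l - 1))

def subset(y, locs):
    """Keep only the columns (locations) listed in locs."""
    return [[r[j] for j in locs] for r in y]

def ranks(x):
    """1-based ranks; tied values share the average rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((p - ma) * (q - mb) for p, q in zip(ra, rb))
    return cov / (sum((p - ma) ** 2 for p in ra) * sum((q - mb) ** 2 for q in rb)) ** 0.5

# Toy data: 3 cultivars (rows) x 4 locations (columns); the selected subset is hypothetical.
y = [[5.0, 6.0, 7.0, 4.0], [6.0, 5.0, 8.0, 5.0], [7.0, 7.0, 6.0, 6.0]]
selected = (0, 2)
best = max(combinations(range(len(y[0])), len(selected)),
           key=lambda s: ge_mean_square(subset(y, s)))
ms_ratio = ge_mean_square(subset(y, selected)) / ge_mean_square(subset(y, best))
rho = spearman([sum(r) / len(r) for r in subset(y, selected)], [sum(r) / len(r) for r in y])
print(round(ms_ratio, 3), round(rho, 3))
```

Exhaustive search is only practical for small numbers of locations; it is used here to make the denominator of the ratio explicit.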

3.
Inter-simple sequence repeat (ISSR) polymerase chain reaction (PCR) polymorphism was generated to provide useful markers for assessing genetic diversity within flax germplasm collections. Nine previously selected anchored ISSR primers were used to fingerprint 53 flax cultivars or genotypes, yielding 62 scorable bands, of which 45 (72.6%) were polymorphic. The 53 flax accessions were efficiently separated into four groups and eight subgroups by the unweighted pair group method with arithmetic mean (UPGMA) clustering procedure, based on genetic similarity expressed by the Jaccard similarity coefficient (JSC). Clustering within both groups and subgroups produced smaller homogeneous clusters, whereas clustering between the four main groups of flax accessions showed only a continuous decrease in similarity with a weak clustering effect. The statistical significance of the groups and subgroups in the dendrogram was estimated by calculating the error flag and the cophenetic correlation parameter for each branch. Principal coordinates (PCO) analysis largely confirmed the UPGMA separation. A statistically significant correlation was observed between the numbers of total and polymorphic bands in ISSR patterns. A one-way analysis of variance (ANOVA) confirmed statistically significant differences in average thousand seed mass (TSM) among the eight subclusters of the ISSR-PCR-based UPGMA dendrogram, indicating a statistical association between flax ISSR polymorphism (the structure of ISSR-based clustering) and TSM.
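A minimal pure-Python sketch of the distance-based part of this workflow: Jaccard similarity on 0/1 band scores, converted to distance as 1 − JSC, followed by UPGMA. The band profiles are toy data, node heights are half the merge distance (the usual ultrametric convention), and error flags and cophenetic correlations are not computed.

```python
def jaccard(a, b):
    """Jaccard similarity of two 0/1 band profiles: bands present in both / bands present in either."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0

def upgma(names, dist):
    """UPGMA on a full distance matrix; returns a nested tuple (left, right, node_height)."""
    clusters = {i: (names[i], 1) for i in range(len(names))}     # id -> (subtree, size)
    d = {(i, j): dist[i][j] for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        (i, j), h = min(d.items(), key=lambda kv: kv[1])         # closest pair of clusters
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        del d[(i, j)]
        for k in clusters:                                       # size-weighted average distances
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        clusters[next_id] = ((ti, tj, round(h / 2, 4)), ni + nj)
        next_id += 1
    return next(iter(clusters.values()))[0]

profiles = {                      # toy ISSR band scores (1 = band present)
    "A": [1, 1, 1, 0, 0, 0],
    "B": [1, 1, 0, 0, 0, 0],
    "C": [0, 0, 0, 1, 1, 1],
    "D": [0, 0, 1, 1, 1, 0],
}
names = list(profiles)
dist = [[1 - jaccard(profiles[a], profiles[b]) for b in names] for a in names]
tree = upgma(names, dist)
print(tree)   # (('A', 'B', 0.1667), ('C', 'D', 0.25), 0.475)
```

Jaccard ignores shared band absences, which is why it is often preferred for dominant markers where a missing band is less informative than a shared one.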

4.
Microarrays have become a popular biotechnology in biological and medical research. However, systematic and stochastic variability in microarray data is expected and unavoidable, so the raw measurements carry inherent “noise”. Currently, logarithmic ratios are usually analyzed directly by various clustering methods, which may bias the interpretation when identifying groups of genes or samples. In this paper, a statistical method based on mixed-model approaches is proposed for cluster analysis of microarray data. The underlying rationale is to partition the observed total gene expression into components of variation attributable to different factors using an ANOVA model, and to predict the differential effects of GV (gene by variety) interaction using the adjusted unbiased prediction (AUP) method. The predicted GV interaction effects can then be used as inputs to cluster analysis. We illustrate the application of the method with a gene expression dataset and demonstrate its utility using external validation.
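The ANOVA partition can be illustrated for an unreplicated gene × variety table. This sketch uses ordinary least-squares interaction estimates as a stand-in for the paper's AUP predictions, which it does not attempt to reproduce; the point is that genes with the same profile shape but different overall expression levels receive identical GV effects, so clustering the GV rows groups them, whereas clustering raw log ratios may not.

```python
def partition(y):
    """Two-way ANOVA partition of a gene x variety table with one value per cell.

    Returns least-squares GV interaction effects and the sums of squares."""
    g, v = len(y), len(y[0])
    grand = sum(map(sum, y)) / (g * v)
    gene = [sum(r) / v - grand for r in y]                                    # gene main effects
    variety = [sum(y[i][j] for i in range(g)) / g - grand for j in range(v)]  # variety main effects
    gv = [[y[i][j] - grand - gene[i] - variety[j] for j in range(v)] for i in range(g)]
    ss = {
        "gene": v * sum(a * a for a in gene),
        "variety": g * sum(b * b for b in variety),
        "GV": sum(x * x for row in gv for x in row),
        "total": sum((x - grand) ** 2 for row in y for x in row),
    }
    return gv, ss

# Toy log-ratio table: genes 0 and 1 share a profile shape but differ in overall level.
y = [[1.0, 2.0, 3.0], [5.0, 6.0, 7.0], [3.0, 2.0, 1.0]]
gv, ss = partition(y)
for row in gv:
    print([round(x, 3) for x in row])
```

For a balanced table the gene, variety, and GV sums of squares add up to the total, which is the partition the abstract describes.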

5.
Pseudoroegneria spicata (Poaceae: Triticeae) is an abundant, allogamous species widely adapted to the temperate, semiarid steppe and open woodland regions of western North America. Amplified fragment length polymorphism (AFLP), model-based Bayesian clustering, and other hypothesis-testing methods were used to investigate genetic diversity and population structure among 565 P. spicata plants from 82 localities representing much of the species' distribution. Comparisons with four Asiatic Pseudoroegneria species and two North American Elymus wawawaiensis accessions demonstrate cohesiveness in P. spicata. However, genetic distance analysis shows that P. spicata genotypes group by locality and geographic region. Average DNA polymorphism among P. spicata localities was significantly correlated (r = 0.58) with geographic distance. The optimum Bayesian cluster model included 21 P. spicata groups, indicating that dispersal among sampling locations was not sufficient to merge the genotypes into one unstructured population. Approximately 18.3% of the DNA polymorphism was partitioned among the 21 regional groups, 14.9% among localities within groups, and 66.8% within accessions. Average DNA polymorphism among Bayesian groups was correlated (r = 0.53) with the average geographic distance among those groups, which partly reflects isolation by distance. However, conspicuous regional boundaries were discernible among several divergent genetic groups.
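Relating genetic to geographic distance requires pairwise distances between localities. A minimal sketch, assuming localities are given as latitude/longitude in decimal degrees on a spherical Earth (radius 6371 km); the coordinates and genetic distances below are hypothetical.

```python
import math
from itertools import combinations

def haversine_km(lat1, lon1, lat2, lon2, radius=6371.0):
    """Great-circle distance in km between two points in decimal degrees (spherical Earth)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

localities = [(46.7, -117.2), (45.5, -118.8), (43.6, -116.2), (47.6, -120.7)]   # hypothetical
genetic = [0.21, 0.30, 0.28, 0.25, 0.33, 0.40]   # hypothetical, pairs in combinations() order
pairs = list(combinations(range(len(localities)), 2))
geo = [haversine_km(*localities[i], *localities[j]) for i, j in pairs]
print(round(pearson(geo, genetic), 2))
```

Because pairwise distances are not independent observations, the significance of such a correlation is usually assessed with a permutation (Mantel) test rather than the ordinary t-test for r.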

6.
Information on the biodiversity of available germplasm is very useful for the success of any breeding program. To establish the genetic diversity among 44 chickpea genotypes comprising cultigens, landraces, internationally developed improved lines, and wild relatives, genetic distances were evaluated using 19 simple sequence repeat (SSR) markers with 100 marker loci. Estimates of the number of alleles per locus (n_a), the effective number of alleles (n_e), and Wright's fixation index F were 6.25, 3.67, and 0.44, respectively. Polymorphism information content (PIC) values ranged from 0.44 (locus NCPGR7) to 0.84 (loci NCPGR6 and TA135), with an average of 0.68. Dice similarity coefficients among the chickpea genotypes varied from 0.07 to 1.0, indicating a broad genetic base among the genotypes studied. The highest similarity, 1.0, was observed between genotypes Sel 96TH11484 and Sel 96TH11485, and the lowest, 0.07, between genotypes Sel 95TH1716 and Azad. UPGMA clustering placed all genotypes in eight groups, which reflected the probable origin and regional similarity of the landraces, and of local Iranian landraces relative to the other cultivars and wild species. The clustering also reflects wide diversity within the available germplasm. Analysis of molecular variance revealed that 41% of the total variance was due to differences among groups and 59% to differences within groups. The results of principal coordinate analysis corresponded approximately to those of the cluster analysis. The genetic variation detected in this study can be useful for selective breeding for specific traits and for broadening the genetic base of breeding programs.
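The diversity statistics named here have standard closed forms. A minimal sketch, assuming allele calls per locus are available as a list and allele presence per genotype as 0/1 vectors: n_e = 1/Σp_i², PIC after Botstein et al. (1980), and the Dice coefficient 2a/(2a + b + c). Wright's F and the AMOVA partition are not covered.

```python
from collections import Counter

def allele_freqs(calls):
    """Allele frequencies at one SSR locus from a list of allele calls (e.g. fragment sizes)."""
    counts = Counter(calls)
    total = sum(counts.values())
    return [c / total for c in counts.values()]

def effective_alleles(p):
    """Effective number of alleles n_e = 1 / sum(p_i^2)."""
    return 1.0 / sum(q * q for q in p)

def pic(p):
    """Polymorphism information content (Botstein et al. 1980)."""
    hom = sum(q * q for q in p)
    cross = sum(2 * p[i] ** 2 * p[j] ** 2 for i in range(len(p)) for j in range(i + 1, len(p)))
    return 1.0 - hom - cross

def dice(a, b):
    """Dice similarity 2a / (2a + b + c) for two 0/1 allele-presence profiles."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    return 2 * shared / total if total else 1.0

p = allele_freqs([150, 150, 154, 158])     # toy calls -> frequencies 0.5, 0.25, 0.25
print(effective_alleles(p), round(pic(p), 4))
```

PIC is always somewhat below expected heterozygosity 1 − Σp_i², because it subtracts the cases where a heterozygous parent and its partner cannot be told apart.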

7.
AFLP fingerprinting of 45 Indian linseed genotypes was carried out to determine the genetic relationships among them. Sixteen primer combinations produced 1142 fragments, of which 1129 were polymorphic and 13 monomorphic. The number of polymorphic fragments varied from 44 (E-ACA/M-CTA) to 94 (E-AGC/M-CAC), with an average of 70.6 per primer combination. The frequency of polymorphism varied from 93.7% to 100%, with an average of 98.8% across all genotypes. PIC values ranged from 0.19 to 0.31, with an average of 0.23 per primer combination. The primer pair E-AGC/M-CAC showed the maximum PIC value (0.31), followed by E-AGC/M-CAG (0.29), E-AAC/M-CAG (0.26), and E-AGC/M-CTA (0.25). Resolving power (RP) and marker index (MI) varied from 13.73 to 43.50 and from 8.81 to 28.91, respectively. Jaccard's similarity coefficient varied from 0.16 to 0.57, with an average of 0.26 ± 0.05. The maximum genetic similarity (57%) was detected between genotypes Him Alsi-1 and Him Alsi-2, followed by Him Alsi-1 and GS41, and GS41 and LC-54. The genotypes R-552, Himani, RKY-14, Meera, Indira Alsi-32, and Suyog were found to be comparatively divergent. Neighbor-joining (NJ) clustering grouped all 45 genotypes into three major clusters. In general, genotypes in cluster III had high oil content and those in cluster I had low oil content. At the population level, within-population variance was much higher than between-population variance.
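For dominant markers such as AFLP, band-level statistics are computed from presence frequencies. A sketch, assuming a 0/1 matrix per primer combination and the commonly used formulas PIC = 2f(1 − f) and band informativeness Ib = 1 − 2|0.5 − p|, summed over bands into resolving power (Prevost and Wilkinson 1999). Whether monomorphic bands enter the PIC average varies between studies; here they do.

```python
def band_frequencies(matrix):
    """Proportion of genotypes (rows) carrying each band (column) in a 0/1 AFLP matrix."""
    n = len(matrix)
    return [sum(row[b] for row in matrix) / n for b in range(len(matrix[0]))]

def dominant_pic(f):
    """PIC of a dominant band with presence frequency f: 2f(1 - f)."""
    return 2 * f * (1 - f)

def resolving_power(freqs):
    """Rp = sum over bands of Ib = 1 - 2|0.5 - p|."""
    return sum(1 - 2 * abs(0.5 - p) for p in freqs)

# Toy scores for one primer combination: 4 genotypes x 3 bands.
matrix = [[1, 1, 1], [1, 0, 1], [0, 0, 1], [0, 0, 1]]
freqs = band_frequencies(matrix)            # [0.5, 0.25, 1.0]
mean_pic = sum(dominant_pic(f) for f in freqs) / len(freqs)
print(round(mean_pic, 4), resolving_power(freqs))   # 0.2917 1.5
```

A band present in every genotype (p = 1) contributes nothing to either statistic, which is why primer combinations with many polymorphic bands score highest.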

8.
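A method is proposed for constructing core collections of crop germplasm resources based on molecular marker data and the genotypic values of quantitative traits. Using a genetic model that includes genotype × environment interaction, together with the corresponding mixed linear model approach, the genotypic value of each accession was predicted without bias. Similarity coefficients between individuals were calculated separately from the genotypic values and from the molecular marker data, and then weighted to obtain the final similarity distance. Hierarchical clustering was performed with the unweighted pair-group method with arithmetic mean (UPGMA), and core collections were constructed by a multiple-clustering random sampling method. Using data on 8 agronomic traits and 175 molecular marker loci for 111 genotypes of a rice doubled-haploid (DH) population as an example, four core collections were constructed at four sampling ratios (25%, 20%, 15%, and 10%). The molecular marker diversity and the genetic variation of quantitative traits in the core collections were compared with those of the whole population to evaluate the effectiveness of the method.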
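The abstract does not detail the weighting or the sampling procedure, so this is a simplified sketch under two assumptions: the two similarity matrices are combined as w·S_genotypic + (1 − w)·S_marker with a hypothetical weight w, and multiple-clustering random sampling is approximated by repeatedly taking the closest remaining pair (the first UPGMA merge) and randomly retaining one of its two members until the target sampling ratio is reached.

```python
import random

def weighted_distance(sim_geno, sim_marker, w):
    """Hypothetical combination: distance = 1 - (w * S_genotypic + (1 - w) * S_marker)."""
    n = len(sim_geno)
    return [[1 - (w * sim_geno[i][j] + (1 - w) * sim_marker[i][j]) for j in range(n)]
            for i in range(n)]

def stepwise_core(dist, target, rng):
    """Repeatedly find the most similar remaining pair and randomly keep only one of the two."""
    keep = list(range(len(dist)))
    while len(keep) > target:
        pairs = [(a, b) for ia, a in enumerate(keep) for b in keep[ia + 1:]]
        i, j = min(pairs, key=lambda ab: dist[ab[0]][ab[1]])
        keep.remove(rng.choice((i, j)))
    return keep

# Toy similarities for 4 accessions: {0, 1} and {2, 3} are near-duplicates.
sim_geno = [[1, .9, .1, .2], [.9, 1, .2, .1], [.1, .2, 1, .8], [.2, .1, .8, 1]]
sim_marker = [[1, .8, .2, .1], [.8, 1, .1, .2], [.2, .1, 1, .9], [.1, .2, .9, 1]]
dist = weighted_distance(sim_geno, sim_marker, w=0.5)
core = stepwise_core(dist, target=round(0.5 * len(dist)), rng=random.Random(1))
print(core)
```

Removing one member of the most redundant pair at each step tends to preserve the spread of the original collection, which is the goal the abstract evaluates through marker diversity and trait variation.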

9.
A method is introduced for comparing the results of a clustering technique at different levels of abstraction, or the results of different clustering techniques. The method emphasizes within-cluster homogeneity as well as discontinuities between clusters. It is derived from Hogeweg's method with some important changes. First, each cluster is handled separately to determine the ratio between its homogeneity and its similarity to the nearest neighbouring cluster. For a given clustering, a weighted average of this ratio is computed over all clusters. This average is standardized using the expected average for a cluster configuration with the same number of clusters of the same sizes. A low ratio of expected to observed values is taken to indicate an optimal clustering. A derivation of the criterion is given, and results from three data sets with different properties are evaluated.
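One possible reading of this criterion, sketched under assumptions: homogeneity is the mean within-cluster similarity, similarity to the nearest neighbouring cluster is the largest mean between-cluster similarity, clusters are weighted by size, and the expected value for the same number and sizes of clusters is estimated by randomly relabelling objects. It requires at least two clusters, each with at least two members; the exact weighting and expectation used in the paper may differ.

```python
import random

def mean_sim(sim, A, B):
    """Average similarity over ordered pairs (a, b) with a in A, b in B, a != b."""
    vals = [sim[a][b] for a in A for b in B if a != b]
    return sum(vals) / len(vals)

def weighted_ratio(sim, labels):
    """Size-weighted mean over clusters of (within-cluster similarity / similarity to nearest cluster)."""
    groups = {}
    for i, g in enumerate(labels):
        groups.setdefault(g, []).append(i)
    total = 0.0
    for g, members in groups.items():
        within = mean_sim(sim, members, members)
        nearest = max(mean_sim(sim, members, other) for h, other in groups.items() if h != g)
        total += len(members) / len(labels) * within / nearest
    return total

def expected_over_observed(sim, labels, rng, reps=200):
    """Expected ratio (random relabellings with the same cluster sizes) divided by the observed ratio."""
    observed = weighted_ratio(sim, labels)
    shuffled = list(labels)
    expected = 0.0
    for _ in range(reps):
        rng.shuffle(shuffled)
        expected += weighted_ratio(sim, shuffled) / reps
    return expected / observed

sim = [[1, .9, .1, .1], [.9, 1, .1, .1], [.1, .1, 1, .9], [.1, .1, .9, 1]]   # two tight toy blocks
labels = [0, 0, 1, 1]
print(weighted_ratio(sim, labels), expected_over_observed(sim, labels, random.Random(0)))
```

A clustering that matches real discontinuities gives an observed ratio well above the random expectation, so the expected/observed score falls well below 1.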

10.
Wheat (Triticum aestivum L.) is a staple food for half of the world. Its productivity and agronomic practices, especially nitrogen supplementation, are governed by the nitrogen efficiency (NE) of the genotypes. We analyzed 16 popular cultivated Indian wheat varieties for their NE and variability using a set of 21 simple sequence repeat (SSR) markers derived from each wheat chromosome. The genotypes were categorized into three groups: low, moderate, and high nitrogen efficiency. Of these 16 genotypes, six, eight, and two fell into the high, moderate, and low NE categories, respectively. The differential NE of these genotypes was supported by nitrogen uptake and assimilation parameters. The average polymorphism information content and marker index of these SSR markers were estimated at 0.32 and 0.59, respectively. The genetic similarity coefficient for all possible pairs of varieties ranged from 0.41 to 0.76, indicating a considerable range of genetic diversity at the molecular level. The dendrogram constructed with the unweighted pair-group method with arithmetic average (UPGMA) grouped the 16 wheat varieties into three major clusters, strongly supported by high bootstrap values. The distribution of the varieties among clusters and subclusters appeared to be related to their variability in the scored NE parameter. Genetically diverse parents with desirable characteristics were identified that could potentially be used in breeding programs to improve NE in wheat.

11.
Clustering algorithms divide a set of observations into groups so that members of the same group share common features. In most algorithms, tunable parameters are set arbitrarily or by trial and error, resulting in suboptimal clustering. This paper presents a global optimization strategy for the systematic and optimal selection of parameter values associated with a clustering method. In the process, a performance criterion for the optimization model is proposed and benchmarked against popular performance criteria from the literature (namely the Silhouette coefficient, Dunn's index, and the Davies-Bouldin index). The tuning strategy is illustrated using the support vector clustering (SVC) algorithm and simulated annealing. To reduce the computational burden, the paper also proposes an alternative to the adjacency matrix method used for assigning cluster labels, namely the contour plotting approach. The datasets tested include the iris and thyroid datasets from the UCI repository, as well as lymphoma and breast cancer data. The optimal tuning parameters are determined efficiently, and the contour plotting approach significantly reduces computational effort (CPU time), especially for large datasets. The comparisons of performance criteria give mixed results: the Silhouette coefficient and the Davies-Bouldin index perform better than the proposed performance index, while Dunn's index performs worse on average.
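Two of the benchmark criteria are easy to state directly. A minimal sketch of the mean silhouette width and Dunn's index for points in Euclidean space (using `math.dist`, Python 3.8+); it assumes at least two clusters and a non-zero cluster diameter, and the Davies-Bouldin index is omitted.

```python
import math

def silhouette(points, labels):
    """Mean silhouette width; singleton clusters contribute 0 by convention."""
    n, scores = len(points), []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)   # mean intra-cluster distance
        b = min(                                                            # nearest other cluster
            sum(math.dist(points[i], points[j]) for j in range(n) if labels[j] == c)
            / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / n

def dunn(points, labels):
    """Dunn's index: smallest between-cluster distance / largest within-cluster diameter."""
    groups = {}
    for p, c in zip(points, labels):
        groups.setdefault(c, []).append(p)
    gs = list(groups.values())
    separation = min(math.dist(p, q) for x in range(len(gs)) for y in range(x + 1, len(gs))
                     for p in gs[x] for q in gs[y])
    diameter = max(math.dist(p, q) for g in gs for p in g for q in g)
    return separation / diameter

points = [(0, 0), (0, 1), (10, 0), (10, 1)]
labels = [0, 0, 1, 1]
print(round(silhouette(points, labels), 3), dunn(points, labels))
```

Both criteria reward compact, well-separated clusters, but Dunn's index depends on a single worst-case pair of distances, which makes it sensitive to outliers.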

12.
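Genetic diversity of natural populations of Heptacodium miconioides (seven-son flower) in Zhejiang Province   Total citations: 11; self-citations: 3; other citations: 11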
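RAPD markers were used to study the genetic diversity and genetic differentiation of 9 natural populations of Heptacodium miconioides in Zhejiang Province. Screening 180 plants with 12 random primers yielded 164 reproducible loci. The percentage of polymorphic loci ranged from 14.60% to 27.44% (mean 20.73%); it was highest in the Kuocangshan population, followed by the Simingshan population, and lowest in the Guanyinping population. Both Shannon's index and Nei's index showed that the populations have low genetic diversity but pronounced genetic differentiation. Shannon's index indicated that within-population genetic diversity accounted for only 27.28% of the total genetic diversity, while among-population diversity accounted for 72.72%. Nei's index indicated small genetic variation within populations and large variation among populations, with a coefficient of genetic differentiation among populations of 0.7157. Gene flow among populations was 0.1987, mean genetic identity was 0.7306, and mean genetic distance was 0.3150, indicating clear genetic differentiation among the populations. Cluster analysis based on genetic distance broadly divided the 9 populations into two major groups, eastern and western.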
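The reported gene flow is consistent with the estimator Nm = 0.5(1 − G_ST)/G_ST (McDermott and McDonald 1993), the form given in POPGENE output, applied to the coefficient of genetic differentiation 0.7157. A sketch, assuming that estimator, together with Nei's standard genetic distance D = −ln I:

```python
import math

def gene_flow(gst):
    """Nm = 0.5 * (1 - Gst) / Gst (McDermott and McDonald 1993)."""
    return 0.5 * (1 - gst) / gst

def nei_distance(identity):
    """Nei's (1972) standard genetic distance D = -ln(I)."""
    return -math.log(identity)

print(round(gene_flow(0.7157), 4))      # 0.1986
print(round(nei_distance(0.7306), 4))   # 0.3139
```

The first value matches the reported 0.1987 to within rounding of the differentiation coefficient. The second need not equal the reported mean distance of 0.3150, because the mean of pairwise −ln I values differs from −ln of the mean identity.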

13.
MOTIVATION: Gene expression profiles should be useful in distinguishing variations in disease, since they accurately reflect the status of cells. The primary clustering of gene expression reveals the genotypes responsible for the proximity of members within each cluster, while further clustering elucidates the pathological features of the individual members of each cluster. However, because the first clustering process and the second classification step, in which features are associated with clusters, are performed independently, the initial set of clusters may omit genes associated with pathologically meaningful features. It is therefore important to devise a way of identifying gene expression clusters that are associated with pathological features. RESULTS: We present a novel technique, 'itemset constrained clustering' (IC-Clustering), which computes the optimal cluster that maximizes the interclass variance of gene expression between groups, under the restriction that only divisions expressible by common features are allowed. This constraint automatically labels each cluster with a set of pathological features that characterize it. When applied to liver cancer datasets, IC-Clustering revealed informative gene expression clusters that could be annotated with various pathological features, such as 'tumor' and 'man', or 'except tumor' and 'normal liver function'. In contrast, the k-means method overlooked these clusters.
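A much simplified sketch of the idea, not the published algorithm: samples are split only by feature itemsets (the group is the set of samples having all features in the itemset), and the split with the largest between-group sum of squares of their expression vectors is kept, so the winning group arrives already labelled. It fixes the gene set and enumerates itemsets exhaustively up to a hypothetical `max_size`; the published method also optimizes the gene cluster.

```python
from itertools import combinations

def between_ss(X, mask):
    """Between-group sum of squares of expression vectors X split by a boolean mask."""
    inside = [x for x, m in zip(X, mask) if m]
    outside = [x for x, m in zip(X, mask) if not m]
    if not inside or not outside:
        return 0.0
    dims = range(len(X[0]))
    mean = [sum(x[d] for x in X) / len(X) for d in dims]
    total = 0.0
    for grp in (inside, outside):
        centre = [sum(x[d] for x in grp) / len(grp) for d in dims]
        total += len(grp) * sum((centre[d] - mean[d]) ** 2 for d in dims)
    return total

def best_itemset_split(X, features, max_size=2):
    """Try every feature itemset up to max_size; the group is 'samples having all these features'."""
    items = sorted(set().union(*features))
    best_score, best_items = 0.0, ()
    for k in range(1, max_size + 1):
        for itemset in combinations(items, k):
            score = between_ss(X, [set(itemset) <= f for f in features])
            if score > best_score:
                best_score, best_items = score, itemset
    return best_score, best_items

X = [[5.0, 1.0], [5.2, 0.8], [1.0, 4.0], [0.9, 4.2]]            # toy samples x genes
features = [{"tumor", "man"}, {"tumor", "man"}, {"normal"}, {"normal", "man"}]
print(best_itemset_split(X, features))
```

Several itemsets can induce the same split (here "tumor" and its complement "normal"), so the returned label is one of possibly several equivalent descriptions.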

14.
Bayesian model-based clustering provides a powerful and flexible tool that can be incorporated into regression models to better understand the grouping of observations. Using data from the Seychelles Child Development Study, we explore the effect of prenatal methylmercury exposure on 20 neurodevelopmental outcomes measured in 9-year-old children. Rather than clustering individual subjects, we cluster the outcomes within a multiple outcomes model. By using information in the data to nest the outcomes into groups called domains, the model more accurately reflects the shared characteristics of neurodevelopmental domains and improves estimation of the overall and outcome-specific exposure effects by shrinking effects within and between the domains selected by the data. The Bayesian paradigm allows sampling from the posterior distribution of the grouping parameters, so inference can be made about group membership and the groups' defining characteristics. We avoid the often difficult and highly subjective requirement of identifying the total number of groups a priori by incorporating a Dirichlet process prior, forming a fully Bayesian multiple outcomes model.
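The Dirichlet process prior on groupings can be illustrated by its Chinese restaurant process representation, which lets the number of groups grow with the data instead of being fixed in advance. This sketch only draws from the prior for 20 outcomes with a hypothetical concentration alpha = 1; the study's posterior sampling over domain memberships is not shown.

```python
import random

def crp_partition(n_items, alpha, rng):
    """Draw a partition of n_items from a Chinese restaurant process with concentration alpha.

    Item i joins existing group k with probability n_k / (i + alpha)
    or starts a new group with probability alpha / (i + alpha)."""
    labels, counts = [], []
    for _ in range(n_items):
        k = rng.choices(range(len(counts) + 1), weights=counts + [alpha])[0]
        if k == len(counts):
            counts.append(0)          # open a new group (domain)
        counts[k] += 1
        labels.append(k)
    return labels

labels = crp_partition(20, alpha=1.0, rng=random.Random(0))   # 20 outcomes, as in the study
print(labels, "groups:", max(labels) + 1)
```

Larger alpha values favour more, smaller groups; in a full model this prior is combined with the likelihood so the data decide how outcomes are nested into domains.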

15.
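Rich genetic diversity can provide a broad genetic basis for soybean breeding. In this study, 35 SSR marker pairs were used to analyze the genetic diversity of 60 soybean cultivars from Northeast China that are resistant to Phytophthora root rot. A total of 189 alleles were detected, with an average of 5.4 alleles per locus; polymorphism information content (PIC) ranged from 0.1550 to 0.8195, with a mean of 0.6636, and genetic similarity coefficients ranged from 0.31 to 0.74. Fingerprints of the 60 resistant accessions were constructed with 5 highly polymorphic SSR primer pairs, which could distinguish each of the 60 accessions from all the others. Cluster analysis based on genetic distance in NTSYS 2.10 divided the 60 resistant accessions into 7 groups. The genetic similarity coefficients of 78.33% of the resistant cultivars (lines) fell between 0.45 and 0.74, indicating relatively narrow genetic differences and a low level of genetic diversity among cultivars. The results of cluster analysis partly overlapped with those of population genetic structure analysis, and both reflected a degree of introgression and exchange among resistant materials from different regions.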
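Finding a small marker subset that separates every accession can be sketched as a greedy search. This assumes one allele call per locus, a reasonable simplification for largely homozygous soybean lines, and the toy profiles are hypothetical; greedy selection gives a small but not necessarily minimal set.

```python
def unresolved_pairs(profiles, markers):
    """Number of accession pairs with identical allele calls at all of the given markers."""
    keys = [tuple(p[m] for m in markers) for p in profiles]
    return sum(1 for i in range(len(keys)) for j in range(i + 1, len(keys)) if keys[i] == keys[j])

def greedy_fingerprint_markers(profiles):
    """Greedily add the marker that leaves the fewest indistinguishable pairs until none remain."""
    chosen = []
    while unresolved_pairs(profiles, chosen) > 0:
        remaining = [m for m in range(len(profiles[0])) if m not in chosen]
        if not remaining:
            return None                  # even all markers together cannot separate every accession
        chosen.append(min(remaining, key=lambda m: unresolved_pairs(profiles, chosen + [m])))
    return chosen

# Toy allele calls (fragment sizes) for 4 accessions at 3 SSR loci.
profiles = [[150, 200, 310], [150, 204, 310], [154, 200, 310], [154, 204, 312]]
print(greedy_fingerprint_markers(profiles))   # [0, 1]
```

Highly polymorphic loci split many pairs at once, which is why a handful of them, such as the 5 primer pairs reported here, can be enough to fingerprint a whole panel.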

16.
Increasing food production, together with food quality, is important to feed the growing human population and to attain food and nutritional security. Diverse germplasm of any crop is an important genetic resource for mining genes that may help attain food and nutritional security. Here we used 15 RAPD and 23 SSR markers to elucidate diversity among 51 common bean genotypes, mostly landraces collected from the Himalayan region of Jammu and Kashmir, India. Both marker types were highly polymorphic. Their discriminatory power was determined using parameters such as percent polymorphism, PIC, resolving power, and marker index. The 15 RAPDs produced 171 polymorphic bands, while the 23 SSRs produced 268 polymorphic bands. SSRs showed a higher PIC value (0.300) than RAPDs (0.243) and a higher resolving power (5.241 versus 3.86). However, RAPDs showed a higher marker index (2.69) than SSRs (1.279), which may be attributed to their higher multiplex ratio. The dendrograms generated by hierarchical UPGMA cluster analysis grouped the genotypes into two main clusters, with various degrees of subclustering within each. Both marker systems showed comparable accuracy in grouping common bean genotypes according to their area of cultivation. Model-based STRUCTURE analysis using the 15 RAPD and 23 SSR markers identified a population with three subpopulations, corresponding to the distance-based groupings. A high level of genetic diversity was observed within the population. These findings have further implications for common bean breeding and conservation programs.
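The marker index combines informativeness with the multiplex ratio. A sketch under one common formulation: Powell et al. (1996) defined MI as the effective multiplex ratio EMR = n·β (n bands scored per assay, β the fraction that are polymorphic) multiplied by average heterozygosity, and many later studies substitute PIC. Here PIC uses the dominant-marker form 2f(1 − f) averaged over polymorphic bands, which suits RAPDs but not codominant SSR alleles; definitions vary between studies.

```python
def assay_marker_index(matrix):
    """Marker index MI = mean PIC x EMR for one primer (rows = genotypes, columns = 0/1 bands).

    EMR = n * beta, with n = bands scored and beta = fraction of them that are polymorphic;
    mean PIC uses 2f(1 - f) averaged over the polymorphic bands (dominant-marker form)."""
    n_geno, n_bands = len(matrix), len(matrix[0])
    freqs = [sum(row[b] for row in matrix) / n_geno for b in range(n_bands)]
    poly = [f for f in freqs if 0 < f < 1]
    beta = len(poly) / n_bands
    emr = n_bands * beta
    mean_pic = sum(2 * f * (1 - f) for f in poly) / len(poly) if poly else 0.0
    return mean_pic * emr

matrix = [[1, 1, 1], [1, 0, 1], [0, 0, 1], [0, 0, 1]]   # toy RAPD scores: 4 genotypes x 3 bands
print(assay_marker_index(matrix))                       # 0.875
```

Because EMR grows with the number of polymorphic bands per assay, a multi-band marker such as a RAPD can reach a higher MI than a single-locus SSR even when its per-band PIC is lower.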

17.
It has only recently been possible to detect sufficient genetic diversity among anthrax isolates to allow genotype grouping (Keim et al. 1997). Early results of such grouping suggest that the southern African subcontinent may be the geographical origin of Bacillus anthracis. This report describes a pilot investigation of the genetic diversity of a study group of isolates from the Kruger National Park, South Africa, and efforts to detect spatio-temporal clustering within the study group. This study has also served as further validation for the newly developed Multi-Locus VNTR Analysis (MLVA), designed to simplify genotyping of B. anthracis isolates. The results reveal a diverse range of genotypes within the park allied with three genotype reference groups, and show that the MLVA procedure is robust for rapid analysis of B. anthracis genotypes. We also observed multiple genotype groups within epidemics and between geographically and temporally close epidemic episodes. This is in contrast to earlier characterizations of anthrax epidemics. The result of a Mantel test for time-space clustering indicates clustering of the anthrax isolates selected for the study.
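A Mantel test compares two distance matrices over the same isolates, here temporal and spatial distance. A minimal sketch, assuming the standardized (Pearson r) form of the statistic and a one-tailed permutation test that permutes the rows and columns of one matrix together; Mantel's original time-space formulation used a sum of products, often of transformed distances. The sampling days and transect positions are hypothetical.

```python
import random

def mantel(a, b, permutations=999, rng=None):
    """One-tailed Mantel test between two symmetric distance matrices.

    Statistic: Pearson r between their upper triangles. Null distribution: rows and
    columns of b permuted together. Returns (r, p) with p = (hits + 1) / (permutations + 1)."""
    rng = rng or random.Random(0)
    n = len(a)
    idx = [(i, j) for i in range(n) for j in range(i + 1, n)]
    x = [a[i][j] for i, j in idx]
    mx = sum(x) / len(x)
    sxx = sum((v - mx) ** 2 for v in x)

    def r_for(order):
        y = [b[order[i]][order[j]] for i, j in idx]
        my = sum(y) / len(y)
        sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
        return sxy / (sxx * sum((q - my) ** 2 for q in y)) ** 0.5

    order = list(range(n))
    observed = r_for(order)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        hits += r_for(order) >= observed
    return observed, (hits + 1) / (permutations + 1)

# Toy isolates: sampling day and position along a transect (km), both hypothetical.
days = [0, 3, 5, 9, 14, 20, 26, 31]
km = [0.0, 1.5, 2.0, 4.0, 7.5, 9.0, 13.0, 15.0]
time_d = [[abs(s - t) for t in days] for s in days]
space_d = [[abs(s - t) for t in km] for s in km]
print(mantel(time_d, space_d))
```

Permuting whole isolates, rather than individual matrix cells, keeps the dependence among distances that share an isolate, which is what makes the permutation p-value valid.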

18.
Intersimple sequence repeat (ISSR) amplification was used to analyze genetic relationships among silkworm, Bombyx mori L., strains. Nineteen primers containing simple sequence repeat (SSR) motifs were tested for amplification on a panel of 42 strains, representative of the diversity of silkworm germplasm; 12 of the primers amplified distinct, reproducible bands. The primers amplified a total of 108 bands, of which 85 (78.7%) were polymorphic. The ISSR results suggested that within the dinucleotide class, the poly(CA) motif was more common than the poly(CT) motif. The ISSR amplification pattern was used to group the silkworm strains into seven subclusters based on their origin in an unweighted pair-group method with arithmetic average cluster analysis by using Nei's genetic distance. Seven major ecotypic silkworm groups were analyzed. Principal component analysis of the ISSR data supported the unweighted pair-group method with arithmetic average clustering. Therefore, ISSR amplification is a valuable method for determining genetic variability among silkworm varieties. This efficient genetic fingerprinting technique should be useful for characterizing the large numbers of silkworm strains held in national and international germplasm centers.

19.
The large variety of clustering algorithms and their variants can be daunting to researchers wishing to explore patterns within their microarray datasets. Furthermore, each clustering method has distinct biases in finding patterns within the data, and clusterings may not be reproducible across different algorithms. A consensus approach using multiple algorithms can show where the various methods agree and expose robust patterns within the data. In this paper, we present a software package, Consense, written for R/Bioconductor, that uses such an approach to explore microarray datasets. Consense produces clustering results for each of the clustering methods and generates a report of metrics comparing the individual clusterings. A feature of Consense is the identification of genes that cluster consistently with an index gene across methods. Using simulated microarray data, the sensitivity of the metrics to the biases of the different clustering algorithms is explored. The framework is easily extensible, allowing the tool to be used with other functional genomic data types, as well as other high-throughput OMICS data types generated from metabolomic and proteomic experiments. It also provides a flexible environment for benchmarking new clustering algorithms. Consense is currently available as an installable R/Bioconductor package (http://www.ohsucancer.com/isrdev/consense/).
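The index-gene feature and a pairwise agreement metric can be sketched independently of any package. This is not the Consense API: it assumes each method's result is a dict from gene to cluster label, the method names are hypothetical, and the unadjusted Rand index stands in for whatever comparison metrics the package reports.

```python
from itertools import combinations

def partners(labels, index_gene):
    """Genes placed in the same cluster as index_gene by one clustering (dict gene -> label)."""
    return {g for g, lab in labels.items() if lab == labels[index_gene] and g != index_gene}

def consistent_partners(clusterings, index_gene):
    """Genes that share the index gene's cluster under every method."""
    return set.intersection(*(partners(lbl, index_gene) for lbl in clusterings.values()))

def rand_index(l1, l2):
    """Fraction of gene pairs on which two clusterings agree (together in both or apart in both)."""
    pairs = list(combinations(sorted(l1), 2))
    return sum((l1[a] == l1[b]) == (l2[a] == l2[b]) for a, b in pairs) / len(pairs)

clusterings = {                                   # toy labels from three hypothetical methods
    "kmeans": {"g1": 0, "g2": 0, "g3": 1, "g4": 1},
    "hclust": {"g1": "a", "g2": "a", "g3": "a", "g4": "b"},
    "pam": {"g1": 2, "g2": 2, "g3": 3, "g4": 3},
}
print(consistent_partners(clusterings, "g1"))                        # {'g2'}
print(rand_index(clusterings["kmeans"], clusterings["hclust"]))     # 0.5
```

The Rand index compares only co-membership, so it works even when methods use unrelated label names or different numbers of clusters.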

20.
We tested the utility of genetic cluster analysis in ascertaining population structure of a large data set for which population structure was previously known. Each of 600 individuals representing 20 distinct chicken breeds was genotyped for 27 microsatellite loci, and individual multilocus genotypes were used to infer genetic clusters. Individuals from each breed were inferred to belong mostly to the same cluster. The clustering success rate, measuring the fraction of individuals that were properly inferred to belong to their correct breeds, was consistently approximately 98%. When markers of highest expected heterozygosity were used, genotypes that included at least 8-10 highly variable markers from among the 27 markers genotyped also achieved >95% clustering success. When 12-15 highly variable markers and only 15-20 of the 30 individuals per breed were used, clustering success was at least 90%. We suggest that in species for which population structure is of interest, databases of multilocus genotypes at highly variable markers should be compiled. These genotypes could then be used as training samples for genetic cluster analysis and to facilitate assignments of individuals of unknown origin to populations. The clustering algorithm has potential applications in defining the within-species genetic units that are useful in problems of conservation.
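The clustering success rate and the marker-selection criterion can both be sketched briefly. This assumes inferred cluster labels are already available (the clustering algorithm itself is not shown), labels each cluster with its majority breed, and ranks loci by expected heterozygosity He = 1 − Σp_i²; the loci and allele calls are hypothetical.

```python
from collections import Counter

def clustering_success(breeds, clusters):
    """Fraction of individuals whose inferred cluster's majority breed is their own breed.

    Simplification: each cluster is labelled with its most common breed, so two clusters
    may map to the same breed (a one-to-one matching would be stricter)."""
    members = {}
    for b, c in zip(breeds, clusters):
        members.setdefault(c, []).append(b)
    correct = sum(Counter(bs).most_common(1)[0][1] for bs in members.values())
    return correct / len(breeds)

def expected_heterozygosity(calls):
    """He = 1 - sum(p_i^2) from the allele calls observed at one microsatellite locus."""
    counts = Counter(calls)
    n = sum(counts.values())
    return 1 - sum((c / n) ** 2 for c in counts.values())

# Rank hypothetical loci by He and keep the most variable ones.
loci = {"L1": [100, 100, 102, 104], "L2": [90, 90, 90, 92], "L3": [120, 124, 128, 132]}
ranked = sorted(loci, key=lambda k: expected_heterozygosity(loci[k]), reverse=True)
print(ranked)                                                        # ['L3', 'L1', 'L2']
print(clustering_success(["A"] * 3 + ["B"] * 3, [0, 0, 0, 1, 1, 0]))  # 5/6
```

Ranking loci by He and truncating the list mirrors the abstract's experiment of keeping only the 8 to 15 most variable markers.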


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417