Found 20 similar articles; search took 140 ms
1.
Maenhout S De Baets B Haesaert G Van Bockstaele E 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》2007,115(7):1003-1013
Accurate prediction of the phenotypical performance of untested single-cross hybrids allows for a faster genetic progress of the breeding pool at a reduced cost. We propose a prediction method based on ɛ-insensitive support vector machine regression (ɛ-SVR). A brief overview of the theoretical background of this fairly new technique and the use of specific kernel functions based on commonly applied genetic similarity measures for dominant and co-dominant markers are presented. These different marker types can be integrated into a single regression model by means of simple kernel operations. Field trial data from the grain maize breeding programme of the private company RAGT R2n are used to assess the predictive capabilities of the proposed methodology. Prediction accuracies are compared to those of one of today’s best performing prediction methods based on best linear unbiased prediction. Results on our data indicate that both methods match each other’s prediction accuracies for several combinations of marker types and traits. The ɛ-SVR framework, however, allows for a greater flexibility in combining different kinds of predictor variables.
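The two ideas named in this abstract, the ɛ-insensitive loss and the combination of marker-specific kernels, can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of Jaccard similarity for dominant markers, simple matching for co-dominant markers, the function names, and the `weight` parameter are all assumptions made for illustration.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Loss is zero inside the epsilon tube and grows linearly outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def jaccard_similarity(a, b):
    """Similarity for dominant (presence/absence) marker scores coded 0/1."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return both / either if either else 1.0


def simple_matching(a, b):
    """Share of co-dominant marker loci that carry identical allele codes."""
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def combined_kernel(dom_a, dom_b, cod_a, cod_b, weight=0.5):
    """Weighted sum of the two similarities (hypothetical weighting scheme)."""
    return (weight * jaccard_similarity(dom_a, dom_b)
            + (1.0 - weight) * simple_matching(cod_a, cod_b))


# Two lines scored at four dominant and three co-dominant loci (made-up data).
k = combined_kernel([1, 1, 0, 0], [1, 0, 1, 0], ["A", "B", "A"], ["A", "B", "B"])
```

A weighted sum is one of the simplest kernel operations that keeps both marker types in a single model; a product of the two similarities would be another option.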
2.
We examined the usefulness of the best linear unbiased prediction associated with molecular markers for prediction of untested maize double-cross hybrids. Ten single-cross hybrids from different commercial backgrounds were crossed using a complete diallel design. These 10 single-cross hybrids were genotyped with 20 microsatellite markers. The best linear unbiased prediction associated with microsatellite information gave relatively good prediction ability of the double-cross hybrid performance, with correlations between observed phenotypic values and genotypic prediction values varying from 0.27 to 0.54. Taking into account the predictions of specific combining ability, the correlation between observed and predicted specific combining ability varied from 0.50 to 0.88. Based on these results, we infer that it is feasible to predict maize double-cross hybrids with different degrees of imbalance without including any prior information about parental inbred lines or single-cross hybrid performance.
3.
Genomic prediction of hybrid performance in maize with models incorporating dominance and population specific marker effects (cited by 1; self-citations 0; citations by others 1)
Technow F Riedelsheimer C Schrag TA Melchinger AE 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》2012,125(6):1181-1194
Identifying high performing hybrids is an essential part of every maize breeding program. Genomic prediction of maize hybrid performance allows promising hybrids to be identified when they themselves or other hybrids produced from their parents were not tested in field trials. Using simulations, we investigated the effects of marker density (10, 1, 0.3 markers per mega base pair, Mbp⁻¹), convergent or divergent parental populations, number of parents tested in other combinations (2, 1, 0), genetic model (including population-specific and/or dominance marker effects or not), and estimation method (GBLUP or BayesB) on the prediction accuracy. We based our simulations on marker genotypes of Central European flint and dent inbred lines from an ongoing maize breeding program. To simulate convergent or divergent parent populations, we generated phenotypes by assigning QTL to markers with similar or very different allele frequencies in both pools, respectively. Prediction accuracies increased with marker density and number of parents tested and were higher under divergent compared with convergent parental populations. Modeling marker effects as population-specific slightly improved prediction accuracy under lower marker densities (1 and 0.3 Mbp⁻¹). This indicated that modeling marker effects as population-specific will be most beneficial under low linkage disequilibrium. Incorporating dominance effects improved prediction accuracies considerably for convergent parent populations, where dominance results in major contributions of SCA effects to the genetic variance among inter-population hybrids. While the general trends regarding the effects of the aforementioned influence factors on prediction accuracy were similar for GBLUP and BayesB, the latter method produced significantly higher accuracies for models incorporating dominance.
4.
Frank Technow Tobias A. Schrag Wolfgang Schipprack Eva Bauer Henner Simianer Albrecht E. Melchinger 《Genetics》2014,197(4):1343-1355
Maize (Zea mays L.) serves as model plant for heterosis research and is the crop where hybrid breeding was pioneered. We analyzed genomic and phenotypic data of 1254 hybrids of a typical maize hybrid breeding program based on the important Dent × Flint heterotic pattern. Our main objectives were to investigate genome properties of the parental lines (e.g., allele frequencies, linkage disequilibrium, and phases) and examine the prospects of genomic prediction of hybrid performance. We found high consistency of linkage phases and large differences in allele frequencies between the Dent and Flint heterotic groups in pericentromeric regions. These results can be explained by the Hill–Robertson effect and support the hypothesis of differential fixation of alleles due to pseudo-overdominance in these regions. In pericentromeric regions we also found indications for consistent marker–QTL linkage between heterotic groups. With prediction methods GBLUP and BayesB, the cross-validation prediction accuracy ranged from 0.75 to 0.92 for grain yield and from 0.59 to 0.95 for grain moisture. The prediction accuracy of untested hybrids was highest, if both parents were parents of other hybrids in the training set, and lowest, if none of them were involved in any training set hybrid. Optimizing the composition of the training set in terms of number of lines and hybrids per line could further increase prediction accuracy. We conclude that genomic prediction facilitates a paradigm shift in hybrid breeding by focusing on the performance of experimental hybrids rather than the performance of parental lines in testcrosses.
5.
Tobias A. Schrag Jens Möhring Albrecht E. Melchinger Barbara Kusterer Baldev S. Dhillon Hans-Peter Piepho Matthias Frisch 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》2010,120(2):451-461
The identification of superior hybrids is important for the success of a hybrid breeding program. However, field evaluation of all possible crosses among inbred lines requires extremely large resources. Therefore, efforts have been made to predict hybrid performance (HP) by using field data of related genotypes and molecular markers. In the present study, the main objective was to assess the usefulness of pedigree information in combination with the covariance between general combining ability (GCA) and per se performance of parental lines for HP prediction. In addition, we compared the prediction efficiency of AFLP and SSR marker data, estimated marker effects separately for reciprocal allelic configurations (among heterotic groups) of heterozygous marker loci in hybrids, and imputed missing AFLP marker data for marker-based HP prediction. Unbalanced field data of 400 maize dent × flint hybrids from 9 factorials and of 79 inbred parents were subjected to joint analyses with mixed linear models. The inbreds were genotyped with 910 AFLP and 256 SSR markers. Efficiency of prediction (R²) was estimated by cross-validation for hybrids having no or one parent evaluated in testcrosses. Best linear unbiased prediction of GCA and specific combining ability resulted in the highest efficiencies for HP prediction for both traits (R² = 0.6–0.9), if pedigree and line per se data were used. However, without such data, HP for grain yield was more efficiently predicted using molecular markers. The additional modifications of the marker-based approaches had no clear effect. Our study showed the high potential of joint analyses of hybrids and parental inbred lines for the prediction of performance of untested hybrids.
6.
Genome-based prediction of testcross values in maize (cited by 1; self-citations 0; citations by others 1)
Albrecht T Wimmer V Auinger HJ Erbe M Knaak C Ouzunova M Simianer H Schön CC 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》2011,123(2):339-350
This is the first large-scale experimental study on genome-based prediction of testcross values in an advanced cycle breeding population of maize. The study comprised testcross progenies of 1,380 doubled haploid lines of maize derived from 36 crosses and phenotyped for grain yield and grain dry matter content in seven locations. The lines were genotyped with 1,152 single nucleotide polymorphism markers. Pedigree data were available for three generations. We used best linear unbiased prediction and stratified cross-validation to evaluate the performance of prediction models differing in the modeling of relatedness between inbred lines and in the calculation of genome-based coefficients of similarity. The choice of similarity coefficient did not affect prediction accuracies. Models including genomic information yielded significantly higher prediction accuracies than the model based on pedigree information alone. Average prediction accuracies based on genomic data were high even for a complex trait like grain yield (0.72–0.74) when the cross-validation scheme allowed for a high degree of relatedness between the estimation and the test set. When predictions were performed across distantly related families, prediction accuracies decreased significantly (0.47–0.48). Prediction accuracies decreased with decreasing sample size but were still high when the population size was halved (0.67–0.69). The results from this study are encouraging with respect to genome-based prediction of the genetic value of untested lines in advanced cycle breeding populations and the implementation of genomic selection in the breeding process.
7.
8.
Rocío Acosta-Pech José Crossa Gustavo de los Campos Simon Teyssèdre Bruno Claustres Sergio Pérez-Elizalde Paulino Pérez-Rodríguez 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》2017,130(7):1431-1440
Key message
A new genomic model that incorporates genotype × environment interaction gave increased prediction accuracy of untested hybrid response for traits such as percent starch content, percent dry matter content and silage yield of maize hybrids.
Abstract
The prediction of hybrid performance (HP) is very important in agricultural breeding programs. In plant breeding, multi-environment trials play an important role in the selection of important traits, such as stability across environments, grain yield and pest resistance. Environmental conditions modulate gene expression causing genotype × environment interaction (G × E), such that the estimated genetic correlations of the performance of individual lines across environments summarize the joint action of genes and environmental conditions. This article proposes a genomic statistical model that incorporates G × E for general and specific combining ability for predicting the performance of hybrids in environments. The proposed model can also be applied to any other hybrid species with distinct parental pools. In this study, we evaluated the predictive ability of two HP prediction models using a cross-validation approach applied in extensive maize hybrid data, comprising 2724 hybrids derived from 507 dent lines and 24 flint lines, which were evaluated for three traits in 58 environments over 12 years; analyses were performed for each year. On average, genomic models that include the interaction of general and specific combining ability with environments have greater predictive ability than genomic models without interaction with environments (ranging from 12 to 22%, depending on the trait). We concluded that including G × E in the prediction of untested maize hybrids increases the accuracy of genomic models.
9.
Tobias A. Schrag Jens Möhring Hans Peter Maurer Baldev S. Dhillon Albrecht E. Melchinger Hans-Peter Piepho Anker P. Sørensen Matthias Frisch 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》2009,118(4):741-751
In hybrid breeding, the prediction of hybrid performance (HP) is extremely important as it is difficult to evaluate inbred lines in numerous cross combinations. Recent developments such as doubled haploid production and molecular marker technologies have enhanced the prospects of marker-based HP prediction to accelerate the breeding process. Our objectives were to (1) predict HP using a combined analysis of hybrids and parental lines from a breeding program, (2) evaluate the use of molecular markers in addition to phenotypic and pedigree data, (3) evaluate the combination of line per se data with marker-based estimates, (4) study the effect of the number of tested parents, and (5) assess the advantage of haplotype blocks. An unbalanced dataset of 400 hybrids from 9 factorial crosses tested in different experiments and data of 79 inbred parents were subjected to combined analyses with a mixed linear model. Marker data of the inbreds were obtained with 20 AFLP primer–enzyme combinations. Cross-validation was used to assess the performance prediction of hybrids of which no or only one parental line was testcross evaluated. For HP prediction, the highest proportion of explained variance (R²), 46% for grain yield (GY) and 70% for grain dry matter content (GDMC), was obtained from line per se best linear unbiased prediction (BLUP) estimates plus marker effects associated with mid-parent heterosis (TEAM-LM). Our study demonstrated that HP was efficiently predicted using molecular markers even for GY when testcross data of both parents are not available. This can help in improving greatly the efficiency of commercial hybrid breeding programs.
10.
Performance prediction of F1 hybrids between recombinant inbred lines derived from two elite maize inbred lines (cited by 1; self-citations 0; citations by others 1)
Tingting Guo Huihui Li Jianbing Yan Jihua Tang Jiansheng Li Zhiwu Zhang Luyan Zhang Jiankang Wang 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》2013,126(1):189-201
Selection of recombinant inbred lines (RILs) from elite hybrids is a key method in maize breeding especially in developing countries. The RILs are normally derived by repeated self-pollination and selection. In this study, we first investigated the accuracy of different models in predicting the performance of F1 hybrids between RILs derived from two elite maize inbred lines Zong3 and 87-1, and then compared these models through simulation using a wider range of genetic models. Results indicated that appropriate prediction models depended on genetic architecture, e.g., combined model using breeding value and genome-wide prediction (BV+GWP) has the highest prediction accuracy for high V_D/V_A ratio (>0.5) traits. Theoretical studies demonstrated that different components of genetic variance were captured by different prediction models, which in turn explained the accuracy of these models in predicting the F1 hybrid performance. Based on genome-wide prediction model (GWP), 114 untested F1 hybrids possibly having higher grain yield than the original F1 hybrid Yuyu22 (the single cross between Zong3 and 87-1) have been identified and recommended for further field test.
11.
Efficient genomic selection in animals or crops requires the accurate prediction of the agronomic performance of individuals from their high-density molecular marker profiles. Using a training data set that contains the genotypic and phenotypic information of a large number of individuals, each marker or marker allele is associated with an estimated effect on the trait under study. These estimated marker effects are subsequently used for making predictions on individuals for which no phenotypic records are available. As most plant and animal breeding programs are currently still phenotype driven, the continuously expanding collection of phenotypic records can only be used to construct a genomic prediction model if a dense molecular marker fingerprint is available for each phenotyped individual. However, as the genotyping budget is generally limited, the genomic prediction model can only be constructed using a subset of the tested individuals and possibly a genome-covering subset of the molecular markers. In this article, we demonstrate how an optimal selection of individuals can be made with respect to the quality of their available phenotypic data. We also demonstrate how the total number of molecular markers can be reduced while a maximum genome coverage is ensured. The third selection problem we tackle is specific to the construction of a genomic prediction model for a hybrid breeding program where only molecular marker fingerprints of the homozygous parents are available. We show how to identify the set of parental inbred lines of a predefined size that has produced the highest number of progeny. These three selection approaches are put into practice in a simulation study where we demonstrate how the trade-off between sample size and sample quality affects the prediction accuracy of genomic prediction models for hybrid maize.
Despite the numerous studies devoted to molecular marker-based breeding, the genetic progress of most complex traits in today's plant and animal breeding programs still heavily relies on phenotypic selection. Most breeding companies have established dedicated databases that store the vast number of phenotypic records that are being routinely collected throughout the course of their breeding programs. These phenotypic records are, however, gradually being complemented by various types of molecular marker scores and it is to be expected that effective marker-based selection schemes will eventually allow current phenotyping efforts to be reduced (Bernardo 2008; Hayes et al. 2009). The available marker and phenotypic databases already allow for the construction and validation of marker-based selection schemes. Mining the phenotypic databases of a breeding company is, however, quite different from analyzing the data that is generated by a carefully designed experiment. Genetic evaluation data is often severely unbalanced as elite individuals are usually tested many times on their way to becoming a commercial variety or sire, while less performing individuals are often disregarded after a single trial. Furthermore, the different phenotypic evaluation trials are separated in time and space and as such, subjected to different environmental conditions. Therefore, ranking the performance of individuals that were evaluated in different phenotypic trials is usually a nontrivial task.
Animal breeders are well experienced when it comes to handling unbalanced genetic evaluation data. The best linear unbiased predictor or BLUP approach (Henderson 1975) presented a major breakthrough in this respect, especially when combined with restricted maximum-likelihood or REML estimation of the needed variance components (Patterson and Thompson 1971). Somewhat later on, this linear mixed modeling approach was also adopted by plant breeders as the de facto standard for handling unbalanced phenotypic data. The more recent developments in genomic selection (Bernardo 1995; Meuwissen et al. 2001; Gianola and van Kaam 2008) and marker-trait association studies (Yu et al. 2006) are, at least partially, BLUP-based and are therefore, in theory, perfectly suited for mining the large marker and phenotypic databases that back each breeding program. In practice, however, the unbalancedness of the available genetic evaluation data often reduces its total information content and the construction of a marker-based selection model is limited to a more balanced subset of the data.
As phenotypic data are available, genotyping costs limit the total number of individuals that can be included in the construction of a genomic prediction model. The best results will be obtained by selecting a subset of individuals for which the phenotypic evaluation data exhibits the least amount of unbalancedness. In this article we demonstrate how this phenotypic subset selection problem can be translated into a standard graph theory problem that can be solved with exact algorithms or less-time-consuming heuristics.
In most plant and animal species, the number of available molecular markers is rapidly increasing, while the genotyping cost per marker is decreasing. Nevertheless, as budgets are always limited, genotyping all mapped markers for a small number of individuals might be less efficient than genotyping a restricted set of well-chosen markers on a wider set of individuals. One should therefore be able to select a subset of molecular markers that covers the entire genome as uniformly as possible. We demonstrate how this marker selection problem can also be translated into a well-known graph theory problem that has an exact solution.
The third problem we tackle by means of graph theory is more specific to hybrid breeding programs where the parental individuals are nearly or completely homozygous. This implies that we can deduce the molecular marker fingerprint of a hybrid individual from the marker scores of its parents. As the phenotypic data are collected on the hybrids, genotyping costs can be reduced by selecting a subset of parental inbreds that have produced the maximum number of genetically distinct offspring among themselves. Obviously, the phenotypic data on these offspring should be as balanced as possible.
Besides solving the above-mentioned selection problems by means of graph theory algorithms, we demonstrate their use in a simulation study that allows us to determine the optimum trade-off between the number of individuals and the size of the genotyped molecular marker fingerprint for predicting the phenotypic performance of hybrid maize by means of ɛ-insensitive support vector machine regression (ɛ-SVR) (Maenhout et al. 2007, 2008, 2010) and best linear prediction (BLP) (Bernardo 1994, 1995, 1996).
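The third selection problem (a fixed number of parental inbreds that produced the most hybrids among themselves) can be pictured as a graph in which parents are nodes and tested hybrids are edges. The sketch below is only an illustrative brute-force search under that reading; the article's own graph algorithms are not given in this excerpt, and the function name and toy data are hypothetical.

```python
from itertools import combinations


def best_parent_subset(hybrids, k):
    """Return the k parents covering the most hybrids, and that count.

    hybrids: iterable of (parent_a, parent_b) pairs, one per tested hybrid.
    Exhaustive search, so only practical for small numbers of parents.
    """
    parents = sorted({p for pair in hybrids for p in pair})
    best, best_count = None, -1
    for subset in combinations(parents, k):
        chosen = set(subset)
        # A hybrid is usable only if both of its parents are genotyped.
        count = sum(1 for a, b in hybrids if a in chosen and b in chosen)
        if count > best_count:
            best, best_count = subset, count
    return best, best_count


# Toy example: four parents, four hybrids, budget for genotyping three parents.
toy_hybrids = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
subset, covered = best_parent_subset(toy_hybrids, 3)
```

The number of candidate subsets grows combinatorially with the number of parents, which is why heuristics become attractive for realistic breeding pools.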
12.
13.
Hybrid breeding of rice via genomic selection (cited by 1; self-citations 0; citations by others 1)
Yanru Cui Ruidong Li Guangwei Li Fan Zhang Tiantian Zhu Qifa Zhang Jauhar Ali Zhikang Li Shizhong Xu 《Plant biotechnology journal》2020,18(1):57-67
Hybrid breeding is the main strategy for improving productivity in many crops, especially in rice and maize. Genomic hybrid breeding is a technology that uses whole‐genome markers to predict future hybrids. Predicted superior hybrids are then field evaluated and released as new hybrid cultivars after their superior performances are confirmed. This will increase the opportunity of selecting true superior hybrids with minimum costs. Here, we used genomic best linear unbiased prediction to perform hybrid performance prediction using an existing rice population of 1495 hybrids. Replicated 10‐fold cross‐validations showed that the prediction abilities on ten agronomic traits ranged from 0.35 to 0.92. Using the 1495 rice hybrids as a training sample, we predicted six agronomic traits of 100 hybrids derived from half diallel crosses involving 21 parents that are different from the parents of the hybrids in the training sample. The prediction abilities were relatively high, varying from 0.54 (yield) to 0.92 (grain length). We concluded that the current population of 1495 hybrids can be used to predict hybrids from seemingly unrelated parents. Eventually, we used this training population to predict all potential hybrids of cytoplasm male sterile lines from 3000 rice varieties from the 3K Rice Genome Project. Using a breeding index combining 10 traits, we identified the top and bottom 200 predicted hybrids. SNP genotypes of the training population and parameters estimated from this training population are available for general uses and further validation in genomic hybrid prediction of all potential hybrids generated from all varieties of rice.
14.
Schrag TA Maurer HP Melchinger AE Piepho HP Peleman J Frisch M 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》2007,114(8):1345-1355
Marker-based prediction of hybrid performance facilitates the identification of untested single-cross hybrids with superior yield performance. Our objectives were to (1) determine the haplotype block structure of experimental germplasm from a hybrid maize breeding program, (2) develop models for hybrid performance prediction based on haplotype blocks, and (3) compare hybrid performance prediction based on haplotype blocks with other approaches, based on single AFLP markers or general combining ability (GCA), under a validation scenario relevant for practical breeding. In total, 270 hybrids were evaluated for grain yield in four Dent × Flint factorial mating experiments. Their parental inbred lines were genotyped with 20 AFLP primer–enzyme combinations. Adjacent marker loci were combined into haplotype blocks. Hybrid performance was predicted on basis of single marker loci and haplotype blocks. Prediction based on variable haplotype block length resulted in an improved prediction of hybrid performance compared with the use of single AFLP markers. Estimates of prediction efficiency (R²) ranged from 0.305 to 0.889 for marker-based prediction and from 0.465 to 0.898 for GCA-based prediction. For inter-group hybrids with predominance of general over specific combining ability, the hybrid prediction from GCA effects was efficient in identifying promising hybrids. Considering the advantage of haplotype block approaches over single marker approaches for the prediction of inter-group hybrids, we see a high potential to substantially improve the efficiency of hybrid breeding programs.
Tobias A. Schrag and Hans Peter Maurer contributed equally to this work.
15.
Impact of selective genotyping in the training population on accuracy and bias of genomic selection (cited by 1; self-citations 0; citations by others 1)
Zhao Y Gowda M Longin FH Würschum T Ranc N Reif JC 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》2012,125(4):707-713
Estimating marker effects based on routinely generated phenotypic data of breeding programs is a cost-effective strategy to implement genomic selection. Truncation selection in breeding populations, however, could have a strong impact on the accuracy to predict genomic breeding values. The main objective of our study was to investigate the influence of phenotypic selection on the accuracy and bias of genomic selection. We used experimental data of 788 testcross progenies from an elite maize breeding program. The testcross progenies were evaluated in unreplicated field trials in ten environments and fingerprinted with 857 SNP markers. Random regression best linear unbiased prediction method was used in combination with fivefold cross-validation based on genotypic sampling. We observed a substantial loss in the accuracy to predict genomic breeding values in unidirectional selected populations. In contrast, estimating marker effects based on bidirectional selected populations led to only a marginal decrease in the prediction accuracy of genomic breeding values. We concluded that bidirectional selection is a valuable approach to efficiently implement genomic selection in applied plant breeding programs.
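Fivefold cross-validation based on genotypic sampling can be sketched as follows, assuming it means that whole genotypes (here, testcross progenies) are assigned to folds and that accuracy is summarised as a correlation between observed and predicted values. The function names, the seed, and the use of Pearson correlation are assumptions for illustration, not details taken from the study.

```python
import math
import random


def kfold_indices(n, k, seed=0):
    """Shuffle genotype indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def pearson(x, y):
    """Pearson correlation between observed and predicted values."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# 788 testcross progenies split into five folds; each fold serves once as
# the validation set while the other four form the training set.
folds = kfold_indices(788, 5)
```

In each round, marker effects would be estimated on the four training folds and `pearson` applied to the held-out fold; the prediction model itself is left out of this sketch.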
16.
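Based on known human Pol II promoter sequence data, we combined sequence-content and sequence-signal features of promoters to construct a support vector machine classifier for promoters. The 6-mer frequencies of promoter sequences were used as diversity source parameters to build the sequence-content features, while the 3-mer frequencies at 24 positions were selected as sequence-signal features to build a PWM. The two resulting sets of parameters were fed into a support vector machine to predict human promoters. Predictive ability was assessed by 10-fold cross-validation and on an independent data set, and the correlation coefficient exceeded 95%. The results show that the increment of diversity algorithm combined with a support vector machine effectively improves the prediction success rate and is a highly effective method for eukaryotic promoter prediction.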
17.
Prediction of single-cross hybrid performance for grain yield and grain dry matter content in maize using AFLP markers associated with QTL (cited by 1; self-citations 0; citations by others 1)
Schrag TA Melchinger AE Sørensen AP Frisch M 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》2006,113(6):1037-1047
Prediction methods to identify single-cross hybrids with superior yield performance have the potential to greatly improve the efficiency of commercial maize (Zea mays L.) hybrid breeding programs. Our objectives were to (1) identify marker loci associated with quantitative trait loci for hybrid performance or specific combining ability (SCA) in maize, (2) compare hybrid performance prediction by genotypic value estimates with that based on general combining ability (GCA) estimates, and (3) investigate a newly proposed combination of the GCA model with SCA predictions from genotypic value estimates. A total of 270 hybrids was evaluated for grain yield and grain dry matter content in four Dent × Flint factorial mating experiments; their parental inbred lines were genotyped with 20 AFLP primer-enzyme combinations. Markers associated significantly with hybrid performance and SCA were identified, genotypic values and SCA effects were estimated, and four hybrid performance prediction approaches were evaluated. For grain yield, between 38 and 98 significant markers were identified for hybrid performance and between zero and five for SCA. Estimates of prediction efficiency (R²) ranged from 0.46 to 0.86 for grain yield and from 0.59 to 0.96 for grain dry matter content. Models enhancing the GCA approach with SCA estimates resulted in the highest prediction efficiency if the SCA to GCA ratio was high. We conclude that it is advantageous for prediction of single-cross hybrids to enhance a GCA-based model with SCA effects estimated from molecular marker data, if SCA variances are of similar or larger importance as GCA variances.
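The GCA-based prediction that this abstract enhances with SCA estimates can be written as hybrid value = overall mean + GCA of the dent parent + GCA of the flint parent + SCA of the pair. The sketch below estimates GCA effects from a complete, balanced factorial by simple row and column means; that estimator, the function names, and the toy yield table are assumptions for illustration, whereas the study itself worked with field data and marker-based SCA estimates.

```python
def gca_effects(table):
    """Estimate mean and GCA effects from a complete factorial.

    table[i][j] is the mean performance of hybrid dent_i x flint_j.
    """
    n_rows, n_cols = len(table), len(table[0])
    mu = sum(sum(row) for row in table) / (n_rows * n_cols)
    gca_dent = [sum(row) / n_cols - mu for row in table]
    gca_flint = [sum(table[i][j] for i in range(n_rows)) / n_rows - mu
                 for j in range(n_cols)]
    return mu, gca_dent, gca_flint


def predict_hybrid(mu, gca_dent, gca_flint, i, j, sca=0.0):
    """GCA-based prediction, optionally enhanced with an SCA estimate."""
    return mu + gca_dent[i] + gca_flint[j] + sca


# Toy 2 x 2 factorial with purely additive parent effects (made-up yields).
yields = [[10.0, 12.0],
          [14.0, 16.0]]
mu, gca_dent, gca_flint = gca_effects(yields)
prediction = predict_hybrid(mu, gca_dent, gca_flint, 0, 1)
```

With `sca=0.0` the function reduces to the pure GCA model; passing a marker-derived SCA estimate gives the enhanced model, which the abstract reports as most useful when SCA variance is comparable to or larger than GCA variance.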
18.
In this paper, we investigate the design of accurate predictors for DNA-binding sites in proteins from amino acid sequences. As a result, we propose a hybrid method using support vector machine (SVM) in conjunction with evolutionary information of amino acid sequences in terms of their position-specific scoring matrices (PSSMs) for prediction of DNA-binding sites. Considering the numbers of binding and non-binding residues in proteins are significantly unequal, two additional weights as well as SVM parameters are analyzed and adopted to maximize net prediction (NP, an average of sensitivity and specificity) accuracy. To evaluate the generalization ability of the proposed method SVM-PSSM, a DNA-binding dataset PDC-59 consisting of 59 protein chains with low sequence identity on each other is additionally established. The SVM-based method using the same six-fold cross-validation procedure and PSSM features has NP=80.15% for the training dataset PDNA-62 and NP=69.54% for the test dataset PDC-59, which are much better than the existing neural network-based method by increasing the NP values for training and test accuracies up to 13.45% and 16.53%, respectively. Simulation results reveal that SVM-PSSM performs well in predicting DNA-binding sites of novel proteins from amino acid sequences.
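Net prediction (NP) is defined in the abstract as the average of sensitivity and specificity. A minimal sketch of that formula follows; the function names and the confusion-matrix counts are made up for illustration.

```python
def net_prediction(tp, fn, tn, fp):
    """NP = (sensitivity + specificity) / 2, from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of binding residues recovered
    specificity = tn / (tn + fp)   # share of non-binding residues recovered
    return (sensitivity + specificity) / 2


def accuracy(tp, fn, tn, fp):
    """Plain accuracy, shown for contrast on unbalanced classes."""
    return (tp + tn) / (tp + fn + tn + fp)


# A trivial classifier that calls every residue non-binding, on a data set
# with 10 binding and 100 non-binding residues.
np_trivial = net_prediction(tp=0, fn=10, tn=100, fp=0)
acc_trivial = accuracy(tp=0, fn=10, tn=100, fp=0)
```

On such unbalanced data the trivial classifier reaches an accuracy of about 0.91 but an NP of only 0.5, which shows why NP is a more informative target when binding residues are rare.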
19.
Protein N-glycosylation plays an important role in protein function. Yet, at present, few computational methods are available for the prediction of this protein modification. This prompted our development of a support vector machine (SVM)-based method for this task, as well as a partial least squares (PLS) regression based prediction method for comparison. A functional domain feature space was used to create SVM and PLS models, which achieved accuracies of 83.91% and 79.89%, respectively, as evaluated by a leave-one-out cross-validation. Subsequently, SVM and PLS models were developed based on functional domain and protein secretion information, which yielded accuracies of 89.13% and 86%, respectively. This analysis demonstrates that the protein functional domain and secretion information are both efficient predictors of N-glycosylation.
20.
ABSTRACT: BACKGROUND: There is increasing empirical evidence that whole-genome prediction (WGP) is a powerful tool for predicting line and hybrid performance in maize. However, there is a lack of knowledge about the sensitivity of WGP models towards the genetic architecture of the trait. Whereas previous studies exclusively focused on highly polygenic traits, important agronomic traits such as disease resistances, nutrifunctional or climate adaptational traits have a genetic architecture which is either much less complex or unknown. For such cases, information about model robustness and guidelines for model selection are lacking. Here, we compared five WGP models with different assumptions about the distribution of the underlying genetic effects. As contrasting model traits, we chose three highly polygenic agronomic traits and three metabolites each with a major QTL explaining 22 to 30% of the genetic variance in a panel of 289 diverse maize inbred lines genotyped with 56,110 SNPs. RESULTS: We found the five WGP models to be remarkably robust towards trait architecture with the largest differences in prediction accuracies ranging between 0.05 and 0.14 for the same trait, most likely as the result of the high level of linkage disequilibrium prevailing in elite maize germplasm. Whereas RR-BLUP performed best for the agronomic traits, it was inferior to LASSO or elastic net for the three metabolites. We found the approach of genome partitioning of genetic variance, first applied in human genetics, to be useful in guiding the breeder on which model to choose if prior knowledge of the trait architecture is lacking. CONCLUSIONS: Our results suggest that in diverse germplasm of elite maize inbred lines with a high level of LD, WGP models differ only slightly in their accuracies, irrespective of the number and effects of QTL found in previous linkage or association mapping studies. However, small gains in prediction accuracies can be achieved if the WGP model is selected according to the genetic architecture of the trait. If the trait architecture is unknown, e.g. for novel traits which only recently received attention in breeding, we suggest inspecting the distribution of the genetic variance explained by each chromosome to guide model selection in WGP.