1.

The future is now: Amplicon sequencing and sequence capture usher in the conservation genomics era

Mariah H. Meek Wesley A. Larson 《Molecular ecology resources》2019,19(4):795-803

The genomics revolution has initiated a new era of population genetics where genome‐wide data are frequently used to understand complex patterns of population structure and selection. However, the application of genomic tools to inform management and conservation has been somewhat rare outside a few well studied species. Fortunately, two recently developed approaches, amplicon sequencing and sequence capture, have the potential to significantly advance the field of conservation genomics. Here, amplicon sequencing refers to highly multiplexed PCR followed by high‐throughput sequencing (e.g., GTseq), and sequence capture refers to using capture probes to isolate loci from reduced‐representation libraries (e.g., Rapture). Both approaches allow sequencing of thousands of individuals at relatively low costs, do not require any specialized equipment for library preparation, and generate data that can be analyzed without sophisticated computational infrastructure. Here, we discuss the advantages and disadvantages of each method and provide a decision framework for geneticists who are looking to integrate these methods into their research programme. While it will always be important to consider the specifics of the biological question and system, we believe that amplicon sequencing is best suited for projects aiming to genotype <500 loci on many individuals (>1,500) or for species where continued monitoring is anticipated (e.g., long‐term pedigrees). Sequence capture, on the other hand, is best applied to projects including fewer individuals or where >500 loci are required. Both of these techniques should smooth the transition from traditional genetic techniques to genomics, helping to usher in the conservation genomics era.
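The rule of thumb stated in this abstract can be written as a small decision helper. This is only a sketch: the function name `suggest_method` and its parameters are hypothetical, the thresholds (500 loci, 1,500 individuals) are the ones quoted in the abstract, and the handling of exactly 500 loci is an assumption because the abstract leaves that boundary open.

```python
def suggest_method(n_loci, n_individuals, long_term_monitoring=False):
    """Suggest a genotyping approach from the abstract's rule of thumb.

    Hypothetical helper: the paper's full decision framework weighs more
    factors (biological question, study system) than this sketch does.
    """
    # More than 500 loci: the abstract points to sequence capture.
    if n_loci > 500:
        return "sequence capture"
    # Few loci on many individuals (>1,500), or continued monitoring
    # (e.g., long-term pedigrees): amplicon sequencing.
    # Assumption: exactly 500 loci is treated like "<500".
    if long_term_monitoring or n_individuals > 1500:
        return "amplicon sequencing"
    # Fewer individuals: sequence capture.
    return "sequence capture"


print(suggest_method(n_loci=300, n_individuals=2000))
print(suggest_method(n_loci=2000, n_individuals=2000))
```

The ordering of the checks is a design choice: the locus count is tested first because the abstract treats ">500 loci" as sufficient on its own to favour sequence capture.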

2.

Minimum sample sizes for population genomics: an empirical study from an Amazonian plant species

Alison G. Nazareno Jordan B. Bemmels Christopher W. Dick Lúcia G. Lohmann 《Molecular ecology resources》2017,17(6):1136-1147

High‐throughput DNA sequencing facilitates the analysis of large portions of the genome in nonmodel organisms, ensuring high accuracy of population genetic parameters. However, empirical studies evaluating the appropriate sample size for these kinds of studies are still scarce. In this study, we use double‐digest restriction‐associated DNA sequencing (ddRADseq) to recover thousands of single nucleotide polymorphisms (SNPs) for two physically isolated populations of Amphirrhox longifolia (Violaceae), a nonmodel plant species for which no reference genome is available. We used resampling techniques to construct simulated populations with a random subset of individuals and SNPs to determine how many individuals and biallelic markers should be sampled for accurate estimates of intra‐ and interpopulation genetic diversity. We identified 3646 and 4900 polymorphic SNPs for the two populations of A. longifolia, respectively. Our simulations show that, overall, a sample size greater than eight individuals has little impact on estimates of genetic diversity within A. longifolia populations when 1000 or more SNPs are used. Our results also show that even at a very small sample size (i.e. two individuals), accurate estimates of F_ST can be obtained with a large number of SNPs (≥1500). These results highlight the potential of high‐throughput genomic sequencing approaches to address questions related to evolutionary biology in nonmodel organisms. Furthermore, our findings also provide insights into the optimization of sampling strategies in the era of population genomics.
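The resampling idea in this abstract (draw random subsets of individuals and SNPs, then see how stable a diversity estimate is) might look roughly like the sketch below. It is not the authors' pipeline: the function names, the genotype encoding (alternate-allele counts 0, 1 or 2) and the use of the uncorrected expected heterozygosity 2p(1 - p) are all assumptions chosen to keep the example short.

```python
import random
import statistics


def expected_heterozygosity(genotypes):
    """Mean 2p(1-p) over SNPs; genotypes[i][j] is an alt-allele count (0, 1, 2)."""
    n_ind = len(genotypes)
    n_snp = len(genotypes[0])
    total = 0.0
    for j in range(n_snp):
        # Allele frequency at SNP j across the diploid individuals.
        p = sum(row[j] for row in genotypes) / (2 * n_ind)
        total += 2 * p * (1 - p)
    return total / n_snp


def resample_he(genotypes, n_individuals, n_snps, n_reps=100, seed=0):
    """Mean and spread of the estimate over random subsets (drawn without replacement)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_reps):
        inds = rng.sample(range(len(genotypes)), n_individuals)
        snps = rng.sample(range(len(genotypes[0])), n_snps)
        subset = [[genotypes[i][j] for j in snps] for i in inds]
        estimates.append(expected_heterozygosity(subset))
    return statistics.mean(estimates), statistics.pstdev(estimates)
```

Calling `resample_he` over a grid of `n_individuals` and `n_snps` values and watching where the spread stops shrinking is one way to read off a "sufficient" sample size in the sense the abstract describes.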

3.

Rapid and simple methodology for isolation of high quality genomic DNA from coniferous tissues (Taxus baccata)

Abolfazl Barzegari Sepideh Zununi Vahed Sina Atashpaz Sajjad Khani Yadollah Omidi 《Molecular biology reports》2010,37(2):833-837

Many studies have addressed the extraction of genomic DNA from plant tissues, from which intact DNA can be used for a diverse range of biological studies. Extraction of high quality DNA from leathery plant tissues (e.g., coniferous organs) appears to be a critical stage. Moreover, for some species such as Taxus trees, bioprocess engineering and biosynthesis of secondary metabolites (e.g., paclitaxel) is a crucial step due to the restrictions associated with extinction of these species. However, extraction of intact genomic DNA from these plants still demands a rapid, easy and efficient protocol. To this end, we report the development of a simple and highly efficient method for the extraction of DNA from Taxus baccata. In our protocol, interfering phenolic compounds were removed during extraction using polyvinylpyrrolidone, and RNA contamination was removed using LiCl. By employing this method, high quality genomic DNA was successfully extracted from leaves of T. baccata. The quality of the extracted DNA was validated by various techniques such as RAPD marker analysis, restriction digestion and pre-AFLP. Based on our findings, we propose that this simple method be considered for extraction of DNA from leathery plant tissues.

4.

Comparing Pool‐seq, Rapture, and GBS genotyping for inferring weak population structure: The American lobster (Homarus americanus) as a case study

Yann Dorant Laura Benestan Quentin Rougemont Eric Normandeau Brian Boyle Rémy Rochette Louis Bernatchez 《Ecology and evolution》2019,9(11):6606-6623

Unraveling genetic population structure is challenging in species potentially characterized by large population size and high dispersal rates, often resulting in weak genetic differentiation. Genotyping a large number of samples can improve the detection of subtle genetic structure, but this may substantially increase sequencing cost and downstream bioinformatics computational time. To overcome this challenge, alternative, cost‐effective sequencing approaches, namely Pool‐seq and Rapture, have been developed. We empirically measured the power of resolution and congruence of these two methods in documenting weak population structure in nonmodel species with high gene flow, compared with a conventional genotyping‐by‐sequencing (GBS) approach. For this, we used the American lobster (Homarus americanus) as a case study. First, we found that GBS, Rapture, and Pool‐seq approaches gave similar allele frequency estimates (i.e., correlation coefficient over 0.90) and all three revealed the same weak pattern of population structure. Yet, Pool‐seq data showed F_ST estimates three to five times higher than GBS and Rapture, while the latter two methods returned similar F_ST estimates, indicating that individual‐based approaches provided more congruent results than Pool‐seq. We conclude that despite higher costs, GBS and Rapture are more convenient approaches to use in the case of species exhibiting very weak differentiation. While both GBS and Rapture approaches provided similar results with regard to estimates of population genetic parameters, GBS remains more cost‐effective in projects involving a relatively small number of genotyped individuals (e.g., <1,000). Overall, this study illustrates the complexity of estimating genetic differentiation and other summary statistics in complex biological systems characterized by large population size and migration rates.

5.

Analysis of the genomic basis of functional diversity in dinoflagellates using a transcriptome‐based sequence similarity network

Arnaud Meng Erwan Corre Ian Probert Andres Gutierrez‐Rodriguez Raffaele Siano Anita Annamale Adriana Alberti Corinne Da Silva Patrick Wincker Stéphane Le Crom Fabrice Not Lucie Bittner 《Molecular ecology》2018,27(10):2365-2380

6.

Genome‐wide variation in nucleotides and retrotransposons in alpine populations of Arabis alpina (Brassicaceae)

Aude Rogivue Rimjhim R. Choudhury Stefan Zoller Stéphane Joost François Felber Michel Kasser Christian Parisod Felix Gugerli 《Molecular ecology resources》2019,19(3):773-787

Advances in high‐throughput sequencing have promoted the collection of reference genomes and genome‐wide diversity. However, the assessment of genomic variation among populations has hitherto mainly been surveyed through single‐nucleotide polymorphisms (SNPs) and largely ignored the often major fraction of genomes represented by transposable elements (TEs). Despite accumulating evidence supporting the evolutionary significance of TEs, comprehensive surveys remain scarce. Here, we sequenced the full genomes of 304 individuals of Arabis alpina sampled from four nearby natural populations to genotype SNPs as well as polymorphic long terminal repeat retrotransposons (polymorphic TEs; i.e., presence/absence of TE insertions at specific loci). We identified 291,396 SNPs and 20,548 polymorphic TEs, comparing their contributions to genomic diversity and divergence across populations. Few SNPs were shared among populations and overall showed high population‐specific variation, whereas most polymorphic TEs segregated among populations. The genomic context of these two classes of variants further highlighted candidate adaptive loci having a putative impact on functional genes. In particular, 4.96% of the SNPs were identified as nonsynonymous or affecting start/stop codons. In contrast, 43% of the polymorphic TEs were present next to Arabis genes enriched in functional categories related to the regulation of reproduction and responses to biotic as well as abiotic stresses. This unprecedented data set, mapping variation gained from SNPs and complementary polymorphic TEs within and among populations, will serve as a rich resource for addressing microevolutionary processes shaping genome variation.

7.

Nucleosome Positioning, Nucleosome Spacing and the Nucleosome Code

David J. Clark 《Journal of biomolecular structure & dynamics》2013,31(6):781-793

Nucleosome positioning has been the subject of intense study for many years. The properties of micrococcal nuclease, the enzyme central to these studies, are discussed. The various methods used to determine nucleosome positions in vitro and in vivo are reviewed critically. These include the traditional low resolution method of indirect end-labelling, high resolution methods such as primer extension, monomer extension and nucleosome sequencing, and the high throughput methods for genome-wide analysis (microarray hybridisation and parallel sequencing). It is established that low resolution mapping yields an averaged chromatin structure, whereas high resolution mapping reveals the weighted superposition of all the chromatin states in a cell population. Mapping studies suggest that yeast DNA contains information specifying the positions of nucleosomes and that this code is made use of by the cell. It is proposed that the positioning code facilitates nucleosome spacing by encoding information for multiple alternative overlapping nucleosomal arrays. Such a code might facilitate the shunting of nucleosomes from one array to another by ATP-dependent chromatin remodelling machines.

8.

Long‐read sequence capture of the haemoglobin gene clusters across codfish species

Siv Nam Khang Hoff Helle T. Baalsrud Ave Tooming‐Klunderud Morten Skage Todd Richmond Gregor Obernosterer Reza Shirzadi Ole Kristian Tørresen Kjetill S. Jakobsen Sissel Jentoft 《Molecular ecology resources》2019,19(1):245-259

Combining high‐throughput sequencing with targeted sequence capture has become an attractive tool to study specific genomic regions of interest. Most studies have so far focused on the exome using short‐read technology. These approaches are not designed to capture intergenic regions needed to reconstruct genomic organization, including regulatory regions and gene synteny. Here, we demonstrate the power of combining targeted sequence capture with long‐read sequencing technology for comparative genomic analyses of the haemoglobin (Hb) gene clusters across eight species separated by up to 70 million years. Guided by the reference genome assembly of the Atlantic cod (Gadus morhua) together with genome information from draft assemblies of selected codfishes, we designed probes covering the two Hb gene clusters. Use of custom‐made barcodes combined with PacBio RSII sequencing led to highly continuous assemblies of the LA (~100 kb) and MN (~200 kb) clusters, which include syntenic regions of coding and intergenic sequences. Our results revealed an overall conserved genomic organization of the Hb genes within this lineage, yet with several, lineage‐specific gene duplications. Moreover, for some of the species examined, we identified amino acid substitutions at two sites in the Hbb1 gene as well as length polymorphisms in its regulatory region, which has previously been linked to temperature adaptation in Atlantic cod populations. This study highlights the use of targeted long‐read capture as a versatile approach for comparative genomic studies by generation of a cross‐species genomic resource elucidating the evolutionary history of the Hb gene family across the highly divergent group of codfishes.

9.

Four chromosome replication origins in the archaeon Pyrobaculum calidifontis

EA Pelve AC Lindås A Knöppel A Mira R Bernander 《Molecular microbiology》2012,85(5):986-995

Replication origins were mapped in hyperthermophilic crenarchaea, using high‐throughput sequencing‐based marker frequency analysis. We confirm previous origin mapping in Sulfolobus acidocaldarius, and demonstrate that the single chromosome of Pyrobaculum calidifontis contains four replication origins, the highest number detected in a prokaryotic organism. The relative positions of the origins in both organisms coincided with regions enriched in highly conserved (core) archaeal genes. We show that core gene distribution provides a useful tool for origin identification in archaea, and predict multiple replication origins in a range of species. One of the P. calidifontis origins was mapped in detail, and electrophoretic mobility shift assays demonstrated binding of the Cdc6/Orc1 replication initiator protein to a repeated sequence element, denoted Orb‐1, within the origin. The high‐throughput sequencing approach also allowed for an annotation update of both genomes, resulting in the restoration of open reading frames encoding proteins involved in, e.g., sugar, nitrate and energy metabolism, as well as in glycosylation and DNA repair.
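Marker frequency analysis relies on sequences near a replication origin being over-represented in a replicating culture, so origins show up as peaks in read depth along the chromosome. A minimal sketch of that peak search follows, assuming read depth has already been binned into a list; the function names and the smoothing window are hypothetical, and this is not the authors' analysis pipeline.

```python
def circular_smooth(depth, half_window):
    """Moving average over a circular chromosome (bins wrap around the ends)."""
    n = len(depth)
    width = 2 * half_window + 1
    return [
        sum(depth[(i + k) % n] for k in range(-half_window, half_window + 1)) / width
        for i in range(n)
    ]


def candidate_origins(depth, half_window=1):
    """Indices of bins whose smoothed depth is a strict local maximum."""
    smooth = circular_smooth(depth, half_window)
    n = len(smooth)
    return [
        i
        for i in range(n)
        # smooth[i - 1] wraps to the last bin when i == 0, matching circularity.
        if smooth[i] > smooth[i - 1] and smooth[i] > smooth[(i + 1) % n]
    ]


# Toy depth profile with two peaks on a circular chromosome of eight bins.
print(candidate_origins([1, 2, 3, 2, 1, 2, 3, 2]))
```

Real coverage data are noisy, so a wider window and a minimum peak height would normally be needed before treating a local maximum as a candidate origin.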

10.

Genome divergence and diversification within a geographic mosaic of coevolution

Thomas L. Parchman C. Alex Buerkle Víctor Soria‐Carrasco Craig W. Benkman 《Molecular ecology》2016,25(22):5705-5718

Despite substantial interest in coevolution's role in diversification, examples of coevolution contributing to speciation have been elusive. Here, we build upon past studies that have shown both coevolution between South Hills crossbills and lodgepole pine (Pinus contorta), and high levels of reproductive isolation between South Hills crossbills and other ecotypes in the North American red crossbill (Loxia curvirostra) complex. We used genotyping by sequencing to generate population genomic data and applied phylogenetic and population genetic analyses to characterize the genetic structure within and among nine of the ecotypes. Although genome‐wide divergence was slight between ecotypes (F_ST = 0.011–0.035), we found evidence of relative genetic differentiation (as measured by F_ST) between and genetic cohesiveness within many of them. As expected for nomadic and opportunistic breeders, we detected no evidence of isolation by distance. The one sedentary ecotype, the South Hills crossbill, was genetically most distinct because of elevated divergence at a small number of loci rather than pronounced overall genome‐wide divergence. These findings suggest that mechanisms related to recent local coevolution between South Hills crossbills and lodgepole pine (e.g. strong resource‐based density dependence limiting gene flow) have been associated with genome divergence in the face of gene flow. Our results further characterize a striking example of coevolution driving speciation within perhaps as little as 6000 years.

11.

Lousy grouse: Comparing evolutionary patterns in Alaska galliform lice to understand host evolution and host–parasite interactions

Andrew D. Sweet Robert E. Wilson Sarah A. Sonsthagen Kevin P. Johnson 《Ecology and evolution》2020,10(15):8379-8393

Understanding both sides of host–parasite relationships can provide more complete insights into host and parasite biology in natural systems. For example, phylogenetic and population genetic comparisons between a group of hosts and their closely associated parasites can reveal patterns of host dispersal, interspecies interactions, and population structure that might not be evident from host data alone. These comparisons are also useful for understanding factors that drive host–parasite coevolutionary patterns (e.g., codivergence or host switching) over different periods of time. However, few studies have compared the evolutionary histories between multiple groups of parasites from the same group of hosts at a regional geographic scale. Here, we used genomic data to compare phylogenomic and population genomic patterns of Alaska ptarmigan and grouse species (Aves: Tetraoninae) and two genera of their associated feather lice: Lagopoecus and Goniodes. We used whole‐genome sequencing to obtain hundreds of genes and thousands of single‐nucleotide polymorphisms (SNPs) for the lice and double‐digest restriction‐associated DNA sequences to obtain SNPs from Alaska populations of two species of ptarmigan. We found that both genera of lice have some codivergence with their galliform hosts, but these relationships are primarily characterized by host switching and phylogenetic incongruence. Population structure was also uncorrelated between the hosts and lice. These patterns suggest that grouse, and ptarmigan in particular, share habitats and have likely had historical and ongoing dispersal within Alaska. However, the two genera of lice also have sufficient dissimilarities in the relationships with their hosts to suggest there are other factors, such as differences in louse dispersal ability, that shape the evolutionary patterns with their hosts.

12.

A fair fight between molecular marker types in a seascape genetics setting

Laura E. Timm 《Molecular ecology》2020,29(12):2133-2136

From its inception, population genetics has been nearly as concerned with the genetic data type (to which analyses are brought to bear) as it is with the analysis methods themselves. The field has traversed allozymes, microsatellites, segregating sites in multilocus alignments and, currently, single nucleotide polymorphisms (SNPs) generated by high‐throughput genomic sequencing methods, primarily whole genome sequencing and reduced representation library (RRL) sequencing. As each emerging data type has gained traction, it has been compared to existing methods, based on its relative ability to discern population structural complexity at increasing levels of resolution. However, this is usually done by comparing the gold standard in one data type to the gold standard in the new data type. These gold standards frequently differ in power and in sampling density, both across a genome and throughout a spatial range. In this issue of Molecular Ecology, D’Aloia et al. apply the high‐throughput approach as fully as possible to microsatellites, nuclear loci and SNPs genotyped through an RRL method; this is coupled with a spatially dense sampling scheme. Completing a battery of population genetics analyses across data types (including a series of down‐sampled data sets), the authors find that SNP data are slightly more sensitive to fine‐scale genetic structure, and the results are more resilient to down‐sampling than microsatellites and nonrepetitive nuclear loci. However, their results are far from an unqualified victory for RRL SNP data over all previous data types: the authors note that modest additions to the microsatellites and nuclear loci data sets may provide the necessary analytical power to delineate the fine‐scale genetic structuring identified by SNPs. As always, as the field begins to fully embrace the newest thing, good science reminds us that traditional data types are far from useless, especially when combined with a well‐designed sampling scheme.

13.

Preparing a re-sequencing DNA library of 2 cancer candidate genes using the ligation-by-amplification protocol by two PCR reactions

YeYang Su Lin Lin Geng Tian Chen Chen Tao Liu Xingya Xu XinPeng Qi XiuQing Zhang HuanMing Yang 《中国科学：生命科学英文版》2009,52(5):483-491

To meet the needs of large-scale genomic/genetic studies, the next-generation massively parallelized sequencing technologies provide high throughput, low cost and low labor-intensive sequencing service, with subsequent bioinformatic software and laboratory methods developed to expand their applications in various types of research. PCR-based genomic/genetic studies, which have significant usage in association studies like cancer research, haven’t benefited much from those next-generation sequencing technolo...

14.

Population genomic analyses from low‐coverage RAD‐Seq data: a case study on the non‐model cucurbit bottle gourd

Pei Xu Shizhong Xu Xiaohua Wu Ye Tao Baogen Wang Sha Wang Dehui Qin Zhongfu Lu Guojing Li 《The Plant journal : for cell and molecular biology》2014,77(3):430-442

Restriction site‐associated DNA sequencing (RAD‐Seq), a next‐generation sequencing‐based genome ‘complexity reduction’ protocol, has been useful in population genomics in species with a reference genome. However, the application of this protocol to natural populations of genomically underinvestigated species, particularly under low‐to‐medium sequencing depth, has not been well justified. In this study, a Bayesian method was developed for calling genotypes from an F₂ population of bottle gourd [Lagenaria siceraria (Mol.) Standl.] to construct a high‐density genetic map. Low‐depth genome shotgun sequencing allowed the assembly of scaffolds/contigs comprising approximately 50% of the estimated genome, of which 922 were anchored for identifying syntenic regions between species. RAD‐Seq genotyping of a natural population comprising 80 accessions identified 3226 single nucleotide polymorphisms (SNPs), based on which two sub‐gene pools were suggested for association with fruit shape. The two sub‐gene pools were moderately differentiated, as reflected by a Hudson's F_ST value of 0.14, and they represent regions on LG7 with strikingly elevated F_ST values. A seven‐fold reduction in heterozygosity and a two‐fold increase in LD (r²) were observed in the same region for the round‐fruited sub‐gene pool. An outlier test suggested the locus LX3405 on LG7 to be a candidate site under selection. Comparative genomic analysis revealed that the cucumber genome region syntenic to the high F_ST island on LG7 harbors an ortholog of the tomato fruit shape gene OVATE. Our results point to a bright future of applying RAD‐Seq to population genomic studies for non‐model species even under low‐to‐medium sequencing efforts. The genomic resources provide valuable information for cucurbit genome research.
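For readers unfamiliar with the Hudson's F_ST value quoted here, one commonly used form of the estimator compares the squared allele-frequency difference between two populations with the expected between-population heterozygosity, corrected for sample size. The sketch below implements that form as a ratio of averages across SNPs. The abstract does not state how its value was computed, so the function name, the inputs and the ratio-of-averages choice are assumptions for illustration only.

```python
def hudson_fst(freqs1, freqs2, n1, n2):
    """Hudson's F_ST as a ratio of averages over SNPs.

    freqs1, freqs2: per-SNP allele frequencies in populations 1 and 2.
    n1, n2: number of sampled chromosomes in each population (must be > 1).
    """
    num = 0.0
    den = 0.0
    for p1, p2 in zip(freqs1, freqs2):
        # Squared frequency difference minus the sampling-variance correction.
        num += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        # Probability that two alleles drawn from different populations differ.
        den += p1 * (1 - p2) + p2 * (1 - p1)
    if den == 0:
        raise ValueError("no polymorphic SNPs: F_ST is undefined")
    return num / den


# Fixed difference between populations gives the maximum value.
print(hudson_fst([1.0], [0.0], n1=10, n2=10))
```

Summing numerators and denominators before dividing keeps low-frequency SNPs from dominating the estimate, and the estimate can be slightly negative when the two populations have identical frequencies.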

15.

Structural analysis of an imidacloprid-degrading bacterial consortium and mining of its strain resources

排孜丽亚·帕尔哈提 车娟 阿孜古力·库尔班 张伟 《微生物学杂志》2020,(5):26-34

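In Xinjiang, pesticides such as imidacloprid have long been applied in large quantities to control large-scale outbreaks of cotton aphids and other pests caused by continuous cotton cropping. To obtain imidacloprid-degrading microbial resources suited to the local climate and soil conditions, a degrading consortium was enriched from Xinjiang soil under long-term continuous cotton cropping with imidacloprid as the sole carbon source, its structural composition was analyzed by high-throughput sequencing, and bacterial resources were isolated from the imidacloprid-degrading consortium (BCL) using several conventional media together with media designed from the high-throughput sequencing results. At the phylum level, the consortium consisted mainly of Proteobacteria (67.05%), Actinobacteria (10.67%), Firmicutes (9.99%), Chloroflexi (4.1%) and Acidobacteria (2.84%); at the genus level, the most abundant genera were, in order, Pseudomonas (16.33%), Moraxella (11.14%), Escherichia (4.57%) and Brochothrix (2.19%), while unclassified genera accounted for 58.13%. Forty-eight strains, including BP2, BP5, BP8, BG5, BJ7, BJ17 and BJW8, were isolated on conventional media such as basal mineral salts medium, belonging, respectively, to genera including Shinella, Nocardioides, Agromyces, Sphingopyxis, Bacillus, Cellulosimicrobium and Bosea. A further 25 strains, including BMV3, BMV5, BMV7.1, BMV14.1, BLE1, BLE3.1, BLE4, BLE5 and BLE10.1, were isolated on media containing Trace element solution and Vitamin solution, belonging, respectively, to genera including Paenibacillus, Ammoniphilus, Planococcus, Brevibacillus, Paenibacillus, Bacillus, Sphingopyxis, Rhodococcus and Nocardioides. Comparison of 16S rRNA sequences showed that BP5 shared 98.08% sequence similarity with Nocardioides nitrophenolicus NSP 41 and BMV5 shared 98.5% with Ammoniphilus resinae CC-RT-E, while the similarities of the other strains ranged from 98.29% to 100%. Designing media on the basis of high-throughput sequencing results therefore makes it possible to isolate, in a targeted way, some low-abundance genera from the imidacloprid-degrading consortium derived from Xinjiang cotton-field soil. Besides exploring the isolation of microbial resources from Xinjiang's distinctive environments, this work can also inform the bioremediation of pesticide-contaminated soil in the region.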

16.

New approaches to Prunus transcriptome analysis

Martínez-Gómez P Crisosto CH Bonghi C Rubio M 《Genetica》2011,139(6):755-769

17.

Towards the era of comparative evolutionary genomics in Brassicaceae

M. A. Lysak C. Lexer 《Plant Systematics and Evolution》2006,259(2-4):175-198

The vast genetic diversity, specific genome organization and sequencing of the Arabidopsis thaliana genome made crucifers an ideal group for comparative genomic studies. Arabidopsis genomic resources have greatly expedited comparative genomics within Brassicaceae and fostered the establishment of new Arabidopsis relative model systems (ARMS). The extent of genome colinearity, modes and evolutionary rates of genome alterations are being analyzed by genetic mapping with ever increasing levels of precision. Comparative cytogenetic studies in Brassicaceae are employing various chromosome landmarks and cytogenetic techniques, including localization of rDNA, variation in centromeric satellite repeats, genomic in situ hybridization (GISH), fluorescence ISH using bacterial artificial chromosomes (BAC FISH), and large-scale comparative chromosome painting. Some genome alterations may represent rare genomic changes (RGCs) and thus have the potential to resolve complex/conflicting phylogenetic relationships inferred from DNA sequencing. Comparative genomics should increasingly be integrated with molecular phylogenetics and population genetics to elucidate the processes responsible for genetic variation in Brassicaceae.

18.

De novo assembly of the transcriptome of an invasive snail and its multiple ecological applications

J. Sun M. Wang H. Wang H. Zhang X. Zhang V. Thiyagarajan P. Y. Qian J. W. Qiu 《Molecular ecology resources》2012,12(6):1133-1144

19.

The simple fool's guide to population genomics via RNA‐Seq: an introduction to high‐throughput sequencing data analysis

Jason T. Ladner Daniel J. Barshis François Seneca Hannah Jaris Nina Overgaard Therkildsen Megan Morikawa Stephen R. Palumbi 《Molecular ecology resources》2012,12(6):1058-1067

20.

PlasmoSEP: Predicting surface‐exposed proteins on the malaria parasite using semisupervised self‐training and expert‐annotated data

Yasser El‐Manzalawy Elyse E. Munoz Scott E. Lindner Vasant Honavar 《Proteomics》2016,16(23):2967-2976

Accurate and comprehensive identification of surface‐exposed proteins (SEPs) in parasites is a key step in developing novel subunit vaccines. However, the reliability of MS‐based high‐throughput methods for proteome‐wide mapping of SEPs continues to be limited due to high rates of false positives (i.e., proteins mistakenly identified as surface exposed) as well as false negatives (i.e., SEPs not detected due to low expression or other technical limitations). We propose a framework called PlasmoSEP for the reliable identification of SEPs using a novel semisupervised learning algorithm that combines SEPs identified by high‐throughput experiments and expert annotation of high‐throughput data to augment labeled data for training a predictive model. Our experiments using high‐throughput data from the Plasmodium falciparum surface‐exposed proteome provide several novel high‐confidence predictions of SEPs in P. falciparum and also confirm expert annotations for several others. Furthermore, PlasmoSEP predicts that 25 of 37 experimentally identified SEPs in Plasmodium yoelii salivary gland sporozoites are likely to be SEPs. Finally, PlasmoSEP predicts several novel SEPs in P. yoelii and Plasmodium vivax malaria parasites that can be validated for further vaccine studies. Our computational framework can be easily adapted to improve the interpretation of data from high‐throughput studies.
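The general idea of self-training named in this title (fit a model on labeled data, pseudo-label the unlabeled examples it is confident about, and refit) can be illustrated with a toy nearest-centroid classifier. This is a generic sketch and not the PlasmoSEP algorithm: the classifier, the confidence rule (`margin`) and all names are assumptions, and it expects at least one labeled example of each class.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length feature tuples."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def self_train(labeled, unlabeled, margin=1.0, max_rounds=10):
    """Generic self-training loop with a nearest-centroid classifier.

    labeled: list of (features, label) pairs with label 0 or 1.
    unlabeled: list of feature tuples.
    Returns the augmented labeled list and the examples left unlabeled.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        # Refit the model: one centroid per class from the current labels.
        c = {y: centroid([x for x, lab in labeled if lab == y]) for y in (0, 1)}
        confident, remaining = [], []
        for x in pool:
            d0, d1 = math.dist(x, c[0]), math.dist(x, c[1])
            # Pseudo-label only when one class is clearly closer.
            if abs(d0 - d1) >= margin:
                confident.append((x, 0 if d0 < d1 else 1))
            else:
                remaining.append(x)
        if not confident:
            break  # nothing new to learn from
        labeled.extend(confident)
        pool = remaining
    return labeled, pool


seed = [((0.0,), 0), ((10.0,), 1)]
augmented, left_over = self_train(seed, [(1.0,), (9.0,), (5.0,)], margin=2.0)
print(augmented, left_over)
```

In the toy call, the two points near the labeled seeds receive pseudo-labels, while the equidistant point stays unlabeled because it never clears the margin.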