
1.

Conformational dynamics of nonsynonymous variants at protein interfaces reveals disease association

Sudhir Kumar S. Banu Ozkan 《Proteins》2015,83(3):428-435

Recent studies have shown that the protein interface sites between individual monomeric units in biological assemblies are enriched in disease‐associated non‐synonymous single nucleotide variants (nsSNVs). To elucidate the mechanistic underpinning of this observation, we investigated the conformational dynamic properties of protein interface sites through a site‐specific structural dynamic flexibility metric (dfi) for 333 multimeric protein assemblies. dfi measures the dynamic resilience of a single residue to perturbations that occurred in the rest of the protein structure and identifies sites contributing the most to functionally critical dynamics. Analysis of dfi profiles of over a thousand positions harboring variation revealed that amino acid residues at interfaces have lower average dfi (31%) than those present at non‐interfaces (50%), which means that protein interfaces have less dynamic flexibility. Interestingly, interface sites with disease‐associated nsSNVs have significantly lower average dfi (23%) as compared to those of neutral nsSNVs (42%), which directly relates structural dynamics to functional importance. We found that less conserved interface positions show much lower dfi for disease nsSNVs as compared to neutral nsSNVs. In this case, dfi is better as compared to the accessible surface area metric, which is based on the static protein structure. Overall, our proteome‐wide conformational dynamic analysis indicates that certain interface sites play a critical role in functionally related dynamics (i.e., those with low dfi values), therefore mutations at those sites are more likely to be associated with disease. Proteins 2015; 83:428–435. © 2014 Wiley Periodicals, Inc.

2.

From Single Variants to Protein Cascades: Multiscale Modeling of Single Nucleotide Variant Sets in Genetic Disorders

Sabine C. Mueller Björn Sommer Christina Backes Jan Haas Benjamin Meder Eckart Meese Andreas Keller 《The Journal of biological chemistry》2016,291(4):1582-1590

Understanding the role of genetics in disease has become a central part of medical research. Non-synonymous single nucleotide variants (nsSNVs) in coding regions of human genes frequently lead to pathological phenotypes. Beyond single variations, the individual combination of nsSNVs may add to pathogenic processes. We developed a multiscale pipeline to systematically analyze the existence of quantitative effects of multiple nsSNVs and gene combinations in single individuals on pathogenicity. Based on this pipeline, we detected in a data set of 842 nsSNVs discovered in 76 genes related to cardiomyopathies, associated nsSNV combinations in seven genes present in at least 70% of all 639 patient samples, but not in a control cohort of healthy humans. Structural analyses of these revealed primarily an influence on the protein stability. For amino acid substitutions located at the protein surface, we generally observed a proximity to putative binding pockets. To computationally analyze cumulative effects and their impact, pathogenicity methods are currently being developed. Our approach supports this process, as shown on the example of a cardiac phenotype but can be likewise applied to other diseases such as cancer.

3.

Proteome-wide analysis of single-nucleotide variations in the N-glycosylation sequon of human genes

Mazumder R Morampudi KS Motwani M Vasudevan S Goldman R 《PloS one》2012,7(5):e36212

N-linked glycosylation is one of the most frequent post-translational modifications of proteins with a profound impact on their biological function. Besides other functions, N-linked glycosylation assists in protein folding, determines protein orientation at the cell surface, or protects proteins from proteases. The N-linked glycans attach to asparagines in the sequence context Asn-X-Ser/Thr, where X is any amino acid except proline. Any variation (e.g. non-synonymous single nucleotide polymorphism or mutation) that abolishes the N-glycosylation sequence motif will lead to the loss of a glycosylation site. On the other hand, variations causing a substitution that creates a new N-glycosylation sequence motif can result in the gain of glycosylation. Although the general importance of glycosylation is well known and acknowledged, the effect of variation on the actual glycoproteome of an organism is still mostly unknown. In this study, we focus on a comprehensive analysis of non-synonymous single nucleotide variations (nsSNV) that lead to either loss or gain of the N-glycosylation motif. We find that 1091 proteins have modified N-glycosylation sequons due to nsSNVs in the genome. Based on analysis of proteins that have a solved 3D structure at the site of variation, we find that 48% of the variations that lead to changes in glycosylation sites occur at the loop and bend regions of the proteins. Pathway and function enrichment analysis show that a significant number of proteins that gained or lost the glycosylation motif are involved in kinase activity, immune response, and blood coagulation. A structure-function analysis of a blood coagulation protein, antithrombin III and a protease, cathepsin D, showcases how a comprehensive study followed by structural analysis can help better understand the functional impact of the nsSNVs.
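The sequon rule stated in this abstract (Asn-X-Ser/Thr, with X any residue except proline) can be illustrated with a short sketch. This is not the authors' pipeline: the function names `sequon_positions` and `sequon_effect` and the toy sequences are hypothetical, and the code only checks the sequence motif, not whether a site is actually glycosylated.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr.
# The lookahead makes matches zero-width, so overlapping sequons are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

def sequon_positions(seq):
    """Return the 0-based start positions of every N-X-S/T motif (X != P)."""
    return {m.start() for m in SEQUON.finditer(seq)}

def sequon_effect(seq, pos, alt):
    """Report sequons lost or gained by substituting residue `alt` at 0-based `pos`."""
    mutant = seq[:pos] + alt + seq[pos + 1:]
    before = sequon_positions(seq)
    after = sequon_positions(mutant)
    return {"lost": sorted(before - after), "gained": sorted(after - before)}

# Toy peptides (hypothetical, not taken from the study)
print(sequon_effect("AANGSAA", 4, "A"))  # Ser -> Ala breaks the motif starting at index 2
print(sequon_effect("AADGTAA", 2, "N"))  # Asp -> Asn creates a motif starting at index 2
```

A variant is then classified as a loss or gain of the N-glycosylation motif simply by comparing motif positions before and after the substitution.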

4.

Predicting Mendelian Disease-Causing Non-Synonymous Single Nucleotide Variants in Exome Sequencing Studies

Miao-Xin Li Johnny S. H. Kwan Su-Ying Bao Wanling Yang Shu-Leong Ho Yong-Qiang Song Pak C. Sham 《PLoS genetics》2013,9(1)

Exome sequencing is becoming a standard tool for mapping Mendelian disease-causing (or pathogenic) non-synonymous single nucleotide variants (nsSNVs). Minor allele frequency (MAF) filtering approach and functional prediction methods are commonly used to identify candidate pathogenic mutations in these studies. Combining multiple functional prediction methods may increase accuracy in prediction. Here, we propose to use a logit model to combine multiple prediction methods and compute an unbiased probability of a rare variant being pathogenic. Also, for the first time we assess the predictive power of seven prediction methods (including SIFT, PolyPhen2, CONDEL, and logit) in predicting pathogenic nsSNVs from other rare variants, which reflects the situation after MAF filtering is done in exome-sequencing studies. We found that a logit model combining all or some original prediction methods outperforms other methods examined, but is unable to discriminate between autosomal dominant and autosomal recessive disease mutations. Finally, based on the predictions of the logit model, we estimate that an individual has around 5% of rare nsSNVs that are pathogenic and carries ∼22 pathogenic derived alleles at least, which if made homozygous by consanguineous marriages may lead to recessive diseases.
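The idea of combining several prediction scores through a logit model can be sketched as a plain logistic regression. This is an illustration only, not the authors' model: the two score columns, the toy training data, and the names `fit_logit` and `predict_proba` are assumptions, and the sketch makes no attempt at the bias correction the abstract refers to.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(X, y, lr=0.5, epochs=2000):
    """Fit logistic-regression weights (bias first) by batch gradient descent."""
    n_feat = len(X[0])
    w = [0.0] * (n_feat + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_feat + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi  # derivative of the log-loss with respect to the linear term
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def predict_proba(w, x):
    """Combined probability that a variant is pathogenic, given its tool scores."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

# Hypothetical scores from two prediction tools, scaled to [0, 1]; 1 = pathogenic
X = [[0.9, 0.8], [0.8, 0.95], [0.7, 0.6], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
y = [1, 1, 1, 0, 0, 0]
w = fit_logit(X, y)
print(predict_proba(w, [0.9, 0.9]), predict_proba(w, [0.1, 0.1]))
```

Each tool's score becomes one input feature, so the fitted weights indicate how much each tool contributes to the combined probability.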

5.

SNVDis: A Proteome-wide Analysis Service for Evaluating nsSNVs in Protein Functional Sites and Pathways

Konstantinos Karagiannis Vahan Simonyan Raja Mazumder 《Genomics, Proteomics & Bioinformatics》2013,11(2):122-126

Amino acid changes due to non-synonymous variation are included as annotations for individual proteins in UniProtKB/Swiss-Prot and RefSeq which present biological data in a protein- or gene-centric fashion. Unfortunately, proteome-wide analysis of non-synonymous single nucleotide variations (nsSNVs) is not easy to perform because information on nsSNVs and functionally important sites are not well integrated both within and between databases and their search engines. We have developed SNVDis that allows evaluation of proteome-wide nsSNV distribution in functional sites, domains and pathways. More specifically, we have integrated human-specific data from major variation databases (UniProtKB, dbSNP and COSMIC), comprehensive sequence feature annotation from UniProtKB, Pfam, RefSeq, Conserved Domain Database (CDD) and pathway information from Protein ANalysis THrough Evolutionary Relationships (PANTHER) and mapped all of them in a uniform and comprehensive way to the human reference proteome provided by UniProtKB/Swiss-Prot. Integrated information of active sites, pathways, binding sites, domains, which are extracted from a number of different sources, provides a detailed overview of how nsSNVs are distributed over the human proteome and pathways and how they intersect with functional sites of proteins. Additionally, it is possible to find out whether there is an over- or under-representation of nsSNVs in specific domains, pathways or user-defined protein lists. The underlying datasets are updated once every 3 months. SNVDis is freely available at http://hive.biochemistry.gwu.edu/tool/snvdis.

6.

Human germline and pan-cancer variomes and their distinct functional profiles

Yang Pan Konstantinos Karagiannis Haichen Zhang Hayley Dingerdissen Amirhossein Shamsaddini Quan Wan Vahan Simonyan Raja Mazumder 《Nucleic acids research》2014,42(18):11570-11588

Identification of non-synonymous single nucleotide variations (nsSNVs) has exponentially increased due to advances in Next-Generation Sequencing technologies. The functional impacts of these variations have been difficult to ascertain because the corresponding knowledge about sequence functional sites is quite fragmented. It is clear that mapping of variations to sequence functional features can help us better understand the pathophysiological role of variations. In this study, we investigated the effect of nsSNVs on more than 17 common types of post-translational modification (PTM) sites, active sites and binding sites. Out of 1 705 285 distinct nsSNVs on 259 216 functional sites we identified 38 549 variations that significantly affect 10 major functional sites. Furthermore, we found distinct patterns of site disruptions due to germline and somatic nsSNVs. Pan-cancer analysis across 12 different cancer types led to the identification of 51 genes with 106 nsSNV affected functional sites found in 3 or more cancer types. 13 of the 51 genes overlap with previously identified Significantly Mutated Genes (Nature. 2013 Oct 17;502(7471)). 62 mutations in these 13 genes affecting functional sites such as DNA, ATP binding and various PTM sites occur across several cancers and can be prioritized for additional validation and investigations.

7.

Screening of human SNP database identifies recoding sites of A-to-I RNA editing

Gommans WM Tatalias NE Sie CP Dupuis D Vendetti N Smith L Kaushal R Maas S 《RNA (New York, N.Y.)》2008,14(10):2074-2085


8.

RNA-Seq Approach for Genetic Improvement of Meat Quality in Pig and Evolutionary Insight into the Substrate Specificity of Animal Carbonyl Reductases

WY Jung SG Kwon M Son ES Cho Y Lee JH Kim BW Kim da H Park JH Hwang TW Kim HC Park BY Park JS Choi KK Cho KH Chung YM Song IS Kim SK Jin DH Kim SW Lee KW Lee WY Bang CW Kim 《PloS one》2012,7(9):e42198


9.

Identification and analysis of deleterious human SNPs

Yue P Moult J 《Journal of molecular biology》2006,356(5):1263-1274

We have developed two methods of identifying which non-synonymous single base changes have a deleterious effect on protein function in vivo. One method, described elsewhere, analyzes the effect of the resulting amino acid change on protein stability, utilizing structural information. The other method, introduced here, makes use of the conservation and type of residues observed at a base change position within a protein family. A machine learning technique, the support vector machine, is trained on single amino acid changes that cause monogenic disease, with a control set of amino acid changes fixed between species. Both methods are used to identify deleterious single nucleotide polymorphisms (SNPs) in the human population. After carefully controlling for errors, we find that approximately one quarter of known non-synonymous SNPs are deleterious by these criteria, providing a set of possible contributors to human complex disease traits.

10.

Application of genetic risk scores in genetic studies of complex diseases

Niu Dayan Yan Weili 《Hereditas (Beijing)》2015,37(12):1204-1210

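Complex diseases such as cardiovascular disease, type 2 diabetes, essential hypertension, asthma, obesity, and cancer are prevalent worldwide and have become the leading causes of human death. Growing attention is being paid to the role of genetic susceptibility in the pathogenesis of complex diseases. To date, the susceptibility genes and sequence variants associated with complex diseases remain incompletely understood. Genetic association studies are expected to elucidate the genetic basis of complex diseases. In recent years, genome-wide association studies and candidate gene studies have identified a large number of sequence variants related to complex diseases. The discovery of these variants, which have causal and/or associative relationships with complex diseases, has promoted the emergence and development of methods for predicting, preventing, and treating complex diseases. The genetic risk score (GRS), an emerging method for exploring the relationship between single nucleotide polymorphisms (SNPs) and the clinical phenotypes of complex diseases, aggregates the modest effects of multiple SNPs and thereby greatly improves the predictive value of genetic polymorphisms for disease. The method has been applied successfully in genetic studies of many complex diseases. This article focuses on the calculation methods and evaluation criteria of GRS, briefly lists a series of results achieved using GRS, discusses the limitations encountered in its application, and finally considers future directions for the genetic risk score.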
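As a rough illustration of how a genetic risk score aggregates the small effects of several SNPs, the sketch below computes a simple count score and a weighted score. The abstract does not say which formulas the review covers, so the function name `genetic_risk_score`, the genotype coding (0, 1, or 2 risk alleles per SNP), and the example weights are assumptions rather than the authors' definitions.

```python
def genetic_risk_score(risk_allele_counts, weights=None):
    """Sum risk-allele counts across SNPs, optionally weighted by per-SNP effect size.

    risk_allele_counts: one value per SNP, each 0, 1, or 2.
    weights: per-SNP effect sizes (for example log odds ratios from an
             association study); None gives the unweighted count score.
    """
    if any(g not in (0, 1, 2) for g in risk_allele_counts):
        raise ValueError("each genotype must be 0, 1, or 2 risk alleles")
    if weights is None:
        weights = [1.0] * len(risk_allele_counts)
    if len(weights) != len(risk_allele_counts):
        raise ValueError("need exactly one weight per SNP")
    return sum(w * g for w, g in zip(weights, risk_allele_counts))

# Hypothetical individual genotyped at three SNPs
genotypes = [0, 1, 2]
print(genetic_risk_score(genotypes))                   # unweighted count score
print(genetic_risk_score(genotypes, [0.1, 0.2, 0.3]))  # weighted score
```

The unweighted form treats every risk allele equally, while the weighted form lets SNPs with larger estimated effects contribute more to the total.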

11.

Role for protein–protein interaction databases in human genetics

《Expert review of proteomics》2013,10(6):647-659

Proteomics and the study of protein–protein interactions are becoming increasingly important in our effort to understand human diseases on a system-wide level. Thanks to the development and curation of protein-interaction databases, up-to-date information on these interaction networks is accessible and publicly available to the scientific community. As our knowledge of protein–protein interactions increases, it is important to give thought to the different ways that these resources can impact biomedical research. In this article, we highlight the importance of protein–protein interactions in human genetics and genetic epidemiology. Since protein–protein interactions demonstrate one of the strongest functional relationships between genes, combining genomic data with available proteomic data may provide us with a more in-depth understanding of common human diseases. In this review, we will discuss some of the fundamentals of protein interactions, the databases that are publicly available and how information from these databases can be used to facilitate genome-wide genetic studies.

12.

Cancer‐associated mutations are preferentially distributed in protein kinase functional sites

Jose M. G. Izarzugaza Oliver C. Redfern Christine A. Orengo Alfonso Valencia 《Proteins》2009,77(4):892-903

Protein kinases are a superfamily involved in many crucial cellular processes, including signal transmission and regulation of cell cycle. As a consequence of this role, kinases have been reported to be associated with many types of cancer and are considered as potential therapeutic targets. We analyzed the distribution of pathogenic somatic point mutations (drivers) in the protein kinase superfamily with respect to their location in the protein, such as in structural, evolutionary, and functionally relevant regions. We find these driver mutations are more clearly associated with key protein features than other somatic mutations (passengers) that have not been directly linked to tumor progression. This observation fits well with the expected implication of the alterations in protein kinase function in cancer pathogenicity. To explain the relevance of the detected association of cancer driver mutations at the molecular level in the human kinome, we compare these with genetically inherited mutations (SNPs). We find that the subset of nonsynonymous SNPs that are associated to disease, but sufficiently mild to the point of being widespread in the population, tend to avoid those key protein regions, where they could be more detrimental for protein function. This tendency contrasts with the one detected for cancer associated‐driver‐mutations, which seems to be more directly implicated in the alteration of protein function. The detailed analysis of protein kinase groups and a number of relevant examples, confirm the relation between cancer associated‐driver‐mutations and key regions for protein kinase structure and function. Proteins 2009. © 2009 Wiley‐Liss, Inc.

13.

The first report of polymorphisms and genetic characteristics of the prion protein gene (PRNP) in horses

《Prion》2013,7(3-4):245-252

ABSTRACT

Prion diseases have a wide host range, but prion-infected cases have never been reported in horses. Genetic polymorphisms that can directly impact the structural stability of horse prion protein have not been investigated thus far. In addition, we noticed that previous studies focusing on horse-specific amino acids and secondary structure predictions of prion protein were performed for limited parts of the protein. In this study, we investigated genetic polymorphisms in the horse prion protein gene (PRNP) in 201 Thoroughbred horses. The identified polymorphism was assessed using PolyPhen-2, PROVEAN and PANTHER to determine whether it impairs the stability of the protein. In addition, we evaluated horse-specific amino acids in horse and mouse prion proteins using the same methods. We found only one single nucleotide polymorphism (SNP) in the horse prion protein, and all three annotation tools predicted that the SNP is benign. In addition, horse-specific amino acids showed different effects on horse and mouse prion proteins.

Abbreviations: PRNP: prion protein gene; SNP: single nucleotide polymorphism; CJD: Creutzfeldt-Jakob disease; CWD: chronic wasting disease; TME: transmissible mink encephalopathy; FSE: feline spongiform encephalopathy; MD: molecular dynamics; ER: endoplasmic reticulum; GPI: glycosylphosphatidylinositol; NMR: nuclear magnetic resonance; ORF: open reading frame; GWAS: genome-wide association study; NAPA: non-adaptive prion amplification; HMM: hidden Markov model; NCBI: National Center for Biotechnology Information

14.

Deep Profiling of Cellular Heterogeneity by Emerging Single‐Cell Proteomic Technologies

Liwei Yang Justin George Jun Wang 《Proteomics》2020,20(13)

The ability to comprehensively profile cellular heterogeneity in the functional proteome is crucial in advancing the understanding of cell behavior, organism development, and disease mechanisms. Conventional bulk measurement, which averages the biological responses across a population, often loses the information of cellular variations. Single‐cell proteomic technologies are becoming increasingly important to understand and discern cellular heterogeneity. The well‐established methods for single‐cell protein analysis based on flow cytometry and fluorescence microscopy are limited by their low multiplexing ability, owing to the spectral overlap of the fluorophores used to label antibodies. Recent advances in mass spectrometry (MS), microchip, and reiterative staining‐based techniques for single‐cell proteomics have enabled the evaluation of cellular heterogeneity with high throughput, increased multiplexity, and improved sensitivity. In this review, the principles, developments, advantages, and limitations of these advanced technologies in analysis of single‐cell proteins, along with their biological applications to study cellular heterogeneity, are described. Finally, the remaining challenges, possible strategies, and future opportunities that will facilitate the improvement and broad applications of single‐cell proteomic technologies in cell biology and medical research are discussed.

15.

A chromosome‐scale reference genome and genome‐wide genetic variations elucidate adaptation in yak

Qiu‐mei Ji Jin‐wei Xin Zhi‐xin Chai Cheng‐fu Zhang Yangla Dawa Sang Luo Qiang Zhang Zhandui Pingcuo Min‐Sheng Peng Yong Zhu Han‐wen Cao Hui Wang Jian‐lin Han Jin‐cheng Zhong 《Molecular ecology resources》2021,21(1):201-211

Yak is an important livestock animal for the people indigenous to the harsh, oxygen‐limited Qinghai‐Tibetan Plateau and Hindu Kush ranges of the Himalayas. The yak genome was sequenced in 2012, but its assembly was fragmented because of the inherent limitations of the Illumina sequencing technology used to analyse it. An accurate and complete reference genome is essential for the study of genetic variations in this species. Long‐read sequences are more complete than their short‐read counterparts and have been successfully applied towards high‐quality genome assembly for various species. In this study, we present a high‐quality chromosome‐scale yak genome assembly (BosGru_PB_v1.0) constructed with long‐read sequencing and chromatin interaction technologies. Compared to an existing yak genome assembly (BosGru_v2.0), BosGru_PB_v1.0 shows substantially improved chromosome sequence continuity, reduced repetitive structure ambiguity, and gene model completeness. To characterize genetic variation in yak, we generated de novo genome assemblies based on Illumina short reads for seven recognized domestic yak breeds in Tibet and Sichuan and one wild yak from Hoh Xil. We compared these eight assemblies to the BosGru_PB_v1.0 genome, obtained a comprehensive map of yak genetic diversity at the whole‐genome level, and identified several protein‐coding genes absent from the BosGru_PB_v1.0 assembly. Despite the genetic bottleneck experienced by wild yak, their diversity was nonetheless higher than that of domestic yak. Here, we identified breed‐specific sequences and genes by whole‐genome alignment, which may facilitate yak breed identification.

16.

DynaMut2: Assessing changes in stability and flexibility upon single and multiple point missense mutations

Carlos H.M. Rodrigues Douglas E.V. Pires David B. Ascher 《Protein science : a publication of the Protein Society》2021,30(1):60-69

Predicting the effect of missense variations on protein stability and dynamics is important for understanding their role in diseases, and the link between protein structure and function. Approaches to estimate these changes have been proposed, but most only consider single‐point missense variants and a static state of the protein, while those that incorporate dynamics are computationally expensive. Here we present DynaMut2, a web server that combines Normal Mode Analysis (NMA) methods to capture protein motion and our graph‐based signatures to represent the wildtype environment to investigate the effects of single and multiple point mutations on protein stability and dynamics. DynaMut2 was able to accurately predict the effects of missense mutations on protein stability, achieving Pearson's correlation of up to 0.72 (RMSE: 1.02 kcal/mol) on single‐point and 0.64 (RMSE: 1.80 kcal/mol) on multiple‐point missense mutations across 10‐fold cross‐validation and independent blind tests. For single‐point mutations, DynaMut2 achieved comparable performance with other methods when predicting variations in Gibbs Free Energy (ΔΔG) and in melting temperature (ΔT_m). We anticipate our tool to be a valuable suite for the study of protein flexibility analysis and the study of the role of variants in disease. DynaMut2 is freely available as a web server and API at http://biosig.unimelb.edu.au/dynamut2.
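The two evaluation metrics quoted in this abstract, Pearson's correlation and RMSE between predicted and measured stability changes, can be computed as in the sketch below. It does not reproduce DynaMut2 or call its API: the function names `pearson_r` and `rmse` and the ΔΔG values are invented for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def rmse(predicted, observed):
    """Root-mean-square error, in the same units as the inputs (here kcal/mol)."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical predicted vs. experimental stability changes (kcal/mol)
predicted = [-1.2, 0.3, -0.8, -2.1, 0.5]
observed = [-1.0, 0.1, -1.5, -1.8, 0.9]
print(round(pearson_r(predicted, observed), 2), round(rmse(predicted, observed), 2))
```

Correlation measures how well the predictions rank and track the experimental values, whereas RMSE reports the typical size of the error in energy units, which is why the abstract quotes both.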

17.

TRAPPopathies: An emerging set of disorders linked to variations in the genes encoding transport protein particle (TRAPP)‐associated proteins

Michael Sacher Nassim Shahrzad Hiba Kamel Miroslav P. Milev 《Traffic (Copenhagen, Denmark)》2019,20(1):5-26

The movement of proteins between cellular compartments requires the orchestrated actions of many factors including Rab family GTPases, Soluble NSF Attachment protein REceptors (SNAREs) and so‐called tethering factors. One such tethering factor is called TRAnsport Protein Particle (TRAPP), and in humans, TRAPP proteins are distributed into two related complexes called TRAPP II and III. Although thought to act as a single unit within the complex, in the past few years it has become evident that some TRAPP proteins function independently of the complex. Consistent with this, variations in the genes encoding these proteins result in a spectrum of human diseases with diverse, but partially overlapping, phenotypes. This contrasts with other tethering factors such as COG, where variations in the genes that encode its subunits all result in an identical phenotype. In this review, we present an up‐to‐date summary of all the known disease‐related variations of genes encoding TRAPP‐associated proteins and the disorders linked to these variations which we now call TRAPPopathies.

18.

High throughput genotyping technologies.

Andrew M Dearlove 《Briefings in Functional Genomics and Prot》2002,1(2):139-150

A comprehensive genetic map containing several hundred microsatellite markers resulted from a large microsatellite mapping project. This was the first real study that introduced high throughput methods to the genetic community. This map and the concurrent technological advances, which will briefly be reviewed, led to further numerous mapping investigations of simple and complex diseases. The annotated draft sequence of approximately three billion base pairs (bp) of the human genome has been completed much sooner than many imagined, due to considerable technological advancements and the international enterprise that resulted. This was a major development for the genetics community, but is only the precursor to the next phase of studying and understanding the variation within the human genome. The awareness of the differences may help us understand the effects on the genetics of the variation between individuals and disease. It is these variations at the nucleotide level that determine the physiological differences, or phenotypes of each individual, including all biological functions at the cellular and body level. Single nucleotide polymorphisms (SNPs) will provide the next high density map, and be the genetic tool to study these genetic variations. There are many sources of SNPs and exhaustive numbers of methods of SNP detection to be considered. The focus in this paper will be on the merits of selected, varied SNP typing methodologies that are emerging to genotype many individuals with the required huge number of SNPs to make the study of complex diseases and pharmacogenomics a practical and economically viable option.

19.

The search for loci under selection: trends, biases and progress


Collin W. Ahrens Paul D. Rymer Adam Stow Jason Bragg Shannon Dillon Kate D. L. Umbers Rachael Y. Dudaniec 《Molecular ecology》2018,27(6):1342-1356

Detecting genetic variants under selection using F_ST outlier analysis (OA) and environmental association analyses (EAAs) are popular approaches that provide insight into the genetic basis of local adaptation. Despite the frequent use of OA and EAA approaches and their increasing attractiveness for detecting signatures of selection, their application to field‐based empirical data has not been synthesized. Here, we review 66 empirical studies that use Single Nucleotide Polymorphisms (SNPs) in OA and EAA. We report trends and biases across biological systems, sequencing methods, approaches, parameters, environmental variables and their influence on detecting signatures of selection. We found striking variability in both the use and reporting of environmental data and statistical parameters. For example, linkage disequilibrium among SNPs and numbers of unique SNP associations identified with EAA were rarely reported. The proportion of putatively adaptive SNPs detected varied widely among studies, and decreased with the number of SNPs analysed. We found that genomic sampling effort had a greater impact than biological sampling effort on the proportion of identified SNPs under selection. OA identified a higher proportion of outliers when more individuals were sampled, but this was not the case for EAA. To facilitate repeatability, interpretation and synthesis of studies detecting selection, we recommend that future studies consistently report geographical coordinates, environmental data, model parameters, linkage disequilibrium, and measures of genetic structure. Identifying standards for how OA and EAA studies are designed and reported will aid future transparency and comparability of SNP‐based selection studies and help to progress landscape and evolutionary genomics.

20.

The evolution of phylogeographic data sets


Ryan C. Garrick Isabel A. S. Bonatelli Chaz Hyseni Ariadna Morales Tara A. Pelletier Manolo F. Perez Edwin Rice Jordan D. Satler Rebecca E. Symula Maria Tereza C. Thomé Bryan C. Carstens 《Molecular ecology》2015,24(6):1164-1171

Empirical phylogeographic studies have progressively sampled greater numbers of loci over time, in part motivated by theoretical papers showing that estimates of key demographic parameters improve as the number of loci increases. Recently, next‐generation sequencing has been applied to questions about organismal history, with the promise of revolutionizing the field. However, no systematic assessment of how phylogeographic data sets have changed over time with respect to overall size and information content has been performed. Here, we quantify the changing nature of these genetic data sets over the past 20 years, focusing on papers published in Molecular Ecology. We found that the number of independent loci, the total number of alleles sampled and the total number of single nucleotide polymorphisms (SNPs) per data set have improved over time, with particularly dramatic increases within the past 5 years. Interestingly, uniparentally inherited organellar markers (e.g. animal mitochondrial and plant chloroplast DNA) continue to represent an important component of phylogeographic data. Single‐species studies (cf. comparative studies) that focus on vertebrates (particularly fish and to some extent, birds) represent the gold standard of phylogeographic data collection. Based on the current trajectory seen in our survey data, forecast modelling indicates that the median number of SNPs per data set for studies published by the end of the year 2016 may approach ~20 000. This survey provides baseline information for understanding the evolution of phylogeographic data sets and underscores the fact that development of analytical methods for handling very large genetic data sets will be critical for facilitating growth of the field.