Affiliation: | (1) Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridge, CB10 1SA, UK; (2) CRUK Cambridge Research Institute, Robinson Way, Cambridge, CB2 0RE, UK; (3) School of Surgery and Pathology, University of Western Australia, Nedlands, 6009, WA, Australia; (4) The Wellcome Trust/Cancer Research UK Gurdon Institute, The Henry Wellcome Building of Cancer and Developmental Biology, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QN, UK; (5) National Cancer Institute, P.O. Box B., 567/206, Frederick, MD 21702, USA; (6) Children’s Hospital Oakland Research Institute, Oakland, CA 94609-1673, USA; (7) Alberta Diabetes Institute (ADI), Department of Medical Microbiology and Immunology, Division of Dermatology and Cutaneous Sciences, University of Alberta, Edmonton, AB T6G 2H7, Canada; (8) Department of Clinical Neurosciences, University of Cambridge, Addenbrooke’s Hospital, Hills Road, Cambridge, CB2 2QQ, UK; (9) Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge, Wellcome Trust/MRC Building, Addenbrooke’s Hospital, Cambridge, CB2 0XY, UK; (10) Department of Pathology, Immunology Division, University of Cambridge, Cambridge, CB2 1QP, UK; (11) Present address: UCL Cancer Institute, University College London, 72 Huntley Street, London, WC1E 6BD, UK |
Abstract: | The human major histocompatibility complex (MHC) is contained within about 4 Mb on the short arm of chromosome 6 and is recognised as the most variable region in the human genome. The primary aim of the MHC Haplotype Project was to provide a comprehensively annotated reference sequence of a single, human leukocyte antigen-homozygous MHC haplotype and to use it as a basis against which variations could be assessed from seven other similarly homozygous cell lines, representative of the most common MHC haplotypes in the European population. Comparison of the haplotype sequences, including four haplotypes not previously analysed, resulted in the identification of >44,000 variations, both substitutions and indels (insertions and deletions), which have been submitted to the dbSNP database. The gene annotation uncovered haplotype-specific differences and confirmed the presence of more than 300 loci, including over 160 protein-coding genes. Combined analysis of the variation and annotation datasets revealed 122 gene loci with coding substitutions, of which 97 were non-synonymous. The haplotype (A3-B7-DR15; PGF cell line), designated as the new MHC reference sequence, has been incorporated into the human genome assembly (NCBI35 and subsequent builds) and constitutes the largest single-haplotype sequence of the human genome to date. The extensive variation and annotation data derived from the analysis of seven further haplotypes have been made publicly available and provide a framework and resource for future association studies of all MHC-associated diseases and transplant medicine. Horton and Gibson contributed equally to this work. |