
  • Predicting Allergen Cross Reactions by Protein Sequence

    Graduate School for Cellular and Biomedical Sciences

    University of Bern

    MD-PhD Thesis

    Submitted by

    Pascal Bruno Pfiffner from Mels SG and Mels-Weisstannen SG

    Thesis advisor

    Prof. Dr. Beda M Stadler

    University Institute of Immunology

    Medical Faculty of the University of Bern

    Original document saved on the web server of the University Library of Bern

    This work is licensed under a Creative Commons Attribution-Non-Commercial-No derivative works 2.5 Switzerland licence. To see the licence go to http://creativecommons.org/licenses/by-nc-nd/2.5/ch/ or write to Creative Commons, 171 Second Street, Suite 300, San Francisco, California 94105, USA.

  • Copyright Notice

    This document is licensed under the Creative Commons Attribution-Non-Commercial-No derivative works 2.5 Switzerland. http://creativecommons.org/licenses/by-nc-nd/2.5/ch/

    You are free:

    to copy, distribute, display, and perform the work

    Under the following conditions:

    Attribution. You must give the original author credit.

    Non-Commercial. You may not use this work for commercial purposes.

    No derivative works. You may not alter, transform, or build upon this work.

    For any reuse or distribution, you must make clear to others the license terms of this work. Any of these conditions can be waived if you get permission from the copyright holder. Nothing in this license impairs or restricts the author’s moral rights according to Swiss law.

    The detailed license agreement can be found at: http://creativecommons.org/licenses/by-nc-nd/2.5/ch/legalcode.de

  • Accepted by the Faculty of Medicine, the Faculty of Science and the Vetsuisse Faculty of the University of Bern at the request of the Graduate School for Cellular and Biomedical Sciences

    Bern, Dean of the Faculty of Medicine

    Bern, Dean of the Faculty of Science

    Bern, Dean of the Vetsuisse Faculty Bern

  • Contents

    1 Abstract

    2 Abbreviations

    3 Scientific Overview

    3.1 Allergen Cross-Reactions
    3.1.1 Allergy
    3.1.2 The Context of Cross-Reactions
    3.1.3 The Molecular Basis

    3.2 Allergenicity Testing
    3.2.1 From Skin Testing to Laboratory Analysis
    3.2.2 Quantifying Cross-Reactivity
    3.2.3 Allergy Array Test System

    3.3 Allergenicity Prediction
    3.3.1 Necessity
    3.3.2 Epitope Focused Prediction
    3.3.3 Structure-Sequence Relationship
    3.3.4 Identifying Conserved Domains
    3.3.5 Motifs and General Profiles

    3.4 Bioinformatics of Cross Reactions
    3.4.1 Motif Calculation
    3.4.2 Web Interface

    3.5 Outlook: Protein Surface Comparison
    3.5.1 Ab initio Protein Folding Prediction
    3.5.2 Homology Modeling
    3.5.3 Prediction of Similar Surfaces

    4 Results – Dissertation Equivalents
    4.1 Dissertation Equivalent I
    4.2 Dissertation Equivalent II

    5 Acknowledgements

    A Declaration of Originality

    List of Figures

    3.1 Structural alignment of Bet v 1 and Mal d 1
    3.2 Structures and folds newly added to the PDB
    3.3 Iterative motif discovery
    3.4 Web-based front-end

    List of Tables

    3.1 Allergens homologous to pathogenesis related proteins

  • 1 Abstract

    Clinically important allergen cross-reactions such as the pollen-food syndromes have been shown to originate from structural homology. Additionally, in the last two years no new protein folds were discovered, implying that the universe of unique protein folds may be almost complete. For allergy this may suggest that the immune system also reacts adversely in a predictable way, namely by recognizing homologous proteins once a sensitization is established. Thus we have applied sequence-based computational homology prediction to assess the extent of cross-reactivity.

    In a first paper we have analyzed more than 5’000 serum samples, each tested for specific IgE against multiple allergen extracts (ImmunoCAP®). We found the degree of cross-reactivity to be astonishingly high. However, as specific IgE determinations were based on crude allergen extracts, we were unable to conclude that the observed cross-reactivity reliably depends on the allergen sequence.

    Thus in a second paper we utilized data obtained for specific IgE directed against highly purified natural or recombinant proteins on a new allergen chip system (ISAC®). Thereby we assessed the sensitization pattern against 105 proteins in more than 3’000 serum samples. Protein pairs predicted to cross-react, based on computationally identified homology, co-reacted significantly more often than protein pairs without apparent homology. Additionally, we demonstrated that the allergen source, and therefore co-sensitization, was much less important than protein homology.

    We conclude that cross-reactivity is an important mechanism in the development of allergic diseases, more important than generally accepted. Allergy diagnosis and treatment may benefit from the combination of allergen chip data, i.e. specific IgE values directed against purified and recombinant proteins, and computationally predicted cross-reactivity. Finally, our continuous endeavor to assess the number of structural motifs in allergens shows that not only protein folds, but also the number of allergen motifs may soon reach a plateau. Hence allergenicity prediction may become as valid as wet lab testing for new and potential allergens.

  • 2 Abbreviations

    APC Antigen Presenting Cell

    Bet v 1 Betula verrucosa 1, major birch pollen allergen

    CASP Critical Assessment of Techniques for Protein Structure Prediction

    CCD Cross-reacting Carbohydrate Determinant

    CDR Complementarity Determining Region

    CRD Component-Resolved Diagnostics

    EVD Extreme Value Distribution

    FAO Food and Agriculture Organization of the United Nations

    FEIA Fluoro-Enzyme Immuno Assay

    GM Genetically Modified

    IDT Intradermal Dilutional Testing

    IgE Immunoglobulin E

    ISAC Immuno Solid-phase Allergen Chip

    Mal d 1 Malus domestica 1, major apple allergen

    NMR Nuclear Magnetic Resonance

    PDB Protein Data Bank

    PR Pathogenesis-Related Protein

    PSSM Position-Specific Scoring Matrix

    PTM Post-Translational Modification

    RAST Radio Allergo-Sorbent Test

    rmsd Root Mean Square Deviation

    SET Skin End Point Titration

    SPT Skin Prick Test

    TCR T cell receptor

    WHO World Health Organization


  • 3 Scientific Overview

    3.1 Allergen Cross-Reactions

    3.1.1 Allergy

    Allergy is a hypersensitivity type I disorder. It is characterized by typical clinical reactions such as hay fever, asthma, food allergies, eczema and urticaria against usually harmless substances. These manifestations are mostly mediated by Immunoglobulin E (IgE) antibodies, highly specific binding proteins produced by plasma cells. By recognizing a certain pattern on the surface of their antigen, the epitope, antibodies elicit various immune reactions against these molecules. Antibodies of the IgE class are also expressed in high quantities during infections with helminths (Erb, 2007), but are the main culprit in allergic sensitizations leading to type I hypersensitivity (Gould et al., 2003).

    The allergic reaction. Antibodies of the IgE class are required for type I hypersensitivity reactions (Kay, 2000). Symptoms occurring during an allergic response, such as swelling and itching, are a result of mast cell and basophil degranulation. Mast cells and basophils carry Fcε receptors on their surface, which bind the Fc portion of IgE antibodies. Cross-linking of surface-bound IgE on high-affinity Fcε type I receptors triggers degranulation (Helm et al., 1988). The cells release mediators such as histamine and serotonin into the surrounding tissue (Metzger, 1991; Nadler et al., 2000), which causes the symptoms typical of allergy.

    3.1.2 The Context of Cross-Reactions

    The phenomenon of cross-reactivity has long been known. Reports from the nineteen-thirties mention evolutionary relationship as a possible explanation for observed cross-reactivity, but state that “mere similarity would be sufficient” (Hooker and Boyd, 1934). After publications revealed that any form of gelatin, essentially a denatured protein fragment without fixed 3-D structure, was antigenic in man (Maurer, 1954), Gell and Benacerraf presumed that “any specificity which it [the protein] has must therefore be resident in the amino-acid sequence of the protein chain” (Gell and Benacerraf, 1959). In a series of publications, they assessed various aspects of cross-reactivity against native and denatured proteins. Interestingly, they found delayed type hypersensitivity skin reactions in guinea pigs sensitized to native ovalbumin when challenged with denatured ovalbumin, and vice versa (Gell and Benacerraf, 1959). Still, the denatured proteins were unable to elicit immediate type hypersensitivity reactions, meaning that the antibodies recognizing the native protein were unable to recognize linear parts thereof. This suggested that structural epitopes are more important for antigen recognition by antibodies than sequence features alone. The two even considered different parts of the proteins to be independently antigenic, coining the term “antigenic motifs” (Benacerraf and Gell, 1959).

    Allergic patients commonly react to more than a single allergen; true single positive sensitizations are very rare (cf. dissertation equivalent I). To what extent this observation is caused by multiple sensitizations and to what extent by cross-reactions may be disputed. On the one hand, a TH2 response to an allergen in the TH2-tilted milieu of allergic patients facilitates the sensitization against concurrently present proteins, for example different proteins in a pollen grain. In principle, T helper cells will also stimulate B cells that are reactive to a non-cross-reactive epitope. On the other hand, it can be demonstrated that co-sensitization between proteins occurring in the same material is astonishingly low (cf. dissertation equivalent II).

    Cross-reactivity may not be sufficiently explained by predicting IgE cross-reactivity alone. Factors other than the protein itself, most importantly the type and time of exposure, are also crucial to sensitization (Ferreira et al., 2004). Yet these factors can neither be influenced nor measured for diagnostic purposes. Thus the question remains what diagnostic and therapeutic value IgE cross-reactivity prediction can add to the current allergy evaluation procedure.

    3.1.3 The Molecular Basis

    The immune system recognizes allergen antigens in two different forms:

    • Antigen presenting cells (APCs) present peptide fragments of the digested antigens on MHC class II molecules as linear structures. T helper cells require this form of antigen presentation in order to elicit effector functions.

    • Antibodies, and thus B cell receptors, recognize conformational epitopes on the tertiary structure of the folded antigen.

    Ultimately, the activation of B cells resulting in antibody production requires both forms of recognition, as most allergens are proteins and thus thymus-dependent antigens. There is no B cell activation without the help of T helper cells. For cross-reactivity the question arises which presentation leads to the generation of a cross-reactive immune response.

    T cell cross-reactivity. Peptides presented on MHC class II molecules are usually 10 to 25 amino acids long. Of this fragment, only a limited number of residues are directly contacted by the T cell receptor (TCR) (Wucherpfennig and Strominger, 1995). By contrast, a high affinity antibody can form around fifteen to twenty bonds with its antigen (Davies et al., 1988; Lafont et al., 2007). Thus the T cell is, for statistical reasons alone, more likely than the B cell to encounter indistinguishable structures originating from different proteins. It can be demonstrated that some TCRs recognize not only a single peptide, but rather a limited repertoire of related peptides derived from different antigens. This cross-recognition leads to efficient activation of the T cell, for example in autoimmune diseases linked to viral antigens, such as multiple sclerosis (Wucherpfennig and Strominger, 1995). Additionally, during the physiological process of positive selection in the thymus, cross-reactivity at the T cell level is a common phenomenon. T cells with low affinity for self-MHC molecules are kept alive (positive selection) while T cells with higher affinity for the self-MHC/self-peptide complex are deleted (negative selection) (Kappler et al., 1987; Kisielow et al., 1988). These findings demonstrate that cross-reactivity at the T cell level is common. An activated T helper cell potentially activates B cells presenting a range of different peptides, hence T helper cells do not limit allergen cross-reactivity.

    Cross-reactive antibodies. Cross-reactive antibodies have the ability to bind an antigen different from their immunogen. Given the high specificity antibodies achieve during affinity maturation, the existence of antibodies recognizing structures different from their template structure seems surprising. A way out of this apparent contradiction is to look at antigen specificity as a quantitative rather than qualitative concept. Antigen-antibody interactions are based on physicochemical processes dependent on spatial and electrostatic properties of both molecules’ surfaces. In this context, a perfect match in the sense of spatially and electrostatically perfectly complementary molecules, reminiscent of the lock-and-key analogy, would result in maximum-affinity binding. However, less-than-perfect surface pairs would still be able to bind, admittedly with lower affinity. An antibody may therefore be expected to bind various related and unrelated molecules given a high enough similarity. The quality of such an interaction would only differ quantitatively (pun intended) in thermodynamic properties, such as interaction rates or dissociation constants. This mechanism is commonly termed “molecular mimicry”. Thus, from this stereochemical standpoint, molecular mimicry and with it structural similarity between proteins build the foundation for cross-reactivity on the level of the antibody.

    The need for structural similarities for cross-reactivity at the antibody level is not only of theoretical nature but can easily be demonstrated. Structural similarity as a consequence of phylogenetic inheritance correlates well with observed cross-reactivity. When examining clinically well known cross-reactions such as the apple-birch cross-reactivity, their major allergenic proteins Bet v 1 and Mal d 1 exhibit potential epitopes for cross-reactive antibodies as identified by crystal structure and sequence analysis (Holm et al., 2001). Figure 3.1 demonstrates the similarities between the backbones of Bet v 1¹ and Mal d 1². Further examples are highly conserved proteins such as taxonomically related group I grass pollen allergens, which demonstrate a high degree of cross-reactivity (Laffer et al., 1994, 1996). Thus also experimentally, phylogenetic relationship and with it conserved protein domains exhibiting structural similarity are a main cause for cross-reactivity.

    Figure 3.1: Cartoon models of a structural alignment of Bet v 1 (purple, PDB accession number 1B6F) and Mal d 1 (green, modeled). Mal d 1 has been modeled after Pru av 1 (PDB accession number 1E09). Orientation in view B is perpendicular to the Y-axis of view A.

    Profilins and pathogenesis related proteins. Following the early investigations into the apple-birch cross-reactivity, it became clear that even much more distantly related proteins were capable of eliciting cross-reactions. For the apple-birch cross-reactivity, the proline binding protein, profilin, was identified as a causative allergen (Valenta et al., 1992). This protein turned out to be the most important cross-reacting allergen discovered thus far. The profilin family constitutes a type of pan-allergens sharing IgE epitopes present in most cells of eukaryotic organisms (Valenta et al., 1992). Nowadays, there are many well-known cross-reactions which can be attributed to omnipresent, evolutionarily conserved pan-allergens. These include profilins, α-amylase inhibitors, peroxidases, thiol-proteases, seed storage proteins and lectins (Breiteneder and Ebner, 2000).

    ¹ PDB Model 1B6F: http://www.pdb.org/pdb/explore/explore.do?structureId=1e09

    ² Protein model based on template 1E09 chain A: http://www.proteinmodelportal.org/?pid=modelDetail&pmpuid=1000000075750

    In addition to profilins, a high homology between pathogenesis-related proteins (PRs) of different plant species has been found (Midoro-Horiuti et al., 2001). These are plant proteins induced by viral infections of the plant and various other environmental stresses (Surplus et al., 1998). The high homology between PRs explains the high frequency of cross-reactivity among plant allergens. Examples of allergens homologous to PRs can be found in table 3.1.

    PR     Classification            Example allergens

    PR-2   β-1,3-Glucanases          Banana, latex, potato, tomato
    PR-3   Basic chitinases          Avocado, banana, chestnut, latex
    PR-4   Win-like proteins         Elderberry, turnip
    PR-5   Thaumatin-like proteins   Apple, bell pepper, cherry, kiwi, mountain cedar
    PR-10  Bet v 1 homologs          Apple, apricot, carrot, celery, cherry, parsley, pear, potato
    PR-14  Lipid transfer proteins   Apple, barley, peach, soybean

    Table 3.1: Examples of allergens homologous to pathogenesis related proteins

    Hydrophobic Stickiness. It has been suggested that antibodies may be cross-reactive due to hydrophobic stickiness, a nonspecific hydrophobic interaction (Padlan, 1994). Additionally, antibodies have been demonstrated to bind a range of antigens directly related to their hydrophobicity (Barbas et al., 1997). However, such nonspecific binding cannot explain the high specificity with which antibodies are known to interact with their antigen. Furthermore, no correlation between hydrophobicity and affinity has been found in recent studies (James and Tawfik, 2003). Thus hydrophobic stickiness may contribute to cross-reactions, but is not their basis.

    Post-Translational Modification. An aspect easily forgotten is that translation is not the final step in the formation of a protein from DNA. Post-translational modifications (PTM) are processes not entirely defined by the DNA sequence, but instead determined by factors of the host. A broad range of PTMs has been described. For example, these mechanisms are able to add functional groups or even entire proteins by gamma-carboxylation, change the chemical nature of amino acids by citrullination or induce structural changes, most notably by disulfide bond formation.

    A PTM leading to an extraordinarily broad cross-reactivity through anti-carbohydrate responses is glycosylation, creating cross-reacting carbohydrate determinants (CCDs) on proteins from different sources. CCDs are common in plant allergens (pollen as well as food) and in Hymenoptera venoms. Even though a clinical effect of CCDs has been suggested (Fötisch et al., 1999), it is the general opinion that CCDs are not clinically relevant but must be considered when interpreting in-vitro specific IgE assays, especially in pollen and Hymenoptera venom sensitivity (Aalberse et al., 1981; Mari et al., 1999; Erzen et al., 2009). The clinical insignificance mostly stems from an inability to trigger mast cells or basophils through receptor cross-linking. CCD structures are monoglycosylated as a consequence of their small size, and thus only represent monovalent epitopes (Vieths et al., 2002), unable to establish the cross-linking.

    Cross-reactions require homology. It seems clear that there is one main reason for cross-reactivity: homology. There is no relevant cross-reactivity without structural similarity (Aalberse et al., 2001). Cross-reactions without sequence similarity have so far only been demonstrated between anti-idiotypic antibodies. These antibodies do not have any similarity in the amino-acid sequences encoding their complementarity determining region (CDR) (Lescar et al., 1995). With the exception of these antibodies, cross-reactive allergens without apparent homology have not been demonstrated so far (Aalberse, 2005).

    3.2 Allergenicity Testing

    3.2.1 From Skin Testing to Laboratory Analysis

    Skin testing. Since their introduction by Blackley in 1873 (Blackley, 1873), skin prick tests (SPT) have remained the most widely used clinical tests to assess sensitization against a substance of interest (Neto and Rosário, 2009). Type I hypersensitivity reactions can be provoked by pricking or injecting a minute amount of allergen intradermally, usually on a patient’s forearm. Comparing the size of the wheal induced by an allergen to the size of a control wheal (usually provoked by a saline solution) makes it possible to diagnose whether a patient is sensitized against the allergen and, to a certain degree, how strong the reaction is. This allows for quick and reliable testing, as the test has a good negative predictive value (Sicherer and Sampson, 2010). However, SPTs are impractical for assessing the full range of sensitizations because of the number of pricks a patient would have to endure.

    To not only demonstrate the presence of sensitization but also quantify its strength, intradermal dilutional testing (IDT) can be performed by applying various dilutions of the antigen. As one early form of IDT, skin end point titration (SET) is widely used in the diagnosis and treatment of inhalant allergies. Its efficacy in guiding desensitization immunotherapy, however, is only weakly supported by controlled experimental data (Krouse and Mabry, 2003), despite clinical expertise having shown its usefulness and effectiveness. SET is commonly used to find a safe starting dose for immunotherapy. In the case of food allergies, SET is still investigational and is not typically used in the clinical setting, but it holds promise to become a realistic diagnostic choice (Tripodi et al., 2009).

    In vitro. The presence of IgE antibodies specific against an allergen is necessary but not sufficient to provoke allergic responses. In other words, sensitization does not necessarily imply clinical allergy. This, however, is a topic beyond the scope of this thesis. Nevertheless, testing patients for the presence of specific IgE is an important part of every thorough allergic assessment.

    Today, the fluoro-enzyme immuno assay (FEIA) is the most widely used specific IgE detection method. It has replaced the previously used radio allergo-sorbent tests (RAST). After a patient’s serum sample is added to the test capsule, specific IgE present in the serum binds to covalently coupled allergen preparations. A fluorescently labelled anti-human IgE antibody mixture is then added and the resulting fluorescence is measured in a spectrophotometer (ImmunoCAP® system, Phadia AB, Uppsala, Sweden). IgE quantities are expressed in kilo-units of antigen per liter (kUA/L), where 1 unit corresponds to 2.4 ng of IgE (Pastorello et al., 1995).
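    The unit convention above can be illustrated with a small conversion sketch. It assumes that 1 kUA/L equals 1000 units per liter, i.e. 1 unit per milliliter, so that only the factor of 2.4 ng per unit quoted in the text remains. The function name and the example reading are hypothetical and not part of the ImmunoCAP system.

    ```python
    # Conversion factor quoted in the text: 1 unit corresponds to 2.4 ng of IgE.
    NG_IGE_PER_UNIT = 2.4


    def kua_per_l_to_ng_per_ml(kua_per_l: float) -> float:
        """Convert a specific IgE reading from kUA/L to ng/mL.

        Assumption: 1 kUA/L = 1000 units / 1000 mL = 1 unit/mL.
        """
        return kua_per_l * NG_IGE_PER_UNIT


    # Hypothetical reading of 10 kUA/L, chosen only for illustration.
    example_ng_per_ml = kua_per_l_to_ng_per_ml(10.0)
    print(f"10 kUA/L corresponds to about {example_ng_per_ml:.1f} ng/mL of specific IgE")
    ```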

    The allergen preparations used in these assays are mixtures of proteins which are prepared from biological extracts and are known to be heterogeneous, often also containing non-allergenic proteins (Chapman et al., 2000). They can even be contaminated with allergens from different sources. For these reasons, the use of highly purified natural or even recombinant allergen proteins has been promoted. Recombinant allergen proteins are an attractive choice because their pure form promotes reproducibility and standardization (Hamilton, 2010) and makes it possible to determine exactly which proteins a patient is sensitized against. The latest microarray chip technology utilizing recombinant proteins will be discussed in more detail in section 3.2.3.

    3.2.2 Quantifying Cross-Reactivity

    In a first study (Dissertation Equivalent I) we have utilized a large database of specific IgE values obtained by FEIA (ImmunoCAP®) to evaluate the degree of sensitization against various allergens and their relationship to cross-reactivity. We found that allergen cross-reactions might be much more common than generally assumed. For some extracts we found that well over 80% of the patients who tested positive also tested positive against extracts presumably cross-reacting with the original extract. Furthermore, with an increasing number of extracts tested, the percentage of sera sensitized against only one single allergen extract decreased from approximately 10% for sera tested against 10 to 20 extracts to 1.6% for sera tested against at least 90 extracts. This suggests that the true number of single positive sera must be low and therefore the rate of co-sensitization and presumably the rate of cross-reactivity must be high. We concluded that using allergen extracts for cross-reactivity assessment might introduce a certain bias, as an extract contains a number of allergenic and non-allergenic proteins. Therefore, an assessment at the protein level using recombinant proteins would be desirable.

    3.2.3 Allergy Array Test System

    The ability to clone and purify single proteins has recently opened the door to component-resolved diagnostics (CRD) (Valenta et al., 1999; van Hage-Hamsten and Pauli, 2004). CRD has been commercialized in the form of a microarray chip, the Immuno Solid-phase Allergen Chip (ISAC®) (Hiller et al., 2002). First, CRD makes it possible to identify the disease-eliciting protein and not only the extract potentially containing many different proteins. Second, CRD in the form of the ISAC system makes it possible to determine sensitivity against a broad panel of allergens in a single measurement. Currently available chips contain 103 different purified allergen molecules. There are plans to extend this number further, in pursuit of offering an allergen screening test covering the widest range of allergens possible and necessary.

    Testing a patient’s serum against 103 purified proteins yields a sensitization pattern that allows further study of the relationship between co-sensitization and cross-reaction patterns. In our second study (Dissertation Equivalent II) we analyzed the sensitization pattern of 3’142 patients, determined by ISAC evaluations. The focus of our analysis lay on the relationship between predicted cross-reactions and observed co-reactions, as described in section 3.3. We found a high correlation between predicted and observed reactions, which further validates the use of probabilistic sequence motifs for allergenicity prediction of new proteins.

    3.3 Allergenicity Prediction

    3.3.1 Necessity

    Risk in biotechnology. Allergenicity is one of the most frequently raised questions in connection with the safety of genetically modified (GM) foods (FAO and WHO, 2001). Consequently, allergenicity assessment of GM foods is one of the most important parts of risk assessment in biotechnology, alongside the evaluation of direct toxicological and nutritional effects. The importance of the allergenicity aspect became especially clear after the inadvertent generation of an allergenic soy plant by transfer of a Brazil nut allergen (Nordlee et al., 1996). As a result of this assessment, development of said soy plant was abandoned and the organism was never introduced into the food chain.

    Naturally, the amino-acid sequences of GM foods are known. Hence the most obvious choice to assess potential allergenicity is by sequence comparison to known allergens. Significant similarity between a transgenic and a known allergen sequence would predict the transgene to be allergenic itself or to cross-react with known allergens. The question arises what constitutes a “significant similarity”.

    A Joint FAO (Food and Agriculture Organization of the United Nations) / WHO (World Health Organization) Expert Consultation on Foods Derived from Biotechnology devised guidelines for allergenicity evaluation. According to these guidelines, a novel protein is regarded as allergenic if:

    a) it has an identity of at least six contiguous amino acids or

    b) more than 35% sequence similarity over a window of 80 amino acids

    compared to any known allergen³. However, this method proved to be of low precision, predicting more than 40% of human proteins as allergens (Stadler and Stadler, 2003). In the same study, a new sequence-based method has been proposed, to be introduced in section 3.4.1. Various alternative allergenicity prediction methods utilizing different statistical models have been published in the years after (Li et al., 2004; Riaz et al., 2005; Thomas et al., 2005; Saha and Raghava, 2006; Zhang et al., 2006; Kong et al., 2007; Cui et al., 2007; Schein et al., 2007; Barrio et al., 2007; Tong and Tammi, 2008; Lim et al., 2008; Muh et al., 2009; Ivanciuc et al., 2009).
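    The two criteria above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used by the Expert Consultation: windows are compared without gaps, and position-wise identity stands in for "sequence similarity", whereas a real assessment would rely on a proper alignment tool. All function names and the toy sequences are hypothetical.

    ```python
    def shares_exact_word(query: str, allergen: str, word: int = 6) -> bool:
        """Criterion a): any run of `word` contiguous residues occurs in both sequences."""
        allergen_words = {allergen[i:i + word] for i in range(len(allergen) - word + 1)}
        return any(query[i:i + word] in allergen_words
                   for i in range(len(query) - word + 1))


    def best_window_identity(query: str, allergen: str, window: int = 80) -> float:
        """Highest fraction of identical positions over any pair of ungapped windows.

        Assumption: identity over ungapped windows approximates criterion b).
        Returns 0.0 if either sequence is shorter than the window.
        """
        best = 0.0
        for i in range(len(query) - window + 1):
            q = query[i:i + window]
            for j in range(len(allergen) - window + 1):
                a = allergen[j:j + window]
                matches = sum(x == y for x, y in zip(q, a))
                best = max(best, matches / window)
        return best


    def flagged_as_allergen(query: str, allergen: str, word: int = 6,
                            window: int = 80, threshold: float = 0.35) -> bool:
        """Apply both criteria against a single known allergen sequence."""
        return (shares_exact_word(query, allergen, word)
                or best_window_identity(query, allergen, window) > threshold)


    # Toy sequences, invented for illustration only.
    novel = "AAAGSTKLMCCC"
    known = "TTTGSTKLMPPP"
    print(flagged_as_allergen(novel, known))  # the shared 6-mer "GSTKLM" triggers criterion a)
    ```

    Because a six-residue exact match is enough to raise a flag, even unrelated proteins can be flagged by chance, which is consistent with the low precision reported in the text.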

    Significance in standard allergy assessment. Cross-reactivity also has several implications in clinical allergy assessment. Not only do cross-reactions in vitro potentially create clinically non-significant results, as for example with CCDs, but determining the original sensitizing agent may be complicated by true cross-reactions. Identification of the primary sensitizing allergen, however, is likely to be relevant, because desensitization against the “true sensitizer” may cover the widest spectrum of specificities (Aalberse et al., 2001) and is likely to relieve symptoms provoked by other allergens as well (Asero, 1998). Additionally, the ability to predict potential cross-reactions may alleviate the need for multiple allergen tests. These predictions would probably comprise a broader range of allergens than direct testing and would therefore also caution the patient about unforeseen allergic (cross-)reactions. Thus, allergen cross-reactivity prediction would constitute a valuable asset in clinical allergy diagnosis.

    3.3.2 Epitope Focused Prediction

    As mentioned above, T cell epitopes possess a certain potential for inducing cross-reactions. However, cross-reactivity at the T cell level is abundant and therefore not the limiting step in the induction of cross-reactivity. Whether cross-reactivity occurs is instead decided at the level of the antibody; cross-reactivity prediction efforts should therefore start at the antibody level.

    Cross-reacting antibodies are able to recognize epitopes on different proteins, given that these epitopes are stereochemically similar enough. As we have seen, the single most important cause for two proteins exhibiting similar epitopes is homology based on evolutionary conservation. The prediction of homologous structures from the sequence level is a desirable goal because protein sequence data is widely available. Thus the question arises whether homologous structures might be detectable at the sequence level.

    ³ Full report: http://www.who.int/foodsafety/publications/biotech/en/ec_jan2001.pdf

    3.3.3 Structure-Sequence Relationship

    When Pascarella and Argos in 1991 grouped protein structures with similar main-chain fold, they found that proteins in the same group exhibited strong sequence and functional similarities. Not only did this strongly imply their evolution from a common ancestor (Pascarella and Argos, 1992), but their findings also confirmed that structural similarities are reflected in sequence similarity.

    Current State of the PDB. If similar sequences encode the same main-chain fold, then a large number of sequences has to encode only a limited number of folds. Indeed, comparing the number of protein structures with the number of different folds in the Protein Data Bank (PDB) supports this reasoning. During the first decade of this millennium (2000-2009) the number of structures contained in the PDB increased from 9’749 to 57’613, an almost 6-fold increase. In the same timeframe, the number of unique folds grew from 622 to 1’393, still a more than 2-fold increase. The yearly number of new folds, however, stagnated at around 100 until 2006, dropped to 6 in 2008, and since then no new folds have been added (as of January 2011). This declining rate is documented in figure 3.2, where it can also be seen that the number of new structures characterized yearly keeps increasing. This observation roughly coincides with early predictions that the majority of proteins stem from no more than a thousand different families (Chothia, 1992; Aloy and Russell, 2004), representing the structural building blocks of protein evolution.
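    The growth factors quoted in this paragraph can be checked with a few lines of Python. The counts are those given in the text; the variable names and the derived structures-per-fold ratio are additions made for illustration only.

```python
# PDB counts for 2000 and 2009 as quoted in the text.
structures = {2000: 9749, 2009: 57613}
folds = {2000: 622, 2009: 1393}

structure_growth = structures[2009] / structures[2000]
fold_growth = folds[2009] / folds[2000]

# Average number of structures per unique fold in each year.
structures_per_fold = {year: structures[year] / folds[year] for year in structures}

print(f"structures grew {structure_growth:.2f}-fold")
print(f"folds grew {fold_growth:.2f}-fold")
for year, ratio in sorted(structures_per_fold.items()):
    print(f"{year}: {ratio:.1f} structures per fold")
```

    The rising number of structures per fold expresses the same point as the paragraph: new structures increasingly fall into folds that are already known.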

    3.3.4 Identifying Conserved Domains

    A correlation between structural similarity and amino-acid sequence seems plausible, as mentioned above; however, measuring sequence similarity as percent identity in sequence alignments is too simplistic. As an illustration, proteins of the globin family have evolved diversely in different species, yet still fold into the same general 3-D pattern. The amino-acid sequences are identical in only very few residues; some globins differ from others in as many as 130 of the approximately 150 positions (Dickerson and Geis, 1983). This means that the “globin fold” is encoded by various amino-acid sequences barely reminiscent of each other. Thus, a more dedicated approach to inferring structural similarity from sequence, termed “generalized profiles”, was introduced.

    Figure 3.2: Structures and folds newly added to the PDB (1990-2010), per year (x-axis: year; y-axes: number of new structures and number of new folds). The numbers for the years 1990-2010 have been extracted from data available at http://www.rcsb.org/pdb/statistics/contentGrowthChart.do?content=molType-protein&seqid=100 (Yearly Growth of Protein Structures) and http://www.rcsb.org/pdb/statistics/contentGrowthChart.do?content=fold-scop (Growth Of Unique Folds Per Year As Defined By SCOP (v1.75)).

    Generalized Profiles. Generalized profiles are a very sensitive method for detecting even distant protein relationships by sequence comparison (Gribskov et al., 1987). In order to identify homologous proteins, database sequences are not compared to a single sequence but to a profile constructed from a family of related sequences. The profiles themselves are derived from multiple alignments of an initial sequence pool and contain the following information:

    • Which residues are allowed at which position
    • The importance of the positions
    • Which positions allow insertions
    • Which positions may be dispensable

    As such, generalized profiles may describe common characteristics of even distantly related protein sequences. Proteins which contain the desired motif can be identified by comparing their sequence to the profile.
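    To make the idea of a profile more concrete, the sketch below derives a simple position-specific log-odds table from a small multiple alignment. It is not the method of Gribskov et al.: the uniform background frequency, the pseudocount, and the function name are assumptions chosen for brevity, and the position-specific insertion and deletion penalties that distinguish generalized profiles are omitted.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_profile(alignment, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per alignment column.

    `alignment` is a list of equal-length strings; '-' marks a gap and
    is ignored when counting residues.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(ALPHABET)  # assumed uniform background
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return profile


profile = build_profile(["ACD", "ACE", "ACD"])
```

    Residues that are frequent in a column receive positive scores, residues never observed there receive negative ones, which encodes "which residues are allowed at which position".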


    3.3.5 Motifs and General Profiles

    The profile information is stored in a position-specific scoring matrix (PSSM). This matrix can be used to search a sequence database for occurrences of the motif. As the name “scoring matrix” implies, comparing sequences to PSSMs returns a match score, a quantitative rather than qualitative estimate of the relationship between a protein sequence and the profile. In order to decide whether a query sequence contains a motif, the significance of the profile-sequence match has to be evaluated by defining threshold levels for the match score. Sequences scoring above the match score threshold likely contain the motif and are therefore predicted to be phylogenetically related.
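    The following sketch shows, under simplifying assumptions, how a PSSM yields a match score and how a threshold turns that score into a yes/no decision. The toy matrix, the default score for unlisted residues, and the threshold are hypothetical values, and gaps are not modelled; profile search tools such as pfscan implement a considerably richer scoring scheme.

```python
def scan(sequence, pssm, threshold, default=-1.0):
    """Slide the PSSM along the sequence and return (start, score) for
    every ungapped window whose summed score reaches the threshold."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column.get(aa, default) for column, aa in zip(pssm, window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Hypothetical three-column PSSM: one dict of residue scores per position.
toy_pssm = [{"G": 2.0}, {"A": 1.0, "S": 1.0}, {"W": 3.0}]
hits = scan("MKGAWLGSWQ", toy_pssm, threshold=5.0)
```

    Lowering the threshold returns more windows, including chance matches, which is why the choice of cut-off discussed next matters.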

    Obtaining relevant cut-off levels is a difficult task. We used an approach based on the probability of finding a profile in nature, substituting ‘nature’ with a randomized database. Our randomized database was created from Uniprot⁴ release 44.0 by regional shuffling using a window of 20 amino-acid residues, thereby preserving size, sequence length distribution and amino-acid composition (Pearson and Lipman, 1988). This approach has been criticized for introducing significant bias, such as over-fitting and failure to reflect the natural processes of random nucleotide and amino-acid replacement. Using a database consisting of randomly selected but unshuffled sequences might therefore be an alternative worth considering (Mitrophanov and Borodovsky, 2006).
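    Regional shuffling can be sketched as follows. The sketch assumes consecutive, non-overlapping windows, which is one plausible reading of "a window of 20 amino-acid residues"; the function name and the optional random generator argument are illustrative and do not describe the tool actually used.

```python
import random


def regional_shuffle(sequence, window=20, rng=None):
    """Shuffle residues within consecutive, non-overlapping windows.

    Sequence length and local amino-acid composition are preserved,
    while the residue order inside each window is randomized.
    """
    rng = rng or random.Random()
    shuffled = []
    for start in range(0, len(sequence), window):
        block = list(sequence[start:start + window])
        rng.shuffle(block)
        shuffled.append("".join(block))
    return "".join(shuffled)


original = "ACDEFGHIKLMNPQRSTVWY" * 2 + "AAAAA"
randomized = regional_shuffle(original, window=20, rng=random.Random(0))
```

    Because every window keeps its own residues, the randomized database retains the size, length distribution and composition of the original, as stated above.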

    Scoring all amino-acid sequences of the randomized database against the profile returns an empirical score frequency distribution. By fitting an extreme value distribution (EVD, Gumbel distribution) to these scores, an E-value can be deduced for each score with the formula:

    $$E(x, A) = A \times 10^{-R_1 - R_2 x}$$

    where

    $$R_1 = \frac{\ln\frac{A}{N} - \lambda\mu}{\ln 10} \quad \text{and} \quad R_2 = \frac{\lambda}{\ln 10}$$

    The E-value associates an expected number of chance hits with each score, i.e. the number of hits with scores exceeding x in a database with A residues (Pagni and Jongeneel, 2001). N corresponds to the number of sequences in the database; λ and µ are characteristics of the EVD. The E-value parameters R1 and R2 can then be used by profile search algorithms to return normalized scores⁵. Unlike raw scores, the normalized scores can be compared between different profiles, and a threshold value separating significant from random matches can be defined.
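    A small numerical sketch of the E-value calculation is given below. It assumes that R1 is read as (ln(A/N) − λµ)/ln 10, in which case the formula reduces to E = N·exp(−λ(x − µ)), the expected number of chance hits among N sequences under a Gumbel tail approximation. The database size and EVD parameters are made-up values, and the function names are hypothetical.

```python
import math


def evalue_parameters(n_sequences, n_residues, lam, mu):
    """Compute R1 and R2 from the database size and the fitted EVD parameters."""
    r1 = (math.log(n_residues / n_sequences) - lam * mu) / math.log(10)
    r2 = lam / math.log(10)
    return r1, r2


def e_value(score, n_residues, r1, r2):
    """E(x, A) = A * 10^(-R1 - R2 * x)."""
    return n_residues * 10 ** (-r1 - r2 * score)


# Made-up example: 1'000 sequences, 350'000 residues, lambda = 0.3, mu = 20.
N, A, LAM, MU = 1000, 350_000, 0.3, 20.0
R1, R2 = evalue_parameters(N, A, LAM, MU)

for raw_score in (20, 40, 60):
    print(raw_score, e_value(raw_score, A, R1, R2))
```

    With these parameters a score equal to µ gives E = N, and each additional score unit lowers the E-value by a constant factor of exp(−λ), which is what makes scores from different profiles comparable once they are rescaled by R1 and R2.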

    ⁴ Uniprot: http://www.uniprot.org/
    ⁵ pftools Nscore calculation: http://www.isrec.isb-sib.ch/profile/scoredoc.html

    3.4 Bioinformatics of Cross Reactions

    3.4.1 Motif Calculation

    The theoretical background to allergen cross-reactivity prediction has been described in section 3.3.5. We have utilized an approach previously developed at our institute (Stadler and Stadler, 2003), which aimed at identifying potentially allergenic novel proteins in GM food. This approach uses MEME⁶ (Bailey and Elkan, 1994) and the pftools⁷ (Bucher et al., 1996) in an iterative fashion as shown in figure 3.3.

    Figure 3.3: Iterative motif discovery (flow: Allergome database, cd-hit, dataset, run set, motif discovery with MEME, profile scaling with pftools, collect motifs, remove matching sequences, repeat with remaining sequences). Allergen sequences are downloaded from Allergome and clustered using cd-hit. MEME analyzes all sequences in the run set and identifies the most significant motif. This motif is scaled against a randomized database using pfscale and stored as a PSSM. The run set is scanned using pfscan and all matching sequences are removed from the set. With the remaining sequences this cycle is repeated until no more significant motifs are found.
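    The control flow of this iterative procedure can be sketched in Python as shown below. The loop itself follows the figure caption; the two helper functions are deliberately trivial, hypothetical stand-ins (a most-frequent 4-mer search and a substring test) for the roles that MEME and pfscale/pfscan play in the actual pipeline, so that the example runs on its own.

```python
from collections import Counter


def iterative_motif_discovery(sequences, discover, matches, max_rounds=100):
    """Repeat: find the best motif, remove matching sequences, continue
    with the rest until no significant motif is found."""
    run_set = list(sequences)
    motifs = []
    for _ in range(max_rounds):
        motif = discover(run_set)
        if motif is None:
            break
        remaining = [seq for seq in run_set if not matches(motif, seq)]
        if len(remaining) == len(run_set):
            break  # motif matches nothing: stop instead of looping forever
        motifs.append(motif)
        run_set = remaining
    return motifs, run_set


def toy_discover(run_set, k=4, min_support=2):
    """Stand-in for motif discovery: the k-mer found in the most sequences."""
    counts = Counter()
    for seq in run_set:
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    if not counts:
        return None
    kmer, support = max(counts.items(), key=lambda item: (item[1], item[0]))
    return kmer if support >= min_support else None


def toy_matches(motif, seq):
    """Stand-in for a profile scan: plain substring search."""
    return motif in seq


motifs, unassigned = iterative_motif_discovery(
    ["AAAACCCC", "GGAAAAGG", "TTTTGGGG", "CCTTTTCC", "ACGTACGA"],
    toy_discover,
    toy_matches,
)
```

    Sequences left in `unassigned` are those for which no motif with sufficient support was found.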

    Three changes to the original approach have been applied:

    First, since the inception of the original approach, almost five times as many allergen sequences have become available. This increase in sequences also saw an increase in isoforms. Therefore we decided to cluster sequences with an identity of 90% or more using cd-hit⁸ (Li et al., 2001) prior to submission to our iterative motif discovery. This clustering not only reduces the required calculation time, which rises exponentially with the number of sequences, but also prevents the generation of motifs consisting entirely of isoforms.

    Second, in order to allow several motifs per protein sequence, all protein sequences are screened against all discovered motifs. In the original approach, only the motif discovered during the iterative process was assigned to a protein.

    Third, MEME was allowed to choose a variable motif length of 35 to 70 amino-acid residues (cf. dissertation equivalent II).

    ⁶ MEME is available from http://meme.sdsc.edu/meme/
    ⁷ pftools are available from http://www.isrec.isb-sib.ch/ftp-server/pftools/
    ⁸ cd-hit is available from http://www.bioinformatics.org/cd-hit/

    3.4.2 Web Interface

    The work presented here requires data from different sources. PSSMs representing allergen motifs (1) are calculated from allergen sequences (2) retrieved from an online database, Allergome⁹. These calculations are compared against wet lab data (3) obtained in the form of spreadsheet files. In order to work efficiently with this diverse data, the source material was processed and stored in a MySQL¹⁰ database. A web-based front-end in the PHP¹¹ and JavaScript programming languages was built in order to allow quick data lookups. It is publicly accessible from our institute’s website: http://www.iib.unibe.ch/allergen/. Figure 3.4 shows a screenshot of the front-end.

    3.5 Outlook: Protein Surface Comparison

    The rationale behind predicting cross-reactivity by sequence similarity mostly stems from the broad availability of sequence data. A prediction closer to nature would be to compare the surfaces of two proteins directly and to judge the possibility of a cross-reaction from that comparison. After all, structure is evolutionarily more conserved than sequence (Holm and Sander, 1996). The low number of available structures compared to the number of sequences has made this approach unfeasible so far. However, this proportion is changing over time. More and more protein structures are being determined by X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy. Additionally, structures not yet experimentally determined may be inferred from ab initio protein folding prediction and homology modeling with increasing reliability.

    ⁹ Allergome: http://www.allergome.org/
    ¹⁰ MySQL is available from http://www.mysql.com/downloads/mysql/
    ¹¹ PHP is available from http://www.php.net/downloads.php

    Figure 3.4: Screenshot of the web-based front-end. The website is built with Web 2.0 technologies and allows looking up allergen extracts, proteins and motifs via live search. Furthermore, custom protein sequences can be checked for occurrences of allergen motifs.

    3.5.1 Ab initio Protein Folding Prediction

    Ab initio protein folding prediction is the prediction of yet unknown protein structures from amino-acid sequences alone. Despite the vast number of possible conformations for each sequence, proteins generally fold into a unique native state, their thermodynamically most stable conformation. The dihedral angles φ and ψ may each assume one of three stable positions, hence knowledge of the amino-acid sequence is potentially sufficient to predict the native fold of a protein. The idea of letting a computer test all possible conformations comes to mind. This computer would choose the thermodynamically most favorable conformation, thus finding a protein’s native state would merely be a question of available computer time. However, an amino-acid sequence of 100 residues may hypothetically fold into 3^198 potential conformations (three states for each of the 99 φ and 99 ψ angles). If a protein were to fold into each of these conformations in order to find its native state, it would have to fold for a time period much longer than the age of our known universe, even if it spent only picoseconds per state (cf. the Levinthal paradox). As of December 2010, the fastest supercomputer in the world can perform 2’570 calculations per picosecond¹². This machine would have to calculate for more than 10^71 years, even when oversimplifying one complete structure comparison to one clock cycle. It is evident that this brute force approach is not feasible.
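    The orders of magnitude in this estimate can be reproduced with the short calculation below. The conformation count and the 2.566 petaflops figure are taken from the text; the length of a year in seconds, the approximate age of the universe (about 13.8 billion years), and the variable names are added for the example.

```python
import math

conformations = 3 ** 198               # three states for each of 198 angles
seconds_per_year = 365.25 * 24 * 3600
age_of_universe_years = 1.38e10        # approximate value, added for comparison

# Case 1: the protein itself samples one conformation per picosecond.
log10_years_protein = (
    math.log10(conformations) - 12 - math.log10(seconds_per_year)
)

# Case 2: a 2.566 petaflops machine tests one conformation per operation.
ops_per_second = 2.566e15
log10_years_computer = (
    math.log10(conformations) - math.log10(ops_per_second * seconds_per_year)
)

print(f"conformations: about 10^{math.log10(conformations):.1f}")
print(f"protein at 1 ps per state: about 10^{log10_years_protein:.1f} years")
print(f"supercomputer: about 10^{log10_years_computer:.1f} years")
```

    Working with base-10 logarithms avoids floating-point overflow and makes the comparison with the age of the universe immediate.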

    When evolutionary information and stochastic methods are applied to this approach, the calculations still require vast computational resources and so far have only been carried out for tiny, fast-folding proteins. Promising approaches are nevertheless emerging from this field, although the computational cost is still too high for broad use of the technique (Zhang, 2008). As an example, folding a 36-residue alpha-helical protein to an accuracy of on average 1.7 Å to 1.9 Å away from the native state uses a total simulation time of approximately 1000 CPU years (Zagrovic et al., 2002). Folding@Home is a promising approach utilizing distributed molecular dynamics calculations, running on thousands of personal computers around the world (Shirts and Pande, 2000). The Folding@Home team has published a wealth of papers providing insight not only into the potential of ab initio protein folding, but also into the mechanisms of protein folding kinetics (Pande, 2010). Still, even this approach is limited to predicting the structure of peptides and small proteins.

    ¹² 2.566 petaflops. Rank 1 in November 2010’s TOP500 list of the world’s most powerful supercomputers: http://www.top500.org/lists/2010/11

    3.5.2 Homology Modeling

    A different approach to bioinformatic protein structure prediction is homology modeling or template-based modeling. In contrast to ab initio prediction, homology modeling predicts the protein structure via comparison to a template structure. Therefore, the existence of similar structures in the PDB is a necessity for a successful prediction. Identifying and aligning the best template structure (termed threading or fold recognition) is the first important step towards a correct prediction. Not surprisingly, the most frequently used of the many threading approaches rely on sequence profile-profile alignments to identify phylogenetically related structure templates (Skolnick et al., 2004; Jaroszewski et al., 2005). Zhang and Skolnick showed that high-quality full-length models can be built for all protein targets with an average root mean square deviation (rmsd) of 2.25 Å (Zhang and Skolnick, 2005). This suggests that the structural universe of the current PDB library is essentially complete for solving the protein structure problem, at least for single-domain proteins.
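    For reference, the rmsd quoted above is the square root of the mean squared distance between corresponding atoms of two structures. The sketch below computes it for two coordinate lists that are assumed to be already optimally superimposed and to list atoms in the same order; structure comparison programs first perform that superposition, a step omitted here. The function name is illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of
    (x, y, z) coordinates, given in the same units (e.g. Ångström)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


model = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
reference = [(3.0, 4.0, 0.0), (4.0, 5.0, 1.0)]
deviation = rmsd(model, reference)
```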

    The protein folding prediction field has quite literally turned into a sport, with different research groups trying to best each other in predicting structures in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction (CASP)¹³. The advances in the field already offer the possibility to predict yet unknown protein structures from sequence with astonishing accuracy. For cross-reactivity prediction, homology modeling may possibly provide protein structures even for novel proteins, which may subsequently be used to seek surface epitopes. Using the above-mentioned rmsd of 2.25 Å as a reference, the accuracy of the predicted structures is potentially high enough for the prediction of cross-reactive epitopes, given that the antibody-antigen binding surface encompasses almost 1’000 Å² (Davies et al., 1988; Braden and Poljak, 1995). Therefore it seems feasible to substitute computationally predicted structures for protein structures which have not yet been experimentally determined.

    The number of available 3-D allergen structures may approach the number of non-isotypic allergen sequences, enabling high-quality cross-reactivity prediction based on tertiary structures and thereby protein surfaces.

    ¹³ CASP: http://predictioncenter.org/

    3.5.3 Prediction of Similar Surfaces

    Predicting the fold of a protein, however, is only half the story. After the generation of the tertiary structure, the proteins’ B-cell epitopes have to be identified, and these epitopes subsequently have to be compared in order to identify cross-reacting proteins. A full molecular docking prediction is not needed, as we are not interested in the binding capacity of an antibody but merely in the similarity of two protein surfaces.

    The first problem, accurately predicting B-cell epitopes, is a major challenge, especially in vaccine development. Even though recent publications propose improved epitope prediction methods (Scarabelli et al., 2010; Fiorucci and Zacharias, 2010), the field has apparently not yet achieved a level of reliability that would allow laboratory experiments to be forgone (Bryson et al., 2010). Whether the reliability would be high enough for cross-reactivity prediction would have to be determined. In any case, a similarity search on entire protein surfaces, as opposed to searching only epitopes, might eliminate the need to identify epitopes in the first place.

    Thus a last problem persists: comparing the surfaces of two proteins and identifying similar patches. Several approaches to this problem have been proposed, some purely geometrical (e.g. spin-image representations (Bock et al., 2007), geometric invariant fingerprints (Yin et al., 2009)), others respecting electrostatic properties (e.g. the adaptive Poisson-Boltzmann solver (Baker et al., 2001)). It would certainly be interesting to apply these techniques to 3-D allergen structures in order to identify potentially cross-reacting surface patches.


    References

    Klaus J Erb. Helminths, allergic disorders and ige-mediated immune responses: where do we stand? Eur J Immunol, 37(5):1170–3, May 2007. doi: 10.1002/eji.200737314.

    Hannah J Gould, Brian J Sutton, Andrew J Beavil, Rebecca L Beavil, Natalie McCloskey, Heather A Coker, David Fear, and Lyn Smurthwaite. The biology of ige and the basis of allergic disease. Annu Rev Immunol, 21:579–628, Jan 2003. doi: 10.1146/annurev.immunol.21.120601.141103. URL http://www.annualreviews.org/doi/abs/10.1146/annurev.immunol.21.120601.141103.

    A B Kay. Overview of ’allergy and allergic diseases: with a view to the future’. Br Med Bull, 56(4):843–64, 2000. URL http://www.ncbi.nlm.nih.gov/pubmed/11359624.

    Birgit Helm, Philip Marsh, Donata Vercelli, Eduardo Padlan, Hannah Gould, and Raif Geha. The mast cell binding site on human immunoglobulin e. Nature, 331(6152):180, Jan 1988. doi: 10.1038/331180a0. URL http://www.nature.com/nature/journal/v331/n6152/abs/331180a0.html.

    H Metzger. The high affinity receptor for ige on mast cells. Clin Exp Allergy, 21(3):269–79, May 1991.

    M J Nadler, S A Matthews, H Turner, and J P Kinet. Signal transduction by the high-affinity immunoglobulin e receptor fc epsilon ri: coupling form to function. Adv Immunol, 76:325–55, Jan 2000.

    Sanford B Hooker and William C Boyd. The existence of antigenic determinants of diverse specificity in a single protein. Journal of Immunology, 26:469–79, 1934. URL http://www.jimmunol.org/content/26/6/469.abstract.

    P H Maurer. I. antigenicity of oxypolygelatin and gelatin in man. J Exp Med, 100(5):497–513, Nov 1954. URL http://www.ncbi.nlm.nih.gov/pubmed/13211910.

    P G H Gell and B Benacerraf. Studies on hypersensitivity. ii. delayed hypersensitivity to denatured proteins in guinea pigs. Immunology, 2(1):64–70, Jan 1959. URL http://www.ncbi.nlm.nih.gov/pubmed/13640681.

    B Benacerraf and P G H Gell. Studies on hypersensitivity. i. delayed and arthus-type skin reactivity to protein conjugates in guinea pigs. Immunology, 2(1):53–63, Jan 1959. URL http://www.ncbi.nlm.nih.gov/pubmed/13640680.

    F Ferreira, T Hawranek, P Gruber, N Wopfner, and Adriano Mari. Allergic cross-reactivity: from gene to the clinic. Allergy, 59(3):243–67, Mar 2004. doi: 10.1046/j.1398-9995.2003.00407.x.

    K W Wucherpfennig and J L Strominger. Molecular mimicry in t cell-mediated autoimmunity: viral peptides activate human t cell clones specific for myelin basic protein. Cell, 80(5):695–705, Mar 1995.

    D R Davies, S Sheriff, and E A Padlan. Antibody-antigen complexes. J Biol Chem, 263(22):10541–4, Aug 1988. URL http://www.jbc.org/content/263/22/10541.long.

    Virginie Lafont, Michael Schaefer, Roland H Stote, Danièle Altschuh, and Annick Dejaegere. Protein-protein recognition and interaction hot spots in an antigen-antibody complex: free energy decomposition identifies “efficient amino acids”. Proteins, 67(2):418–34, May 2007. doi: 10.1002/prot.21259. URL http://www.ncbi.nlm.nih.gov/pubmed/17256770.

    J W Kappler, N Roehm, and P Marrack. T cell tolerance by clonal elimination in the thymus. Cell, 49(2):273–80, Apr 1987.

    P Kisielow, H S Teh, H Blüthmann, and H von Boehmer. Positive selection of antigen-specific t cells in thymus by restricting mhc molecules. Nature, 335(6192):730–3, Oct 1988. doi: 10.1038/335730a0. URL http://www.nature.com/nature/journal/v335/n6192/abs/335730a0.html.

    J Holm, G Baerentzen, M Gajhede, H Ipsen, J N Larsen, H Løwenstein, M Wissenbach, and M D Spangfort. Molecular basis of allergic cross-reactivity between group 1 major allergens from birch and apple. J Chromatogr B Biomed Sci Appl, 756(1-2):307–13, May 2001. URL http://www.ncbi.nlm.nih.gov/pubmed/11419722.

    S Laffer, R Valenta, S Vrtala, M Susani, R van Ree, D Kraft, O Scheiner, and M Duchêne. Complementary dna cloning of the major allergen phl p i from timothy grass (phleum pratense); recombinant phl p i inhibits ige binding to group i allergens from eight different grass species. J Allergy Clin Immunol, 94(4):689–98, Oct 1994. URL http://www.ncbi.nlm.nih.gov/pubmed/7930302.

    S Laffer, M Duchene, I Reimitzer, M Susani, C Mannhalter, D Kraft, and R Valenta. Common ige-epitopes of recombinant phl p i, the major timothy grass pollen allergen and natural group i grass pollen isoallergens. Mol Immunol, 33(4-5):417–26, Jan 1996. URL http://www.ncbi.nlm.nih.gov/pubmed/8676893.

    R Valenta, M Duchene, C Ebner, P Valent, C Sillaber, P Deviller, F Ferreira, M Tejkl, H Edelmann, and D Kraft. Profilins constitute a novel family of functional plant pan-allergens. J Exp Med, 175(2):377–85, Feb 1992.

    H Breiteneder and C Ebner. Molecular and biochemical classification of plant-derived food allergens. J Allergy Clin Immunol, 106(1 Pt 1):27–36, Jul 2000. doi: 10.1067/mai.2000.106929.

    T Midoro-Horiuti, E G Brooks, and R M Goldblum. Pathogenesis-related proteins of plants as allergens. Ann. Allergy Asthma Immunol., 87(4):261–71, Oct 2001. doi: 10.1016/S1081-1206(10)62238-7.

    S L Surplus, B R Jordan, A M Murphy, J P Carr, B Thomas, and S A H Mackerness. Ultraviolet-b-induced responses in arabidopsis thaliana: role of salicylic acid and reactive oxygen species in the regulation of transcripts encoding photosynthetic and acidic pathogenesis-related proteins. Plant, Cell and Environment, 21:685–94, 1998. URL http://onlinelibrary.wiley.com/doi/10.1046/j.1365-3040.1998.00325.x/pdf.

    E A Padlan. Anatomy of the antibody molecule. Molecular Immunology, 31(3):169–217, 1994. URL http://www.ncbi.nlm.nih.gov/pubmed/8114766.

    C F Barbas, A Heine, G Zhong, T Hoffmann, S Gramatikova, R Björnestedt, B List, J Anderson, E A Stura, I A Wilson, and R A Lerner. Immune versus natural selection: antibody aldolases with enzymic rates but broader scope. Science, 278(5346):2085–92, Dec 1997. URL http://www.sciencemag.org/content/278/5346/2085.long.

    Leo C James and Dan S Tawfik. The specificity of cross-reactivity: promiscuous antibody binding involves specific hydrogen bonds rather than nonspecific hydrophobic stickiness. Protein Science : A Publication of the Protein Society, 12(10):2183–93, Oct 2003. doi: 10.1110/ps.03172703. URL http://www.ncbi.nlm.nih.gov/pubmed/14500876.

    K Fötisch, F Altmann, D Haustein, and S Vieths. Involvement of carbohydrate epitopes in the ige response of celery-allergic patients. Int Arch Allergy Immunol, 120(1):30–42, Sep 1999. URL http://content.karger.com/produktedb/produkte.asp?typ=fulltext&file=iaa20030.

    Rob C Aalberse, V Koshte, and J G J Clemens. Immunoglobulin e antibodies that crossreact with vegetable foods, pollen, and hymenoptera venom. Journal of Allergy and Clinical Immunology, 68(5):356–364, 1981. doi: 10.1016/0091-6749(81)90133-0. URL http://www.ncbi.nlm.nih.gov/pubmed/7298999.

    A Mari, P Iacovacci, C Afferni, B Barletta, R Tinghino, G Di Felice, and C Pini. Specific ige to cross-reactive carbohydrate determinants strongly affect the in vitro diagnosis of allergic diseases. J Allergy Clin Immunol, 103(6):1005–11, Jun 1999. URL http://linkinghub.elsevier.com/retrieve/pii/S0091674999003486.

    Renato Erzen, Peter Korosec, Mira Silar, Ema Music, and Mitja Kosnik. Carbohydrate epitopes as a cause of cross-reactivity in patients allergic to hymenoptera venom. Wiener klinische Wochenschrift, 121(9-10):349–52, Jan 2009. doi: 10.1007/s00508-009-1171-1.

    Stefan Vieths, Stephan Scheurer, and Barbara Ballmer-Weber. Current understanding of cross-reactivity of food allergens and pollen. Ann N Y Acad Sci, 964:47–68, May 2002.

    Rob C Aalberse, J Akkerdaas, and R van Ree. Cross-reactivity of ige antibodies to allergens. Allergy, 56(6):478–90, Jun 2001.

    J Lescar, M Pellegrini, H Souchon, D Tello, R J Poljak, N Peterson, M Greene, and P M Alzari. Crystal structure of a cross-reaction complex between fab f9.13.7 and guinea fowl lysozyme. J Biol Chem, 270(30):18067–76, Jul 1995. URL http://www.jbc.org/content/270/30/18067.long.

    Rob C Aalberse. Assessment of sequence homology and cross-reactivity. Toxicol Appl Pharmacol, 207(2 Suppl):149–51, Sep 2005. doi: 10.1016/j.taap.2005.01.021.

    C H Blackley. Experimental researches on the causes and nature of catarrhus aestivus. Balliere, Trindall, & Cox, 1873.

    H J Chong Neto and N A Rosário. Studying specific ige: in vivo or in vitro. Allergologia et immunopathologia, 37(1):31–5, Jan 2009. URL http://www.elsevier.es/revistas/ctl_servlet?_f=7014&articuloid=13133446.

    Scott H Sicherer and Hugh A Sampson. Food allergy. J. Allergy Clin. Immunol., 125(2 Suppl 2):S116–25, Feb 2010. doi: 10.1016/j.jaci.2009.08.028. URL http://www.ncbi.nlm.nih.gov/pubmed/20042231.

    John H Krouse and Richard L Mabry. Skin testing for inhalant allergy 2003: current strategies. Otolaryngol Head Neck Surg, 129(4 Suppl):S33–49, Oct 2003. URL http://www.ncbi.nlm.nih.gov/pubmed/14574280.

    S Tripodi, A Di Rienzo Businco, C Alessandri, V Panetta, P Restani, and P M Matricardi. Predicting the outcome of oral food challenges with hen’s egg through skin test end-point titration. Clin Exp Allergy, 39(8):1225–33, Aug 2009. doi: 10.1111/j.1365-2222.2009.03250.x. URL http://onlinelibrary.wiley.com/doi/10.1111/j.1365-2222.2009.03250.x/abstract.

    E A Pastorello, C Incorvaia, C Ortolani, S Bonini, G W Canonica, S Romagnani, A Tursi, and C Zanussi. Studies on the relationship between the level of specific ige antibodies and the clinical expression of allergy: I. definition of levels distinguishing patients with symptomatic from patients with asymptomatic allergy to common aeroallergens. J Allergy Clin Immunol, 96(5 Pt 1):580–7, Nov 1995. URL http://linkinghub.elsevier.com/retrieve/pii/S0091-6749(95)70255-5.

    Martin D Chapman, A M Smith, L D Vailes, L K Arruda, V Dhanaraj, and A Pomés. Recombinant allergens for diagnosis and therapy of allergic disease. J Allergy Clin Immunol, 106(3):409–18, Sep 2000. doi: 10.1067/mai.2000.109832. URL http://linkinghub.elsevier.com/retrieve/pii/S0091674900564069.

    Robert G Hamilton. Clinical laboratory assessment of immediate-type hypersensitivity. J. Allergy Clin. Immunol., 125(2 Suppl 2):S284–96, Feb 2010. doi: 10.1016/j.jaci.2009.09.055. URL http://www.ncbi.nlm.nih.gov/pubmed/20176264.

    R Valenta, J Lidholm, V Niederberger, B Hayek, D Kraft, and H Grönlund. The recombinant allergen-based concept of component-resolved diagnostics and immunotherapy (crd and crit). Clin Exp Allergy, 29(7):896–904, Jul 1999.

    M van Hage-Hamsten and G Pauli. Provocation testing with recombinant allergens. Methods, 32(3):281–91, Mar 2004. doi: 10.1016/j.ymeth.2003.08.007. URL http://www.ncbi.nlm.nih.gov/pubmed/14962763.

    Reinhard Hiller, Sylvia Laffer, Christian Harwanegg, Martin Huber, Wolfgang M Schmidt, Anna Twardosz, Bianca Barletta, Wolf M Becker, Kurt Blaser, Heimo Breiteneder, Martin Chapman, Reto Crameri, Michael Duchêne, Fatima Ferreira, Helmut Fiebig, Karin Hoffmann-Sommergruber, Te Piao King, Tamara Kleber-Janke, Viswanath P Kurup, Samuel B Lehrer, Jonas Lidholm, Ulrich Müller, Carlo Pini, Gerald Reese, Otto Scheiner, Annika Scheynius, Horng-Der Shen, Susanne Spitzauer, Roland Suck, Ines Swoboda, Wayne Thomas, Raffaela Tinghino, Marianne Van Hage-Hamsten, Tuomas Virtanen, Dietrich Kraft, Manfred W Müller, and Rudolf Valenta. Microarrayed allergen molecules: diagnostic gatekeepers for allergy treatment. FASEB J., 16(3):414–6, Mar 2002. doi: 10.1096/fj.01-0711fje. URL http://www.fasebj.org/content/early/2002/03/02/fj.01-0711fje.long.

    FAO and WHO. Evaluation of allergenicity of genetically modified foods. reportof a joint fao/who expert consultation on allergenicity of foods derived frombiotechnology. Jan 2001.

    J A Nordlee, S L Taylor, J A Townsend, L A Thomas, and R K Bush. Identificationof a brazil-nut allergen in transgenic soybeans. N. Engl. J. Med., 334(11):688–92, Mar 1996. doi: 10.1056/NEJM199603143341103. URL http://www.nejm.org/doi/full/10.1056/NEJM199603143341103.

    Michael B Stadler and Beda M Stadler. Allergenicity prediction by protein se-quence. FASEB J., 17(9):1141–3, Apr 2003. doi: 10.1096/fj.02-1052fje. URLhttp://www.fasebj.org/cgi/content/abstract/02-1052fjev1.

    Kuo-Bin Li, Praveen Issac, and Arun Krishnan. Predicting allergenic proteinsusing wavelet transform. Bioinformatics, 20(16):2572–8, Nov 2004. doi: 10.1093/bioinformatics/bth286. URL http://bioinformatics.oxfordjournals.org/cgi/reprint/20/16/2572.

    Tariq Riaz, Hen Ley Hor, Arun Krishnan, Francis Tang, and Kuo-Bin Li. We-ballergen: a web server for predicting allergenic proteins. Bioinformatics,21(10):2570–1, May 2005. doi: 10.1093/bioinformatics/bti356. URL http://bioinformatics.oxfordjournals.org/cgi/content/full/21/10/2570.

    Karluss Thomas, Gary Bannon, Susan Hefle, Corinne Herouet, Michael Holsap-ple, Gregory Ladics, Sue Macintosh, and Laura Privalle. In silico methods for

    3. Scientific Overview 30

    http://www.ncbi.nlm.nih.gov/pubmed/20176264http://www.ncbi.nlm.nih.gov/pubmed/20176264http://www.ncbi.nlm.nih.gov/pubmed/14962763http://www.fasebj.org/content/early/2002/03/02/fj.01-0711fje.longhttp://www.fasebj.org/content/early/2002/03/02/fj.01-0711fje.longhttp://www.nejm.org/doi/full/10.1056/NEJM199603143341103http://www.nejm.org/doi/full/10.1056/NEJM199603143341103http://www.fasebj.org/cgi/content/abstract/02-1052fjev1http://bioinformatics.oxfordjournals.org/cgi/reprint/20/16/2572http://bioinformatics.oxfordjournals.org/cgi/reprint/20/16/2572http://bioinformatics.oxfordjournals.org/cgi/content/full/21/10/2570http://bioinformatics.oxfordjournals.org/cgi/content/full/21/10/2570

  • evaluating human allergenicity to novel proteins: International bioinformaticsworkshop meeting report, 23-24 february 2005. Toxicological Sciences, 88(2):307–10, Dec 2005. doi: 10.1093/toxsci/kfi277.

    Sudipto Saha and G P S Raghava. Algpred: prediction of allergenic proteins andmapping of ige epitopes. Nucleic Acids Res, 34(Web Server issue):W202–9, Jul2006. doi: 10.1093/nar/gkl343. URL http://nar.oxfordjournals.org/cgi/content/full/34/suppl_2/W202.

    ZH Zhang, JL Koh, GL Zhang, KH Choo, MT Tammi, and JC Tong. Allertool:a web server for predicting allergenicity and allergic cross-reactivity in proteins.Bioinformatics, Dec 2006. doi: 10.1093/bioinformatics/btl621. URL http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btl621v1.

    Waiming Kong, Tsu Soo Tan, Lawrence Tham, and Keng Wah Choo. Improvedprediction of allergenicity by combination of multiple sequence motifs. In Sil-ico Biol (Gedrukt), 7(1):77–86, Jan 2007. URL http://www.bioinfo.de/isb/2006070006/.

    Juan Cui, Lian Yi Han, Hu Li, Choong Yong Ung, Zhi Qun Tang, Chan JuanZheng, Zhi Wei Cao, and Yu Zong Chen. Computer prediction of allergen pro-teins from sequence-derived protein structural and physicochemical properties.Mol Immunol, 44(4):514–20, Jan 2007. doi: 10.1016/j.molimm.2006.02.010.

    Catherine H. Schein, Ovidiu Ivanciuc, and Werner Braun. Bioinformatics ap-proaches to classifying allergens and predicting cross-reactivity. Immunol-ogy and allergy clinics of North America, 27(1):1, Feb 2007. doi: 10.1016/j.iac.2006.11.005. URL http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1941676.

    Alvaro Martinez Barrio, Daniel Soeria-Atmadja, Anders Nistér, Mats G Gustafs-son, Ulf Hammerling, and Erik Bongcam-Rudloff. Evaller: a web server forin silico assessment of potential protein allergenicity. Nucleic Acids Res, 35(Web Server issue):W694–700, Jul 2007. doi: 10.1093/nar/gkm370. URLhttp://nar.oxfordjournals.org/cgi/content/full/35/suppl_2/W694.

    Joo Chuan Tong and Martti T Tammi. Prediction of protein allergenicity usinglocal description of amino acid sequence. Front Biosci, 13:6072–8, Jan 2008.URL http://www.bioscience.org/2008/v13/af/3138/fulltext.htm.

    Shen Jean Lim, Joo Chuan Tong, Fook Tim Chew, and Martti T Tammi. The valueof position-specific scoring matrices for assessment of protein allegenicity. BMCBioinformatics, 9 Suppl 12:S21, Jan 2008. doi: 10.1186/1471-2105-9-S12-S21.

    Hon Cheng Muh, Joo Chuan Tong, and Martti T Tammi. Allerhunter: asvm-pairwise system for assessment of allergenicity and allergic cross-reactivityin proteins. PLoS ONE, 4(6):e5861, Jan 2009. doi: 10.1371/journal.pone.0005861. URL http://www.plosone.org/article/info%253Adoi%252F10.1371%252Fjournal.pone.0005861.

    3. Scientific Overview 31

    http://nar.oxfordjournals.org/cgi/content/full/34/suppl_2/W202http://nar.oxfordjournals.org/cgi/content/full/34/suppl_2/W202http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btl621v1http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btl621v1http://www.bioinfo.de/isb/2006070006/http://www.bioinfo.de/isb/2006070006/http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1941676http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1941676http://nar.oxfordjournals.org/cgi/content/full/35/suppl_2/W694http://www.bioscience.org/2008/v13/af/3138/fulltext.htmhttp://www.plosone.org/article/info%253Adoi%252F10.1371%252Fjournal.pone.0005861http://www.plosone.org/article/info%253Adoi%252F10.1371%252Fjournal.pone.0005861

  • Ovidiu Ivanciuc, Catherine H Schein, Tzintzuni Garcia, Numan Oezguen, Suren-dra S Negi, and Werner Braun. Structural analysis of linear and conformationalepitopes of allergens. Regul Toxicol Pharmacol, 54(3 Suppl):S11–9, Aug 2009.doi: 10.1016/j.yrtph.2008.11.007.

    R Asero. Effects of birch pollen-specific immunotherapy on apple allergy inbirch pollen-hypersensitive patients. Clin Exp Allergy, 28(11):1368–73, Nov1998. URL http://onlinelibrary.wiley.com/doi/10.1046/j.1365-2222.1998.00399.x/abstract.

    S Pascarella and P Argos. A data bank merging related protein structuresand sequences. Protein Eng, 5(2):121–37, Mar 1992. URL http://peds.oxfordjournals.org/content/5/2/121.long.

    C Chothia. Proteins. one thousand families for the molecular biologist. Nature,357(6379):543–4, Jun 1992. doi: 10.1038/357543a0. URL http://www.nature.com/nature/journal/v357/n6379/abs/357543a0.html.

    Patrick Aloy and Robert B Russell. Ten thousand interactions for the molecular bi-ologist. Nature Biotechnology, 22(10):1317, Oct 2004. doi: doi:10.1038/nbt1018.URL http://www.nature.com/nbt/journal/v22/n10/full/nbt1018.html.

    R E Dickerson and I Geis. Hemoglobin: structure, function, evolution and pathol-ogy. Benjamin/Cummings Publishing Co., Inc., 1983.

    M Gribskov, A D McLachlan, and D Eisenberg. Profile analysis: detection ofdistantly related proteins. Proc Natl Acad Sci USA, 84(13):4355–8, Jul 1987.URL http://www.pnas.org/cgi/reprint/84/13/4355.

    W R Pearson and D J Lipman. Improved tools for biological sequence comparison.Proc Natl Acad Sci USA, 85(8):2444–8, Apr 1988. URL http://www.pnas.org/content/85/8/2444.long.

    Alexander Yu Mitrophanov and Mark Borodovsky. Statistical significance in bio-logical sequence analysis. Brief Bioinformatics, 7(1):2–24, Mar 2006.

    M Pagni and C V Jongeneel. Making sense of score statistics for sequencealignments. Brief Bioinformatics, 2(1):51–67, Mar 2001. URL http://bib.oxfordjournals.org/content/2/1/51.long.

    Timothy L Bailey and C Elkan. Fitting a mixture model by expectation maxi-mization to discover motifs in biopolymers. Proceedings / International Con-ference on Intelligent Systems for Molecular Biology ; ISMB International Con-ference on Intelligent Systems for Molecular Biology, 2:28–36, Jan 1994. URLhttp://www.ncbi.nlm.nih.gov/pubmed/7584402?dopt=abstract.

    P Bucher, K Karplus, N Moeri, and K Hofmann. A flexible motif search techniquebased on generalized profiles. Comput Chem, 20(1):3–23, Mar 1996. URL http://www.ncbi.nlm.nih.gov/pubmed/8867839.

    3. Scientific Overview 32

    http://onlinelibrary.wiley.com/doi/10.1046/j.1365-2222.1998.00399.x/abstracthttp://onlinelibrary.wiley.com/doi/10.1046/j.1365-2222.1998.00399.x/abstracthttp://peds.oxfordjournals.org/content/5/2/121.longhttp://peds.oxfordjournals.org/content/5/2/121.longhttp://www.nature.com/nature/journal/v357/n6379/abs/357543a0.htmlhttp://www.nature.com/nature/journal/v357/n6379/abs/357543a0.htmlhttp://www.nature.com/nbt/journal/v22/n10/full/nbt1018.htmlhttp://www.pnas.org/cgi/reprint/84/13/4355http://www.pnas.org/content/85/8/2444.longhttp://www.pnas.org/content/85/8/2444.longhttp://bib.oxfordjournals.org/content/2/1/51.longhttp://bib.oxfordjournals.org/content/2/1/51.longhttp://www.ncbi.nlm.nih.gov/pubmed/7584402?dopt=abstracthttp://www.ncbi.nlm.nih.gov/pubmed/8867839http://www.ncbi.nlm.nih.gov/pubmed/8867839

  • W Li, L Jaroszewski, and A Godzik. Clustering of highly homologous sequencesto reduce the size of large protein databases. Bioinformatics (Oxford, England),17(3):282–3, Mar 2001.

    L Holm and C Sander. Mapping the protein universe. Science, 273(5275):595–603,Aug 1996. URL http://www.sciencemag.org/content/273/5275/595.long.

    Yang Zhang. Progress and challenges in protein structure prediction. Curr OpinStruct Biol, 18(3):342–8, Jun 2008. doi: 10.1016/j.sbi.2008.02.004.

    Bojan Zagrovic, Christopher D Snow, Michael R Shirts, and Vijay S Pande.Simulation of folding of a small alpha-helical protein in atomistic detail usingworldwide-distributed computing. J Mol Biol, 323(5):927–37, Nov 2002. URLhttp://www.ncbi.nlm.nih.gov/pubmed/12417204.

    Michael Shirts and Vijay S Pande. Screen savers of the world unite! Science, 290(5498):1903–4, 2000. doi: 10.1126/science.290.5498.1903. URL http://www.ncbi.nlm.nih.gov/pubmed/17742054.

    Vijay S Pande. A simple theory of protein folding kinetics. 2010. URL http://arxiv.org/abs/1007.0315.

    Jeffrey Skolnick, Daisuke Kihara, and Yang Zhang. Development andlarge scale benchmark testing of the prospector 3 threading algo-rithm. Proteins, 56(3):502–18, Aug 2004. doi: 10.1002/prot.20106. URL http://onlinelibrary.wiley.com/doi/10.1002/prot.20106/abstract;jsessionid=7D0BBD01853416611E9A418F01E41F82.d02t02.

    Lukasz Jaroszewski, Leszek Rychlewski, Zhanwen Li, Weizhong Li, and AdamGodzik. Ffas03: a server for profile–profile sequence alignments. Nucleic AcidsRes, 33(Web Server issue):W284–8, Jul 2005. doi: 10.1093/nar/gki418. URLhttp://nar.oxfordjournals.org/content/33/suppl_2/W284.long.

    Yang Zhang and Jeffrey Skolnick. The protein structure prediction problem couldbe solved using the current pdb library. Proc Natl Acad Sci USA, 102(4):1029–34, Jan 2005. doi: 10.1073/pnas.0407152101. URL http://www.pnas.org/content/102/4/1029.long.

    B C Braden and R J Poljak. Structural features of the reactions between antibodiesand protein antigens. FASEB J., 9(1):9–16, Jan 1995.

    Guido Scarabelli, Giulia Morra, and Giorgio Colombo. Predicting interaction sitesfrom the energetics of isolated proteins: a new approach to epitope mapping.Biophys J, 98(9):1966–75, May 2010. doi: 10.1016/j.bpj.2010.01.014. URLhttp://www.ncbi.nlm.nih.gov/pubmed/20441761.

    Sébastien Fiorucci and Martin Zacharias. Prediction of protein-protein interactionsites using electrostatic desolvation profiles. Biophys J, 98(9):1921–30, May2010. doi: 10.1016/j.bpj.2009.12.4332. URL http://www.ncbi.nlm.nih.gov/pubmed/20441756.

    3. Scientific Overview 33

    http://www.sciencemag.org/content/273/5275/595.longhttp://www.ncbi.nlm.nih.gov/pubmed/12417204http://www.ncbi.nlm.nih.gov/pubmed/17742054http://www.ncbi.nlm.nih.gov/pubmed/17742054http://arxiv.org/abs/1007.0315http://arxiv.org/abs/1007.0315http://onlinelibrary.wiley.com/doi/10.1002/prot.20106/abstract;jsessionid=7D0BBD01853416611E9A418F01E41F82.d02t02http://onlinelibrary.wiley.com/doi/10.1002/prot.20106/abstract;jsessionid=7D0BBD01853416611E9A418F01E41F82.d02t02http://nar.oxfordjournals.org/content/33/suppl_2/W284.longhttp://www.pnas.org/content/102/4/1029.longhttp://www.pnas.org/content/102/4/1029.longhttp://www.ncbi.nlm.nih.gov/pubmed/20441761http://www.ncbi.nlm.nih.gov/pubmed/20441756http://www.ncbi.nlm.nih.gov/pubmed/20441756

  • Christine J Bryson, Tim D Jones, and Matthew P Baker. Prediction ofimmunogenicity of therapeutic proteins: validity of computational tools.BioDrugs, 24(1):1–8, Feb 2010. doi: 10.2165/11318560-000000000-00000.URL http://adisonline.com/biodrugs/pages/articleviewer.aspx?year=2010&issue=24010&article=00001&type=abstract.

    Mary Ellen Bock, Claudio Garutti, and Concettina Guerra. Discovery of similarregions on protein surfaces. J Comput Biol, 14(3):285–99, Apr 2007. doi: 10.1089/cmb.2006.0145.

    S Yin, E. A Proctor, A. A Lugovskoy, and N. V Dokholyan. Fast screening ofprotein surfaces using geometric invariant fingerprints. Proceedings of the Na-tional Academy of Sciences, 106(39):16622–16626, Sep 2009. doi: 10.1073/pnas.0906146106. URL http://www.pnas.org/content/106/39/16622.full.

    N A Baker, D Sept, S Joseph, M J Holst, and J A McCammon. Electrostatics ofnanosystems: application to microtubules and the ribosome. Proc Natl Acad SciUSA, 98(18):10037–41, Aug 2001. doi: 10.1073/pnas.181342398. URL http://www.pnas.org/content/98/18/10037.long.

    3. Scientific Overview 34

    http://adisonline.com/biodrugs/pages/articleviewer.aspx?year=2010&issue=24010&article=00001&type=abstracthttp://adisonline.com/biodrugs/pages/articleviewer.aspx?year=2010&issue=24010&article=00001&type=abstracthttp://www.pnas.org/content/106/39/16622.fullhttp://www.pnas.org/content/98/18/10037.longhttp://www.pnas.org/content/98/18/10037.long

  • 4 Results – Dissertation Equivalents

    Dissertation Equivalent I

    Pfiffner P, Truffer R, Matsson P, Rasi C, Mari A, Stadler BM. Allergen cross reactions: a problem greater than ever thought? Allergy 2010; 65: 1536–1544.

    Dissertation Equivalent II

    Pfiffner P, Stadler BM, Rasi C, Scala E, Mari A. Allergen clustering evaluated by in silico motifs or in vitro IgE microarray testing using highly purified allergens. Manuscript in preparation.


  • ORIGINAL ARTICLE EXPERIMENTAL ALLERGY AND IMMUNOLOGY

    Allergen cross reactions: a problem greater than ever thought?

    P. Pfiffner¹, R. Truffer¹, P. Matsson², C. Rasi³, A. Mari³,⁴ & B. M. Stadler¹

    ¹University Institute of Immunology, Bern, Switzerland; ²Phadia AB, Uppsala, Sweden; ³Center for Clinical and Experimental Allergology, IDI-IRCCS, Rome; ⁴Allergy Data Laboratories s.c., Latina, Italy

    To cite this article: Pfiffner P, Truffer R, Matsson P, Rasi C, Mari A, Stadler BM. Allergen cross reactions: a problem greater than ever thought? Allergy 2010; 65: 1536–1544.

    Abstract

    Background: Cross reactions are an often observed phenomenon in patients with allergy. Sensitization against some allergens may cause reactions against other seemingly unrelated allergens. Today, cross reactions are investigated on a per-case basis by analyzing blood serum specific IgE (sIgE) levels and the clinical features of patients suffering from cross reactions. In this study, we evaluated the level of sIgE compared to patients’ total IgE, assuming that epitope specificity is a consequence of sequence similarity.

    Methods: Our objective was to evaluate our recently published model of molecular sequence similarities underlying cross reactivity using serum-derived data from IgE determinations of standard laboratory tests. We calculated the probabilities of protein cross reactivity based on conserved sequence motifs and compared these in silico predictions to a database consisting of 5362 sera with sIgE determinations.

    Results: Cumulating the sIgE values of a patient resulted in a median of 25–30% of total IgE. Comparing motif cross reactivity predictions to sIgE levels showed that on average three times fewer motifs than extracts were recognized in a given serum (correlation coefficient: 0.967). Extracts belonging to the same motif group co-reacted in a high percentage of sera (up to 80% for some motifs).

    Conclusions: Cumulated sIgE levels are exaggerated because of a high level of observed cross reactions. Thus, not only bioinformatic prediction of allergenic motifs, but also serological routine testing of allergic patients implies that the immune system may recognize only a small number of allergenic structures.

    Keywords: allergens; bioinformatics; sensitization; sequence motifs; specific IgE.

    Correspondence: Pascal Pfiffner, University Institute of Immunology, Sahli Haus 2, Inselspital, 3010 Bern, Switzerland. Tel.: +41 31 632 22 89. Fax: +41 31 632 35 00. E-mail: [email protected]

    Accepted for publication 4 May 2010

    DOI:10.1111/j.1398-9995.2010.02420.x

    Edited by: Reto Crameri

    Allergy 65 (2010) 1536–1544 © 2010 John Wiley & Sons A/S

    Determination of specific IgE in patient sera is a valuable test for allergologists (1). The number of potential allergens is steadily increasing (2) and suppliers of allergy tests are providing ever longer lists of allergenic preparations to be used for in vitro assays. In most instances, allergens are still relatively crude extracts of organisms or parts thereof (3). Recently, allergen diagnosis has improved through the use of highly purified natural or recombinant allergens and protein microarrays (3–8). This may improve allergy diagnostics in the future.

    Cross reactions are allergic reactions against other allergens without prior sensitization. They have been extensively studied and a handful of well-defined cross reactivity syndromes are clinically highly important, e.g., the pollen-food syndromes (9). Cross reactions between recombinant allergens are also documented (3, 7, 10). Thus, the immune system might recognize common structures, which would make it possible to predict allergic reactions that have not been tested physically but are inferred from similarity.

    We have previously shown that a bioinformatic approach can define a much lower number of potentially allergenic structures, termed motifs, than the number of known protein sequences of allergens (11). These motifs represent a scaled profile over a window of 50 amino acids, derived from all currently known allergen protein sequences. They serve as an identifier for evolutionarily conserved protein domains. Consequently, if protein sequences match a given motif, these proteins are predicted to fold into the same protein domain and therefore exhibit similar surface structures. We showed that this method of cross reactivity prediction is superior to the FAO/WHO rule, which states that a protein is allergenic if it has either an identity of at least six contiguous amino acids or more than 35% sequence similarity over a window of 80 amino acid residues. Especially in view of false positive matches (67.3% of all Swiss-Prot proteins were predicted to be allergenic by the FAO/WHO rule), the motif-based approach performed much better (2.6% predicted to be allergenic) (11).

    Thus, the question remains whether the in silico prediction of allergenicity may be confirmed by wet lab data. For this purpose, we have analyzed 5362 sera corresponding to 203 283 specific IgE determinations. We could demonstrate that the degree of cross reaction was greater than ever thought.
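    The two criteria of the FAO/WHO rule quoted in the introduction can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the article: "similarity" is taken here as percent identity, windows are compared without gaps (a real assessment would typically rely on an alignment tool), and all function names are hypothetical.

```python
def has_shared_kmer(query: str, allergen: str, k: int = 6) -> bool:
    """True if both sequences contain an identical stretch of k residues."""
    if len(query) < k or len(allergen) < k:
        return False
    allergen_kmers = {allergen[i:i + k] for i in range(len(allergen) - k + 1)}
    return any(query[i:i + k] in allergen_kmers
               for i in range(len(query) - k + 1))


def max_window_identity(query: str, allergen: str, window: int = 80) -> float:
    """Best fraction of identical positions over any ungapped window pair.

    Assumption: sequences shorter than the window are compared over their
    common length, but the match count is still divided by the full window
    size, so short sequences are not flagged too easily.
    """
    w = min(window, len(query), len(allergen))
    if w == 0:
        return 0.0
    best = 0
    for i in range(len(query) - w + 1):
        for j in range(len(allergen) - w + 1):
            matches = sum(a == b for a, b in
                          zip(query[i:i + w], allergen[j:j + w]))
            best = max(best, matches)
    return best / window


def fao_who_flag(query: str, allergen: str) -> bool:
    """Flag the query if either criterion of the rule is met."""
    return (has_shared_kmer(query, allergen, k=6)
            or max_window_identity(query, allergen, window=80) > 0.35)
```

    Because a single shared stretch of six residues is enough to flag a protein, a rule of this kind can be expected to produce many chance matches against a large database, which is consistent with the high false positive rate reported above.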

    Materials and methods

    Serum samples

    Data on 5456 serum samples were obtained by testing for IgE using Phadia’s ImmunoCAP® (formerly UniCAP®; Phadia AB, Uppsala, Sweden) systems. These are sandwich immunoassay systems in which serum IgE antibodies react with anti-IgE covalently coupled to the system in the case of total IgE determination, or with solid-phase bound allergen extracts to determine specific IgE. Bound antibodies are detected and quantified using enzyme-labeled anti-IgE antibodies and fluorescence detection.

    Tests were performed from 1988 to 2006 in different laboratories in 17 countries. Raw, anonymized IgE data (without age, sex, or other demographic and clinical information) were collected for quality assurance; therefore, no selection criteria were applied. Test results were collected in a clinical setting; most sera are presumably from patients with atopy.

    All IgE levels are expressed in kilo units of antigen per liter serum (kUA/l). Specific IgE levels >0.35 kUA/l (Class I and higher) were regarded as a positive test result; levels >100 kUA/l were capped at 100 kUA/l, which affected 1578 values.
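    The positivity threshold and the capping described above amount to two simple operations on each specific IgE value. The following Python sketch illustrates them; the constant and function names are hypothetical and not taken from the authors' scripts.

```python
POSITIVE_THRESHOLD_KUA_L = 0.35  # values above this count as positive (Class I and higher)
CAP_KUA_L = 100.0                # values above this are capped


def cap_sige(value_kua_l: float) -> float:
    """Cap a specific IgE level at 100 kUA/l."""
    return min(value_kua_l, CAP_KUA_L)


def is_positive(value_kua_l: float) -> bool:
    """A test is positive only if the level is strictly above 0.35 kUA/l."""
    return value_kua_l > POSITIVE_THRESHOLD_KUA_L
```

    Capping does not change whether a value counts as positive, so the two steps can be applied in either order.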

    Included in the database were serum levels for 99 allergens as well as the total IgE level. According to the manufacturer, the 99 allergen extracts used to determine the specific IgE values are the 99 most tested allergens among a list of more than 700 allergens available in Phadia’s catalog. Table 1 lists the extracts and groups them into major subsets.

    To be included in the final database, sera had to be tested for total IgE and against at least 10 different allergens, and had to yield at least one positive specific IgE test result. With a total of 203 283 specific IgE tests, 5362 sera met our criteria and were used for the analysis.
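    The three inclusion criteria can be expressed as a single filter over one serum record. This is a minimal sketch that assumes a hypothetical record layout (a total IgE value that may be missing, plus a mapping from allergen code to specific IgE level); it does not reproduce the authors' MySQL queries.

```python
from typing import Mapping, Optional


def serum_is_eligible(total_ige: Optional[float],
                      specific_ige: Mapping[str, float],
                      min_allergens: int = 10,
                      threshold_kua_l: float = 0.35) -> bool:
    """Apply the inclusion criteria to one serum.

    specific_ige maps an allergen code to its measured level in kUA/l;
    only allergens that were actually tested appear in the mapping.
    """
    if total_ige is None:                  # total IgE must have been determined
        return False
    if len(specific_ige) < min_allergens:  # at least 10 different allergens tested
        return False
    # at least one positive specific IgE result
    return any(level > threshold_kua_l for level in specific_ige.values())
```

    For example, a serum tested against ten allergens with one value of 0.5 kUA/l passes the filter, whereas the same serum without a total IgE measurement does not.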

    Databases and software

    We created a MySQL database to hold the serum data (MySQL 5.0, obtained from http://www.mysql.com/). Allergen protein sequences were extracted from the Allergome database (http://www.allergome.org/ as of January 2009). MEME 3.5.7 (12) (obtained from http://meme.sdsc.edu/meme/) and pftools 2.3.4 (13) (obtained from http://www.isrec.isb-sib.ch/ftp-server/pftools/) were used for the iterative allergen motif discovery. Perl 5.8.8 (http://www.perl.org/), PHP 5.2+ (http://www.php.net/), and R 2.8 (14) (http://www.r-project.org/) scripts were created to extract the desired statistical calculations.

    Allergen motifs

    We performed the iterative allergen motif discovery according to Stadler and Stadler (11) using 2189 protein sequences from