Moonlighting Protein Database (MoonProt): A Database for Proteins That Are Known to Moonlight

BY

MATHEW K. MANI

B.S., University of Illinois at Urbana-Champaign, 2009

THESIS

Submitted as partial fulfillment of the requirements for the degree of Master of Science in Bioinformatics

in the Graduate College of the University of Illinois at Chicago, 2013

Chicago, Illinois

Defense Committee:
Dr. Jie Liang, Chair
Dr. Constance Jeffery, Advisor
Dr. Aixa Alfonso, Biological Sciences

I dedicate my dissertation work to my family, especially to my loving parents,

Kavattu and Chinnamma Mani for their prayers and words of encouragement and my

brother Joey, who has supported me while being overseas.

Also, I dedicate this work and give special thanks to my best friend, Lisa Kurien

for being there with me throughout the entire program.

Finally, I would like to dedicate this dissertation to my many friends and church

family who have supported me throughout the process. I will always appreciate all they

have done.

ACKNOWLEDGEMENTS

I would like to thank my thesis committee, Dr. Jie Liang, Dr. Constance Jeffery,

and Dr. Aixa Alfonso for their unwavering support and assistance. A special thanks to my

research advisor, Dr. Constance Jeffery, for providing guidance in all areas that helped me

accomplish my research goals.

I would also like to acknowledge the members of the Jeffery laboratory for

gathering the data and filling out the annotations for the Moonlighting Protein

(MoonProt) Database. Lastly, I would like to acknowledge Senin Paulouse and Toby

Kavukattu for their input and assistance in building the website and database.

TABLE OF CONTENTS

I. BACKGROUND
   A. What is a “Moonlighting Protein”?
      1. Differential Localization
      2. Differential Expression
      3. Function inside vs. Function outside
      4. Concentration of substrates or other ligands
      5. Binding Sites
      6. Oligomerization
   B. How are Moonlighting Proteins Identified?
   C. Why Are Moonlighting Proteins Important?

II. MOONLIGHTING PROTEIN DATABASE
   A. Background
   B. Examples of Online Databases
      1. Protein Data Bank (PDB)
      2. Entrez, The Life Sciences Search Engine
      3. European Bioinformatics Institute (EBI) Database
   C. Moonlighting Proteins Database (MoonProt)
      1. Home Page
      2. Proteins Page
      3. People Page
      4. FAQ’s Page
      5. Articles Page
      6. Other References Page
   D. Search Parameters

III. TECHNICAL ASPECTS OF THE MOONLIGHTING DATABASE
   A. Database
      1. Query
   B. Server
   C. PHP Files

IV. CONCLUSION

APPENDICES
   Appendix 1 – List of Moonlighting Proteins with Characteristics
   Appendix 2 – Programmable Code for .php Pages
   Appendix 3 – Programmable Code for the Database functionality
   Appendix 4 – Figures of Screenshots of MoonProt Database

REFERENCES

VITA

LIST OF FIGURES

FIGURE

1. Graphic Depiction of the Apo-IRP1 protein
2. Home Page Screenshot
3. Proteins Page Screenshot
4. Detailed Page – General Information Screenshot
5. Detailed Page – Structural Information Screenshot
6. Detailed Page – Functional Information Screenshot
7. People Page Screenshot
8. FAQ’s Page Screenshot
9. Articles Page Screenshot
10. Other References Screenshot

LIST OF ABBREVIATIONS

AMF Autocrine Motility Factor

DMM Differentiation and Maturation Mediator

RNA Ribonucleic Acid

DNA Deoxyribonucleic Acid

mRNA Messenger RNA

IRE-BP Iron Responsive Element Binding Protein

MBP Maltose Binding Protein

CFTR Cystic Fibrosis Transmembrane Conductance Regulator

ORCC Outwardly Rectifying Chloride Channel

ENaC Epithelial Sodium Channel

NMR Nuclear Magnetic Resonance

PDB Protein Data Bank

NCBI National Center for Biotechnology Information

NLM National Library of Medicine

NIH National Institutes of Health

EBI European Bioinformatics Institute

EMBL European Molecular Biology Laboratory

SQL Structured Query Language

RDBMS Relational Database Management System

DBMS Database Management System

ACCC Academic Computing and Communications Center

HTML Hypertext Markup Language

FAQ Frequently Asked Questions

SUMMARY

The Moonlighting Proteins Database (MoonProt) contains information about

proteins that are specifically categorized as ‘moonlighting’. A moonlighting protein is a

multifunctional protein in which the two or more different functions are performed by

one polypeptide chain. Moonlighting proteins do not include proteins in which the

multiple functions are due to gene fusions or multiple splice variants. This database will

allow people to access the name of a moonlighting protein, its functions, its sequence, the

species in which both functions are found, and a picture of the protein structure (if

available). Additional information will be added for some proteins, for example, the

metabolic pathway it belongs to or any unique chemical/physical properties it may have.

The members of the Jeffery laboratory will curate the database.

This database will support users' own research by helping them learn more about

proteins that have been found to moonlight. It will serve as a benchmark for future projects in

developing methods to identify additional proteins that moonlight. The main purpose of

this database is to serve as a resource for researchers and scientists to help them with their

own experiments.

I. BACKGROUND

A. What is a “Moonlighting Protein”?

Moonlighting proteins are proteins that perform more than one function with a single

polypeptide chain. Protein moonlighting is a relatively new concept developed in part by

Dr. Constance Jeffery, a professor at the University of Illinois at Chicago [1-8].

Some moonlighting proteins can perform both functions simultaneously, but for

others, the functions of a moonlighting protein can vary due to changes occurring within

the cell. A moonlighting protein can adjust to these changes by having different functions

in different cellular locations, in different cell types, when it has different oligomeric

states, or when it senses a change in the cellular concentration of a ligand, substrate,

cofactor or a product of an enzymatic reaction [1,2].

Below are examples of mechanisms that can cause a moonlighting protein to

switch between functions. This is by no means a complete list, since new mechanisms by

which proteins moonlight continue to be discovered.

1. Differential Localization

A protein can have different functions in two different locations within the

cell. An example of this can be seen in the PutA protein in Escherichia coli (E. coli).

When PutA is associated with the plasma membrane, it has enzymatic activity, with both

proline dehydrogenase and pyrroline-5-carboxylate dehydrogenase activity. However,

under some cell conditions, when the proline concentration changes, PutA moves to the cytoplasm, binds to DNA, and

acts as a transcriptional repressor [9].

2. Differential Expression

Some moonlighting proteins perform different functions when they are

expressed in different cell types. For instance, in nerve axons, neuropilin is a cell surface

receptor that detects a specific ligand called semaphorin III, which helps axons arrive at

their correct destinations [10]. Neuropilin can also be found in endothelial

cells, in which it is also a cell surface receptor. However, in endothelial cells, it detects a

different ligand, vascular endothelial growth factor, which helps the cell to determine

when new blood vessels are needed [10].

3. Function inside vs. Function outside

Along with different locations and expression in different cell types,

proteins can also have one function within the cell and another function outside of it.

Thymidine phosphorylase is a protein that does exactly this. In the cytoplasm of the cell,

it acts as an enzyme and catalyzes the phosphorolysis of thymidine and deoxyuridine to their bases and

2-deoxyribose 1-phosphate. In the extracellular fluid, it is a cell growth factor that plays a

role in chemotaxis and causes an increase in cell growth [11].

Another protein that moonlights outside the cell is phosphoglucose isomerase.

Phosphoglucose isomerase is a ubiquitous cytosolic enzyme that catalyzes the

second step in glycolysis, converting glucose-6-phosphate to fructose-6-phosphate

[12-14]. Along with being a vital enzyme in glycolysis, phosphoglucose

isomerase also has multiple other functions. It is a nerve cell growth factor that promotes

the survival of some embryonic spinal neurons as well as sensory nerves [15, 16]. Also,

phosphoglucose isomerase is the same protein as autocrine motility factor (AMF), which

is a cytokine that increases cell movement and migration [17]. Finally, it is a

differentiation and maturation mediator (DMM) that can cause differentiation of human

myeloid leukemia cells and pre-B cells to mature into antibody-secreting cells [18].

4. Concentration of substrates or other ligands

Substrates and other ligands can alter how proteins interact with one

another as well as the availability of a protein within a cell [8]. Both substrates and other

ligands have the ability to adjust and determine many cellular mechanisms and processes

within the cell. Below are just a few examples of how the concentration of substrates and

other ligands affect the mechanisms of moonlighting proteins.

As stated earlier, PutA is a protein that was shown to moonlight depending on its

localization within a cell. However, PutA activity is also dependent on the concentration

of its substrate, proline. When the proline concentration is high, PutA binds to it, but

when the proline concentration is low, PutA binds to DNA and acts as a

transcriptional repressor [19].

Another example is the enzyme aconitase. Aconitase is an iron-dependent

enzyme that has catalytic activity only when the concentration of iron is high.

Once the iron concentration drops, aconitase loses its 4Fe-4S cluster and its enzymatic

activity. Without the cluster, the protein becomes an iron responsive element

binding protein (IRE-BP) that promotes translation of mRNA encoding proteins involved

in iron uptake [20, 21].

5. Binding Sites

The fifth type of mechanism that enables moonlighting proteins to have

multiple functions is the presence of multiple specific binding sites. In E. coli, the

aspartate receptor functions as a receptor for aspartate. It also acts as a receptor for the

maltose binding protein (MBP). The aspartate receptor has separate binding sites for the

aspartate binding and MBP binding functions [22, 23].

Figure 1: Aconitase is the same protein as IRP1. Panel labels: enzyme in the citric acid cycle (citrate -> isocitrate); binds RNA to regulate translation. Picture from T. A. Rouault, Science (2006) 314:1886.

6. Oligomerization

Some proteins have one function as a monomer and another as a dimer,

trimer, or other multimer. A good example is human glyceraldehyde-3-phosphate

dehydrogenase. As a tetramer, this glycolytic enzyme converts

glyceraldehyde-3-phosphate to 1,3-diphosphoglycerate. The monomeric

version of this protein is a uracil-DNA glycosylase. Unlike the glycolytic

tetramer, the uracil-DNA glycosylase removes uracil that appears in DNA

through misincorporation during DNA synthesis or through the deamination of cytosine residues

[24].

The above mechanisms are just a few of the ways in which proteins can have

multiple functions that allow them to be categorized as proteins that moonlight. Other

mechanisms include “self-regulation” of transcription or translation, as in

thymidylate synthase, which binds to a stem-loop structure in its own mRNA and thereby inhibits

its translation [25]. The list of mechanisms will continue to grow as we

find more proteins that show multiple functions within a cell, and newly characterized

mechanisms should in turn help identify additional moonlighting proteins.

B. How are Moonlighting Proteins Identified?

Moonlighting proteins are found in many types of cells as well as many different

biochemical pathways. Due to the diversity of moonlighting proteins it is a challenge to

identify them. Since moonlighting proteins have the ability to have multiple functions

and are found in multiple regions within the cell, we need to use a variety of methods and

procedures in order to identify them. The multiple functions of the known moonlighting

proteins have been identified using different methods, and we have selected proteins for

the database for which there is published biochemical or biophysical data showing that both

functions are found in the same protein. We have not included proteins for which the

multiple “functions” are simply different aspects of the same function (e.g., “membrane

protein” and “transmembrane channel”) or for which a second function is only suggested

by genetics data [1-8].

C. Why Are Moonlighting Proteins Important?

The reason we are focusing on these multifunctional proteins is due to the various

benefits they provide the cell or organism and their potential impact on current areas of

biology, including predicting functions for the many genes being identified in the genome

sequencing projects.

Moonlighting proteins have the ability to coordinate various cellular pathways,

which allows the organism to adapt and respond to changing conditions. An example

of this is the cystic fibrosis transmembrane conductance regulator (CFTR), which is a

chloride channel as well as a regulator of other channels within the organism. The other

channels that CFTR helps to regulate are the outwardly rectifying chloride channel

(ORCC) and ENaC, which is a sodium channel. Both the ORCC and ENaC help promote

ion homeostasis [26].

The presence of moonlighting proteins makes it more difficult to predict the

functions of the many proteins being identified in the genome sequencing projects. A

key problem is that homologues of moonlighting proteins might or might not have both

functions. The MoonProt database will provide a curated source of proteins that are

known to moonlight. This can help clarify if a protein is known to have both functions.

It will also serve as a gold standard for the future development of methods to identify

moonlighting functions of additional proteins.

II. MOONLIGHTING PROTEIN DATABASE (MoonProt)

A. Background

This database was created in order to be a resource for researchers and for

educational purposes. The moonlighting protein database will be the first database that is

strictly devoted to moonlighting proteins that have been discovered and will be

discovered in the future. The Jeffery laboratory will be in charge of updating the

database and verifying the multiple functions of the moonlighting proteins that are

continually being found. As the database continues to grow, we hope that it will play a

significant role in biological research.

B. Examples of Online Databases

There are many types of databases that are currently online and available as

resources for users. Each database has its own unique feature that separates it from the

other online resources, whether it is how it is formatted or the information that it

provides, such as gene sequences, protein names, information about one species, etc.

The moonlighting database was developed based on characteristics from these

previous online databases. In this section, I will cover some of the various types of

databases that are already in use, and describe what aspects from these databases were

used in creating the moonlighting database.

1. Protein Data Bank (PDB)

The largest protein structure database that is available online is the Protein

Data Bank (PDB). The PDB is a database that contains the three-dimensional structural

data of large biological molecules, such as nucleic acids and proteins. Biochemists and

biologists from all around the world update this database. Most major scientific journals

now require scientists to submit their structure data to the PDB. If the contents of the

PDB are thought of as primary data, then there are hundreds of derived (i.e., secondary)

databases that use the data. The information about the protein structures is mainly from

X-ray crystallography and NMR spectroscopy [32, 33, 34].

X-ray crystallography is a method of determining the arrangement of atoms

within a crystal. A beam of X-rays strikes a crystal, which causes the beam of light to

scatter into many specific directions. A crystallographer can use the angles and different

intensities from the diffracted beams to determine the three-dimensional shape of the

object that makes up the crystal [35].

NMR spectroscopy is Nuclear Magnetic Resonance Spectroscopy. NMR is a

research technique that uses the magnetic properties of certain atomic nuclei in order to

determine physical and chemical properties of atoms or molecules. NMR provides

detailed information about the structure, dynamics, reaction state, and the chemical

environment of molecules [36].

Along with using the structures determined by these procedures to help us learn

about the proteins that moonlight, the moonlighting database also links to the Protein

Data Bank. Each structure deposited in the PDB is assigned a unique 4-character

identifier, known as the PDB ID. The IDs are

automatically assigned and do not have meaning. However, they serve as the unique

identifier of each entry in the Protein Data Bank. As such, they are used throughout the

scientific literature to refer to entries in the Protein Data Bank. Therefore, if the PDB ID

of an entry in the Protein Data Bank is known, it is the most direct way to retrieve the

information for that protein from the database [32, 33, 34].

The moonlighting protein database includes the PDB ID for any moonlighting

protein for which a structure is known, as well as incorporating its own unique identifier.

If the user knows a PDB ID for the moonlighting protein for which they wish to search,

the user can use it to search the moonlighting database in order to gather information

about that protein. Also, users will be able to use the unique identifier given by our

database to help retrieve more information about a specific protein.
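
As a minimal sketch of how a PDB ID entered by a user could be checked and turned into a link, the Python snippet below assumes the classic 4-character format (one digit followed by three letters or digits) and assumes the URL pattern of the RCSB structure summary page. The function names are hypothetical, the IDs in the usage lines are arbitrary strings rather than entries taken from MoonProt, and this is not the code used by the MoonProt website.

```python
import re

# Assumption: classic PDB IDs are one digit (1-9) followed by three
# alphanumeric characters, e.g. "1abc". Case is not significant.
PDB_ID_PATTERN = re.compile(r"[1-9][A-Za-z0-9]{3}")

# Assumption: URL pattern of the RCSB PDB structure summary page.
RCSB_URL = "https://www.rcsb.org/structure/{pdb_id}"


def normalize_pdb_id(text):
    """Return the ID in upper case, or None if it is not a valid PDB ID."""
    candidate = text.strip()
    if PDB_ID_PATTERN.fullmatch(candidate):
        return candidate.upper()
    return None


def pdb_link(text):
    """Build a link to the PDB entry, or return None for an invalid ID."""
    pdb_id = normalize_pdb_id(text)
    if pdb_id is None:
        return None
    return RCSB_URL.format(pdb_id=pdb_id)


print(pdb_link(" 1abc "))  # a well-formed ID yields a link
print(pdb_link("abcd"))    # a malformed ID yields None
```

Rejecting malformed IDs before querying keeps obviously invalid input away from the database and lets the site fall back to a name search instead.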

2. Entrez, The Life Sciences Search Engine

Entrez is a global search engine that incorporates multiple databases. It

is supported by the NCBI (National Center for Biotechnology Information). Entrez

incorporates multiple database forms, including: Genome, Structure, Nucleotide,

PubMed, and Taxonomy, to name a few.

The information that is included in Entrez is a compilation from multiple

institutes and centers, which allows Entrez to retrieve a wide range

of information for the user. For example, if the user searches for a specific protein,

aconitase, he or she will be given not only the genetic information, such as its DNA

sequence and its structure, but also a list of published articles describing research that

used this protein [37].

Although Entrez operates on a much larger scale than our moonlighting protein

database, we were able to adopt some of its attributes and functionality in our own

database. The most important feature that we adopted is

allowing the user to choose among several kinds of search criteria for a given protein. The

moonlighting database lets the user search by multiple

characteristics, such as the name of the protein, PDB ID, organism, sequence, or

gene ID. Also, if the user knows only partial information for any of the search criteria

above, the database will still return all entries that match that

search [34, 37].
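
The partial-match search over several criteria described above can be sketched as follows. This is an illustration only: it uses Python's built-in sqlite3 module rather than the database engine behind MoonProt, and the table name, column names, and identifier values are hypothetical placeholders.

```python
import sqlite3

# Hypothetical table and column names; the real MoonProt schema may differ.
SEARCHABLE_COLUMNS = ("name", "pdb_id", "organism", "gene_id")


def search_proteins(conn, term):
    """Return rows in which any searchable column contains `term`."""
    # Build "name LIKE ? OR pdb_id LIKE ? ..." from the fixed column list.
    where = " OR ".join(f"{col} LIKE ?" for col in SEARCHABLE_COLUMNS)
    sql = (
        "SELECT name, pdb_id, organism, gene_id "
        f"FROM proteins WHERE {where} ORDER BY name"
    )
    pattern = f"%{term}%"  # % matches any run of characters
    return conn.execute(sql, [pattern] * len(SEARCHABLE_COLUMNS)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE proteins (name TEXT, pdb_id TEXT, organism TEXT, gene_id TEXT)"
)
# Placeholder rows; the PDB and gene IDs are made up for illustration.
conn.executemany(
    "INSERT INTO proteins VALUES (?, ?, ?, ?)",
    [
        ("aconitase", "0AAA", "Homo sapiens", "G1"),
        ("PutA", "0BBB", "Escherichia coli", "G2"),
        ("phosphoglucose isomerase", "0CCC", "Homo sapiens", "G3"),
    ],
)

print([row[0] for row in search_proteins(conn, "aconi")])  # partial name
print([row[0] for row in search_proteins(conn, "coli")])   # partial organism
```

Passing the search term as a bound parameter, rather than pasting it into the SQL text, keeps user input from being interpreted as SQL. Note that `%` and `_` typed by the user would still act as wildcards in this sketch.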

Also, in the same way that Entrez uses other databases to gather information for a

given search, the MoonProt database will use the inputted search to gather information

based on that specific protein, as well as give links to related information that may be on

other websites and databases. This way, it will aid the user to gather information from

multiple sources all in one location.

3. European Bioinformatics Institute Database

The last database that I will mention is an international resource with

a focus on bioinformatics. The European Bioinformatics Institute (EBI) is a center for

research and services in bioinformatics, and is part of the European Molecular Biology

Laboratory (EMBL). It is located on the Wellcome Trust Genome Campus in Hinxton,

Great Britain.

The original reason for the creation of the database was to establish a central

computer database of DNA sequences, rather than have scientists submit sequences to

journals. The effort expanded considerably, and the EMBL-EBI grew into a larger

project that encompasses multiple databases, similar to Entrez.

The multiple databases and programs associated with EBI include: CluSTr, CSA,

HPI, InterPro, IPI, and UniProt to name a few. These databases all work in conjunction

to gather specific information about a given protein for the user. The reason that EBI is

included in this paper is to show the impact of bioinformatics on an international level.

The European Bioinformatics Institute has made a large contribution in the field of

bioinformatics, and the results of their research will aid in the development of methods of

finding proteins that have dual functionality within a cell or organism [38, 39].

Another important factor about the EBI is that it also has programs and

presentations for teaching and training users about Bioinformatics through the use of their

databases. This is one of the future goals of the MoonProt database. This database will

not only allow users to find information about moonlighting proteins, but also serve as a

teaching tool.

C. Moonlighting Proteins Database (MoonProt)

This section will be focusing on the search parameters as well as how to navigate

through the MoonProt database. The final section of this thesis will go into the technical

aspects of the database, and the development and programming behind creating the

database.

In the sections that follow, we will go through each of the main pages within the

database, followed by how to properly search for specific moonlighting proteins.

1. Home Page

The home page is the first page the user will be directed to when they enter the

website. The homepage of the moonlighting database is very “clean”, meaning that we

emphasized making it easy to navigate. This is because we wanted to make the database

user-friendly. The home page appears as below:

Figure 2: Home Page Screenshot

There are five main pages to which the user can navigate from the home page.

These pages include: Proteins, People, FAQ’s, Articles and Other Publications. The

Home Page allows the user to go to any of the five pages listed above and also includes a

short description about moonlighting proteins. On the top right of this page, the user has

the option to search for their protein of interest.

The Home Page also has a small Flash animation (the part with the black

background) that depicts three moonlighting proteins with their dual functions. The

example in the screenshot shows the moonlighting protein cytochrome c, which

functions both in energy metabolism and in apoptosis.

2. Proteins Page

The proteins page includes a list of all the moonlighting proteins that are

currently included in the MoonProt database. This page is another avenue for users to

look at the list of moonlighting proteins that are available for them to gather information

about. Each of the proteins will have an available link to direct the user to its specific

page.

Figure 3: Proteins Page Screenshot

Each protein page is organized in three sections. Below are examples of the three

parts. In the examples below we will be looking at the protein, DegP (HtrA) chaperone,

which acts both as a peptidase and chaperone protein [29, 30, 31]. The three sections that

are detailed for each protein include: General Information, Structure Information and

Information about functions.

Figure 4: Detailed Page – General Information Screenshot

Figure 5: Detailed Page – Structural Information Screenshot

Figure 6: Detailed Page – Functional Information Screenshot

3. People Page

The People page describes the people involved in constructing the

MoonProt Database and how they can be contacted.

It contains a short biography about the creator of this database, as well as his

educational background and research focus as an undergrad as well as a graduate student.

The page also contains information about Dr. Constance Jeffery, who

compiled the initial list of over 200 known moonlighting proteins and continues to search

for more proteins that moonlight. It contains a short biography of Dr. Jeffery, including her

educational background and research focus.

The page also lists contact information for members of the Jeffery Laboratory for

any questions or clarifications that a user may have about proteins listed on the website.

Figure 7: People Page Screenshot

4. FAQ’s Page

The FAQ’s page is exactly what it sounds like: a page dedicated to the

most Frequently Asked Questions regarding moonlighting proteins and the database. On

this page, we will address the most common questions associated with the topic of

moonlighting proteins, and give examples of scenarios that distinguish proteins that

moonlight from those that do not.

It is here that users will be able to ask questions regarding moonlighting

proteins and learn how to submit information about additional proteins that

they believe to be moonlighting.

Figure 8: FAQ’s Page Screenshot

5. Articles Page

The articles page is a compilation of previously published work dealing

with moonlighting proteins. This page contains articles, papers, and other forms of media

(video/audio) that give more information about proteins that moonlight.

Figure 9: Articles Page Screenshot

6. Other References Page

This page is dedicated to other articles and papers that the Jeffery

laboratory has published.

Figure 10: Other References Page Screenshot

D. Search Parameters

The search box located on the top right of the database home page is a great way

for the user to locate the page for a specific protein or proteins that they are interested in.

The search box is linked to an SQL database, which hosts all the moonlighting proteins.

It uses various algorithms and different search criteria in order to find the protein of

interest.

One of the search parameters, as stated earlier, is the PDB ID from the Protein

Data Bank. Another search parameter is using the name of the protein. Users can

acquire information about a protein by typing in the name of the protein or even part of

the name of the protein. The moonlighting database will suggest all possible protein

names that fit the criteria from the user’s input.

Users also have the option to search for proteins based on the organism that they

are found in. This may prove to be a useful search for researchers and scientists because

using such a broad search may show multiple proteins that moonlight for a given

organism. For example, E. coli contains over 20 known moonlighting proteins.

III. TECHNICAL ASPECTS OF THE MOONLIGHTING DATABASE

A. Database

The database is based upon the Structured Query Language (SQL). SQL is a

programming language that is designed for managing data in relational database

management systems (RDBMS). SQL allows users to create a database using many

different elements in order to help manipulate the data and information stored in the

database. These elements include clauses, expressions, predicates, queries, and

statements. Although all five elements are essential for building a database, the two main

ones that were focused on in constructing the MoonProt Database were the query and

statement elements [40].
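
To make the statement and query elements concrete, the following is a minimal sketch using Python's built-in sqlite3 module. The table layout and column names are assumptions made for illustration and are not the actual MoonProt schema; the PutA functions are those described in Chapter I.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# A CREATE TABLE statement defines the structure of the data.
conn.execute(
    """
    CREATE TABLE proteins (
        id        INTEGER PRIMARY KEY,  -- the database's own identifier
        name      TEXT NOT NULL,
        organism  TEXT,
        function1 TEXT,
        function2 TEXT
    )
    """
)

# An INSERT statement adds a row; the ? placeholders keep the values
# separate from the SQL text.
conn.execute(
    "INSERT INTO proteins (name, organism, function1, function2) "
    "VALUES (?, ?, ?, ?)",
    ("PutA", "Escherichia coli", "proline dehydrogenase",
     "transcriptional repressor"),
)

# A query (SELECT) reads the data back; "organism = ?" is a predicate
# inside the WHERE clause.
row = conn.execute(
    "SELECT id, name, function2 FROM proteins WHERE organism = ?",
    ("Escherichia coli",),
).fetchone()
print(row)
```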

1. Query

The most common element in SQL is the query. A query is the

means by which users issue commands to the database in order to retrieve

the information that they wish to obtain.

The most common statement that is found in all queries is the SELECT statement. The

SELECT statement retrieves information from one or more tables. Queries allow the user

to describe the desired data, leaving the database management system (DBMS)

responsible for planning, optimizing, and performing the physical operations necessary to

produce that result as it chooses. A query includes a list of columns to be included in the

final result immediately following the SELECT keyword. SELECT is the most complex

statement in SQL and has optional clauses that narrow and refine a search of

the moonlighting database. These clauses include FROM, WHERE, GROUP BY,

HAVING, and ORDER BY [40, 41].


The FROM clause indicates the table(s) from which data is to be retrieved. The WHERE

clause includes a comparison predicate, which restricts the rows returned by the query:

it eliminates from the result set every row for which the comparison

predicate does not evaluate to true. The GROUP BY clause is used

to collapse rows that share common values into a smaller set of rows.

The HAVING clause includes a predicate used to filter the rows resulting from the GROUP

BY clause. Finally, the ORDER BY clause identifies which columns are used to sort the

resulting data and in which direction they should be sorted, either

ascending or descending. Each of these clauses was used to assemble the

moonlighting database. See Appendix 2 for the SQL code.
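To make the roles of the five clauses concrete, the sketch below runs one SELECT statement that uses all of them. It is an illustration under stated assumptions, not the code from Appendix 2: it uses Python's sqlite3 module, the table `proteins` and its columns are hypothetical, and the row labelled as made up exists only so that one organism has two entries.

```python
import sqlite3

# Hypothetical table for illustration; not the actual MoonProt schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE proteins (name TEXT, organism TEXT, seq_length INTEGER)")
conn.executemany(
    "INSERT INTO proteins VALUES (?, ?, ?)",
    [
        ("SMC-3", "Mus musculus", 1217),
        ("1-Cys peroxiredoxin", "Homo sapiens", 224),
        ("DegP (HtrA) chaperone", "Escherichia coli", 474),
        ("Example protein X", "Escherichia coli", 300),  # made-up row
        ("Hsp60", "Helicobacter pylori", 546),
    ],
)


def proteins_per_organism(min_count):
    """Count proteins longer than 250 residues per organism."""
    query = """
        SELECT organism, COUNT(*) AS n_proteins  -- columns of the result
        FROM proteins                            -- table to read from
        WHERE seq_length > 250                   -- filter rows before grouping
        GROUP BY organism                        -- one result row per organism
        HAVING COUNT(*) >= ?                     -- filter the groups
        ORDER BY n_proteins DESC, organism ASC   -- sort the final rows
    """
    return conn.execute(query, (min_count,)).fetchall()


print(proteins_per_organism(1))
print(proteins_per_organism(2))
```

WHERE acts on individual rows before grouping, while HAVING acts on the groups produced by GROUP BY, which is why the 224-residue protein never reaches the counting step.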

B. Server

The database and website are hosted on a University of Illinois at Chicago server.

Through the Academic Computing and Communications Center (ACCC), the website will be run

through the university using a personalized tigger account.

The website will also be run simultaneously on a Jeffery laboratory server.

Having two servers allows

users to navigate and load pages from the moonlighting database faster.

It also keeps the

moonlighting database available in case one of the servers goes down.

C. PHP Files

PHP is a general-purpose server-side scripting language originally designed for

Web development to produce dynamic Web pages. The code that we have written for the

moonlighting protein website is interpreted by a Web server with a PHP processor module,

which generates the resulting Web page. PHP can be deployed on most Web servers

and also as a standalone shell on almost every operating system and platform [42,43,44].

Each PHP page is coded using HTML, or Hypertext Markup Language. HTML is

written in the form of HTML elements consisting of tags enclosed in angle brackets, such

as “<html>”, within each of the web pages. The tags usually come in pairs, an opening tag and a closing tag, with each

tag meaning something different within the webpage. In between these tags web

designers can add text, further tags, comments and other types of text-based content.

The purpose of a web browser is to read HTML documents and compose them

into visible or audible web pages. The browser does not display the HTML tags, but uses

the tags to interpret the content of the page.
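The way a browser separates tags from displayed content can be illustrated with a small parser. The sketch below uses Python's standard html.parser module. The class name `TagAndTextCollector` and the sample page are assumptions made for the example; a real browser does far more (layout, styling, error recovery).

```python
from html.parser import HTMLParser


class TagAndTextCollector(HTMLParser):
    """Record opening tags, closing tags, and visible text separately."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.close_tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        self.close_tags.append(tag)

    def handle_data(self, data):
        # Keep only non-empty text, which is what a reader would see.
        if data.strip():
            self.text.append(data.strip())


# Hypothetical page content for illustration.
page = "<html><body><h1>MoonProt</h1><p>Moonlighting proteins</p></body></html>"
parser = TagAndTextCollector()
parser.feed(page)
parser.close()

print(parser.open_tags)
print(parser.close_tags)
print(parser.text)
```

The tags are consumed to structure the page, and only the collected text corresponds to what the browser would display.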

The moonlighting protein database website is built from several PHP pages. There are

six kinds of PHP pages: Home, About, Articles, Other Publications, Proteins, and a

separate general page for each moonlighting protein in the database. A list of all the

pages and their PHP code can be seen in Appendix 2 [45].
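As a rough picture of what a dynamically generated page such as the Proteins page involves, the sketch below turns database rows into an HTML table on the server side. It is written in Python purely for illustration and is not the PHP code of the site. The function name `render_protein_table` and the three-column layout are assumptions.

```python
from html import escape


def render_protein_table(rows):
    """Build an HTML table string from (id, name, species) tuples."""
    header = "<tr><th>ID</th><th>Protein Name</th><th>Species Name</th></tr>"
    body = "".join(
        "<tr>"
        # escape() keeps characters such as < and & from being read as markup
        + "".join(f"<td>{escape(str(cell))}</td>" for cell in row)
        + "</tr>"
        for row in rows
    )
    return f"<table>{header}{body}</table>"


print(render_protein_table([(1, "DegP (HtrA) chaperone", "E. coli")]))
```

Escaping each cell matters because annotation text is free-form and may contain characters that would otherwise break the page markup.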


IV. CONCLUSIONS

The MoonProt Database was created to serve as a resource that makes

researchers, scientists, and the general public aware that proteins

exhibiting these unique characteristics exist.

We hope the MoonProt Database will not only be used for educational

purposes but will also act as a forum where proteins exhibiting moonlighting characteristics can be

identified for future projects, and serve as a stepping stone for further research in the

various fields of proteomics and genomics.

APPENDICES

Appendix 1 – List of Moonlighting Proteins with Characteristics

ID and Title SMC-3 (Structural maintenance of chromosome 3)

General Information
MoonProt ID:
First appeared in release: 1.0
Name(s): Structural maintenance of chromosome protein 3, chondroitin sulfate proteoglycan 6, Chromosome segregation protein SmcD, Basement membrane-associated chondroitin proteoglycan (Bamacan), Mad member-interacting protein 1
UniProt: Q9CW03
SwissProt: SMC3_MOUSE
EMBL: AF141294
PIR:
GO terms:
Biological processes: Cell cycle, Cell division, DNA damage, DNA repair, Meiosis, Mitosis, stem cell maintenance, sister chromatid cohesion, signal transduction, mitotic spindle organization, regulation of DNA replication, negative regulation of DNA endoreduplication.
Cellular component: Centromere, Chromosome, Nucleus, chromatin, basement membrane, cytoplasm, lateral element, meiotic cohesin complex, nuclear matrix, nucleoplasm, spindle pole
Molecular functions: ATP binding, chromatin binding
Organism(s) for which both functions have been demonstrated: Mus musculus (Mouse)
Sequence length: 1217
Quaternary structure: HINGE HETERODIMER



FASTA sequence: >sp|Q9CW03|SMC3_MOUSE Structural maintenance of chromosomes protein 3 OS=Mus musculus GN=Smc3 PE=1 SV=2MYIKQVIIQGFRSYRDQTIVDPFSSKHNVIVGRNGSGKSNFFYAIQFVLSDEFSHLRPEQRLALLHEGTGPRVISAFVEIIFDNSDNRLPIDKEEVSLRRVIGAKKDQYFLDKKMVTKNDVMNLLESAGFSRSNPYYIVKQGKINQMATAPDSQRLKLLREVAGTRVYDERKEESISLMKETEGKREKINELLKYIEERLHTLEEEKEELAQYQKWDKMRRALEYTIYNQELNETRAKLDELSAKRETSGEKSRQLRDAQQDARDKMEDIERQVRELKTKISAMKEEKEQLSAERQEQIK QRTKLELKAKDLQDELAGNSEQRKRLLKERQKLLEKIEEKQKELAETEPKFNSVKEKEERGIARLAQATQERTDLYAKQGRGSQFTSKEERDKWIKKELKSLDQAINDKKRQIAAIHKDLEDTEANKEKNLEQYNKLDQDLNEVKARVEELDRKYYEVKNKKDELQSERNYLWREENAEQQALAAKREDLEKKQQLLRAATGKAILNGIDSINKVLEHFRRKGINQHVQNGYHGIVMNNFECEPAFYTCVEVTAGNRLFYHIVDSDEVSTKILMEFNKMNLPGEVTFLPLNKLDVRDTAYPETNDAIPMISKLRYNPRFDKAFKHVFGKTLICRSMEVSTQLARAFTMDCITLEGDQVSHRGALTGGYYDTRKSRLELQKDVRKAEEELGELEAKLNENLRRNIERINNEIDQLMNQMQQIETQQRKFKASRDSILSEMKMLKEKRQQSEKTFMPKQRSLQSLEASLHAMESTRESLKAELGTDLLSQLSLEDQKRVDALNDEIRQLQQENRQLLNERIKLEGIITRVETYLNENLRKRLDQVEQELNELRETEGGTVLTATTSELEAINKRVKDTMARSEDLDNSIDKTEAGIKELQKS MERWKNMEKEHMDAINHDTKELEKMTNRQGMLLKKKEECMKKIRELGSLPQEAFEKYQTLSLKQLFRKLEQCNTELKKYSHVNKKALDQFVNFSEQKEKLIKRQEELDRGYKSIMELMNVLELRKYEAIQLTFKQVSKNFSEVFQKLVPGGKATLVMKKGDVEGSQSQDEGEGSGESERG SGSQSSVPSVDQFTGVGIRVSFTGKQGEMREMQQLSGGQKSLVALALIFAIQKCDPAPFYLFDEIDQALDAQHRKAVSDMIMELAVHAQFITTTFRPELLESADKFYGVKFRNKVSHIDVITAEMAKDFVEDDTTHG

Structure information PDB ID: 2WD5 X-ray 2.70 B 484-696

Information about functions



One Function: SMC3 interacts with SMC1 and other non-Smc subunits like Scc3 and Scc1 (also called Rad21) to form a cohesion complex, called "cohesin," that maintains proper sister chromatid cohesion throughout the cell cycle and during mitosis to ensure accurate chromosome segregation. Each Smc heterodimer associates with non-Smc subunits to form functional Smc complexes.

E.C. number: Not an enzyme Reference(s) for function:

The Smc complexes in DNA damage response. Wu N, Yu H. Cell Biosci. 2012 Feb 27;2:5. PMID: 22369641 Characterization of the components of the putative mammalian sister chromatid cohesion complex. Darwiche N, Freeman LA, Strunnikov A. Gene. 1999 Jun 11;233(1-2):39-47. PMID: 10375619

Location of functional site(s) and reference(s) for that site:

Conversion of cohesin to a cohesive state where it can bind DNA is activated by the acetylation of Smc3 by the Eco1 family of acetyltransferases. The two nucleotide-binding Walker A and Walker B motifs reside in the two different ATPase halves of the ATPase head. There is a P-loop motif starting at residue 32 which is an ATP binding site, and a DA box motif starting at residue 1114 which is a DNA binding site.

Cellular Location of Function:

The cohesin protein complex is associated with chromosomes at all points of cell cycle except during mitosis when the complex is redistributed from the nucleus/chromosomes to the whole cell volume starting in prometaphase. The proteins remained off the chromosomes throughout metaphase and anaphase. They started to re-localize around the chromosomes again in telophase.

Comments:

Another Function: IN HUMANS - SMC3 and SMC1 are phosphorylated as part of an intra-S-phase checkpoint activation mechanism to block DNA synthesis in response to DNA damage.



References for function: The Smc complexes in DNA damage response. Wu N, Yu H. Cell Biosci. 2012 Feb 27;2:5. PMID: 22369641 Luo H, Li Y, Mu JJ, Zhang J, Tonaka T, Hamamori Y, Jung SY, Wang Y, Qin J. Regulation of intra-S phase checkpoint by ionizing radiation (IR)-dependent and IR-independent phosphorylation of SMC3. J Biol Chem. 2008;283:19176–19183. doi: 10.1074/jbc.M802299200.


The S1083 residue of SMC3 is phosphorylated in response to IR treatment, which results in intra-S phase checkpoint activation

Another Function: Bamacan/SMC3 is a necessary component, together with SMC1, of a multimeric complex, RC-1, that has DNA recombination/renaturation, DNA ligase, and DNA polymerase functions. The complex acts to repair gapped or deleted DNA.

References for function: The Smc complexes in DNA damage response. Wu N, Yu H. Cell Biosci. 2012 Feb 27;2:5. PMID: 22369641 A mammalian protein complex that repairs double-strand breaks and deletions by recombination. Jessberger R, Podust V, Hübscher U, Berg P. J Biol Chem. 1993 Jul 15;268(20):15070-9. PMID: 8392064 Overexpression of bamacan/SMC3 causes transformation. Ghiselli G, Iozzo RV. J Biol Chem. 2000 Jul 7;275(27):20235-8. PMID: 10801778

E.C. number:




Bamacan/SMC3 binds palindromic DNA sequences through its C terminus, allowing the formation of protein-DNA structures that are accessible to the action of DNA-modifying enzymes.


Nucleus.

Comments: Cohesin aids in DNA double-strand break (DSB) repair through sister chromatid homologous recombination (HR) by maintaining sister-chromatid cohesion at or near the DSB to keep the DSB and the undamaged sister chromatid at close proximity, thereby promoting strand invasion and sister-chromatid HR. RC-1 contains a DNA polymerase, identified as DNA polymerase epsilon, that co-purifies with RC-1. A DNA ligase, most likely mammalian DNA ligase III, and a 5'-3' exonuclease also copurify with the RC-1.

Another function: Bamacan is a proteoglycan and a component of the basement membrane in the Engelbreth-Holm-Swarm tumor matrix, the renal mesangial matrix, and the basement membrane of other tissues. Bamacan is involved in the control of cell growth and transformation.



Reference for function: Complete cDNA cloning, genomic organization, chromosomal assignment, functional characterization of the promoter, and expression of the murine Bamacan gene. Ghiselli G, Siracusa LD, Iozzo RV. J Biol Chem. 1999 Jun 11;274(24):17384-93. PMID: 10358101 Perlecan and basement membrane-chondroitin sulfate proteoglycan (bamacan) are two basement membrane chondroitin/dermatan sulfate proteoglycans in the Engelbreth-Holm-Swarm tumor matrix. Couchman JR, Kapoor R, Sthanam M, Wu RR. J Biol Chem. 1996 Apr 19;271(16):9595-602. PMID: 8621634

E.C. number: Location of functional site(s) and reference(s) for that site:

Glycanation sites (Ser-Gly) are present at residues 36, 249, 1073, 1081, and 1116, where attachment of glycosaminoglycan (GAG) side chains is possible. There is a P-loop motif starting at residue 32, which is an ATP binding site, and a DA box motif starting at residue 1114, which is a DNA binding site. These are highly conserved in SMC proteins as well.

Cellular Location of function:

Comments:

Another function: Tumorigenesis: deregulated over-expression of bamacan/SMC3 may be directly linked to oncogenesis.

Reference for function: Overexpression of bamacan/SMC3 causes transformation. Ghiselli G, Iozzo RV. J Biol Chem. 2000 Jul 7;275(27):20235-8. PMID: 10801778





Comments: overexpression of bamacan/SMC3 alone is sufficient to initiate cell transformation in normal fibroblasts

If you have any comments or wish to provide additional references for this protein or its functions, please email us at [email protected]. Jeffery Lab University of Illinois at Chicago MC567 900 S. Ashland Ave. Chicago, IL 60607 USA [email protected]


ID and Title 1-cys peroxiredoxin

General Information
MoonProt ID:
First appeared in release: 1.0
Name(s): 1-Cys peroxiredoxin (1-Cys PRX), Peroxiredoxin-6, Human Prx enzyme HORF6, Non-selenium Glutathione peroxidase, Acidic calcium-independent phospholipase A2 (aiPLA2)
UniProt: P30041
SwissProt: PRDX6_HUMAN
EMBL: D14662
PIR:
GO terms: hydrogen peroxide catabolic process; phospholipid catabolic process; response to oxidative stress; cytoplasmic membrane-bounded vesicle; cytosol; lysosome; antioxidant activity; glutathione peroxidase activity; peroxiredoxin activity; phospholipase A2 activity
Organism(s) for which both functions have been demonstrated: Human
Sequence length: 224
Quaternary structure:



FASTA sequence: >sp|P30041|PRDX6_HUMAN Peroxiredoxin-6 OS=Homo sapiens GN=PRDX6 PE=1 SV=3 MPGGLLLGDVAPNFEANTTVGRIRFHDFLGDSWGILFSHPRDFTPVCTTELGRAAKLAPEFAKRNVKLIALSIDSVEDHLAWSKDINAYNCEEPTEKLPFPIIDDRNRELAILLGMLDPAEKDEKGMPVTARVVFVFGPDKKLKLSILYPATTGRNFDEILRVVISLQLTAEKRVATPVDWKDGDSVMVLPTIPEEEAKKLFPKGVFTKELPSGKKYLRYTPQP

Structure information PDB ID: 1PRX X-ray 2.00 A/B 1-224

The protein has two discrete domains and forms a dimer. The N-terminal domain has a thioredoxin fold and the C-terminal domain is used for dimerization.

Information about functions One Function: Non-selenium Glutathione Peroxidase:

Can reduce H2O2 and short chain organic, fatty acid, and phospholipid hydroperoxides

E.C. number: 1.11.1.15


http://enzyme.expasy.org/EC/1.11.1.15


Reference(s) for function: 1-Cys peroxiredoxin, a bifunctional enzyme with glutathione peroxidase and phospholipase A2 activities. Chen JW, Dodia C, Feinstein SI, Jain MK, Fisher AB. J Biol Chem. 2000 Sep 15;275(37):28421-7. PMID: 10893423 Crystal structure of a novel human peroxidase enzyme at 2.0 A resolution. Choi HJ, Kang SW, Yang CH, Rhee SG, Ryu SE. Nat Struct Biol. 1998 May;5(5):400-6. PMID: 9587003


The active site for peroxidase activity is Cys(47) in the PVCTTE consensus sequence. It exists as cysteine-sulfenic acid in the crystal, is located at the bottom of a relatively narrow pocket. The positively charged environment surrounding Cys 47 accounts for the peroxidase activity of the enzyme, which contains no redox cofactors.


Activity is maximal at pH 7-8, so it functions best in cytosol.

Comments: The protein has two discrete domains and forms a dimer. The N-terminal domain has a thioredoxin fold and the C-terminal domain is used for dimerization.

Another Function: Acidic calcium-independent Phospholipase A2 (aiPLA2):

regulation of phospholipid turnover



References for function: 1-Cys peroxiredoxin, a bifunctional enzyme with glutathione peroxidase and phospholipase A2 activities. Chen JW, Dodia C, Feinstein SI, Jain MK, Fisher AB. J Biol Chem. 2000 Sep 15;275(37):28421-7. PMID: 10893423

E.C. number: 3.1.1.4 Location of functional site(s) and reference(s) for that site:

Ser(32) in the GDSWG consensus sequence is the active site which provides the catalytic nucleophile for the hydrolase activity


activity is maximal at pH 4, so it functions best in lysosomes and lung secretory organelles.

Comments:





ID and Title DegP (HtrA) chaperone

General Information
MoonProt ID:
First appeared in release: 1.0
Name(s): DegP (HtrA) chaperone, Periplasmic serine endoprotease DegP, Heat shock protein DegP, Protease Do
UniProt: P0C0V0
SwissProt: DEGP_ECOLI
EMBL: M36536.1
PIR: S45229
GO terms: misfolded or incompletely synthesized protein catabolic process; protein folding; response to oxidative stress; response to temperature stimulus; outer membrane-bounded periplasmic space; plasma membrane; serine-type endopeptidase activity
Organism(s) for which both functions have been demonstrated: E. coli
Sequence length: 474
Quaternary structure: Exists as an inactive hexamer (DegP6) and active 12-mer (DegP12) and 24-mer (DegP24) oligomeric forms. DegP hexamer is formed by staggered association of trimeric rings. The proteolytic sites are located in a central cavity. The inner cavity is lined by several hydrophobic patches that may act as docking sites for unfolded polypeptides.



FASTA sequence: >sp|P0C0V0|DEGP_ECOLI Periplasmic serine endoprotease DegP OS=Escherichia coli (strain K12) GN=degP PE=1 SV=1 MKKTTLALSALALSLGLALSPLSATAAETSSATTAQQMPSLAPMLEKVMPSVVSINVEGSTTVNTPRMPRNFQQFFGDDSPFCQEGSPFQSSPFCQGGQGGNGGGQQQKFMALGSGVIIDADKGYVVTNNHVVDNATVIKVQLSDGRKFDAKMVGKDPRSDIALIQIQNPKNLTAIKMADSDALRVGDYTVAIGNPFGLGETVTSGIVSALGRSGLNAENYENFIQTDAAINRGNSGGALVNLNGELIGINTAILAPDGGNIGIGFAIPSNMVKNLTSQMVEYGQVKRGELGIMGTELNSELAKAMKVDAQRGAFVSQVLPNSSAAKAGIKAGDVITSLNGKPISSFAALRAQVGTMPVGSKLTLGLLRDGKQVNVNLELQQSSQNQVDSSSIFNGIEGAEMSNKGKDQGVVVNNVKTGTPAAQIGLKKGDVIIGANQQAVKNIAELRKVLDSKPSVLALNIQRGDSTIYLLMQ

Structure information PDB ID: 1KY9 X-ray 2.80 A/B 27-474 [»]

2ZLE electron microscopy 28.00 A/B/C/E/F/G/H/I/J/K/L/M 27-474 [»] 3CS0 X-ray 3.00 A 27-474 [»] 3MH4 X-ray 3.10 A/B 27-474 [»] 3MH5 X-ray 3.00 A/B 27-474 [»] 3MH6 X-ray 3.60 A 27-474 [»] 3MH7 X-ray 2.96 A 27-474 [»] 3OTP X-ray 3.76 A/B/C/D/E/F 27-474 [»] 3OU0 X-ray 3.00 A 27-474 [»] 4A8D electron microscopy 28.00 A/B/C/D/E/F/G/H/I/J/K/L 27-474 [»]

Information about functions One Function: Peptidase (Heat Shock Protein):

degrades transiently denatured and unfolded proteins which accumulate in the periplasm following heat shock or other stress conditions. Efficient with Val-Xaa and Ile-Xaa peptide bonds, suggesting a preference for beta-branched side chain amino acids It can degrade IciA, ada, casein, globin and PapA.



References for function: ATP-dependent proteases that also chaperone protein biogenesis. Suzuki CK, Rep M, van Dijl JM, Suda K, Grivell LA, Schatz G. Trends Biochem Sci. 1997 Apr;22(4):118-23. Review. PMID: 9149530 Crystal structure of DegP (HtrA) reveals a new protease-chaperone machine. Krojer T, Garrido-Franco M, Huber R, Ehrmann M, Clausen T. Nature 2002 May 2;417(6884):102. PMID: 11919638


The catalytic triad (Asp 105, His 135, Ser 210) and the specificity pocket S1 exist in the proteolytic domain of HtrA. The active-site surface loops, LA, L1, and L2 are important in adjustment of the catalytic triad.


Cytosol

Comments: Optimum temperature is around 55 degrees Celsius. In the range from 37 to 55 degrees Celsius, the proteolytic activity rapidly increases with temperature.

Another Function: Chaperone Chaperones in the biogenesis of partially folded outer-membrane proteins (OMP)

E.C. number:



Reference(s) for function:

ATP-dependent proteases that also chaperone protein biogenesis. Suzuki CK, Rep M, van Dijl JM, Suda K, Grivell LA, Schatz G. Trends Biochem Sci. 1997 Apr;22(4):118-23. Review. PMID: 9149530 Crystal structure of DegP (HtrA) reveals a new protease-chaperone machine. Krojer T, Garrido-Franco M, Huber R, Ehrmann M, Clausen T. Nature 2002 May 2;417(6884):102. PMID: 11919638


Binding sites for misfolded proteins are located within the inner cavity (same residues as the protease domain). Three large hydrophobic grooves are constructed by residues of loop LA and L2, and are organized around the central Gln 206/Arg 207 cluster. There are also several hydrophobic binding sites in the PDZ1 domains which augment other binding sites.


Cytosol

Comments: The inner cavity is geometrically constricted, which means that substrates must be partially unfolded to reach the active site. PDZ domains could properly position the substrate for threading it into the central cavity.

Another function Reference for function:





Comments:





ID and Title Beta-subunit pyruvate dehydrogenase

General Information
MoonProt ID:
First appeared in release: 1.0
Name(s): Pyruvate dehydrogenase E1 component subunit beta, PDH-B, pyruvate dehydrogenase E1-beta chain
UniProt: P75391
SwissProt: ODPB_MYCPN
EMBL: U00089
PIR: S73772
GO terms: glycolysis; pyruvate dehydrogenase (acetyl-transferring) activity
Organism(s) for which both functions have been demonstrated: Mycoplasma pneumoniae
Sequence length: 327
Quaternary structure: Heterodimer of an alpha and a beta chain. The moonlighting function focuses on only the beta subunit.

FASTA sequence: >sp|P75391|ODPB_MYCPN Pyruvate dehydrogenase E1 component subunit beta OS=Mycoplasma pneumoniae (strain ATCC 29342 / M129) GN=pdhB PE=3 SV=1 MSKTIQANNIEALGNAMDLALERDPNVVLYGQDAGFEGGVFRATKGLQKKYGEERVWDCPIAEAAMAGIGVGAAIGGLKPIVEIQFSGFSFPAMFQIFTHAARIRNRSRGVYTCPIIVRMPMGGGIKALEHHSETLEAIYGQIAGLKTVMPSNPYDTKGLFLAAVESPDPVVFFEPKKLYRAFRQEIPADYYTVPIGQANLISQGNNLTIVSYGPTMFDLINMVYGGELKDKGIELIDLRTISPWDKETVFNSVKKTGRLLVVTEAAKTFTTSGEIIASVTEELFSYLKAAPQRVTGWDIVVPLARGEHYQFNLNARILEAVNQLLK

Structure information PDB ID: no PDB structures

http://www.ebi.ac.uk/ena/data/view/U00089
http://pir.georgetown.edu/cgi-bin/nbrfget?uid=S73772


Information about functions One Function: Pyruvate dehydrogenase (acetyl-transferring).

Catalyses the conversion of Pyruvate to acetyl-CoA and CO2.

References for function: Elongation factor Tu and E1 beta subunit of pyruvate dehydrogenase complex act as fibronectin binding proteins in Mycoplasma pneumoniae. Dallo SF, Kannan TR, Blaylock MW, Baseman JB. Mol Microbiol. 2002 Nov;46(4):1041-51. PMID: 12421310


Thiamine pyrophosphate binding site at residue 63


Cytoplasmic protein

Comments:

Another Function: Fibronectin binding.

Fibronectin is a structural protein of the extracellular matrix, to which PDH-B binds when

E.C. number: Reference(s) for function: Elongation factor Tu and E1 beta subunit of pyruvate

dehydrogenase complex act as fibronectin binding proteins in Mycoplasma pneumoniae. Dallo SF, Kannan TR, Blaylock MW, Baseman JB. Mol Microbiol. 2002 Nov;46(4):1041-51. PMID: 12421310



Cellular surface

Comments: under identical growth conditions, gradient-purified mycoplasma membranes contain 22% of total PDH-B protein in the cell. PDH-B possesses transmembrane amino acid sequence domains, indicative of its surface location.


http://enzyme.expasy.org/EC/1.2.4.1


Another function Reference for function: E.C. number: Location of functional site(s) and reference(s) for that site:


Comments:

If you have any comments or wish to provide additional references for this protein or its functions, please email us at [email protected]. Jeffery Lab University of Illinois at Chicago MC567 900 S. Ashland Ave. Chicago, IL 60607 USA [email protected]



ID and Title superoxide dismutase


Name(s): Superoxide dismutase [Mn], SOD, Mn-SOD, sod-A
UniProt: P47201
SwissProt: SODM_MYCAV
EMBL: U11550
PIR:
GO terms: superoxide metabolic process; metal ion binding; superoxide dismutase activity
Organism(s) for which both functions have been demonstrated: Mycobacterium avium
Sequence length: 207
Quaternary structure: Homotetramer, chains A, B, C, D
FASTA sequence: >sp|P47201|SODM_MYCAV Superoxide dismutase [Mn]

OS=Mycobacterium avium GN=sodA PE=3 SV=3 MAEYTLPDLDWDYAALEPHISGQINEIHHTKHHATYVKGVNDALAKLEEARANEDHAAIFLNEKNLAFHLGGHVNHSIWWKNLSPDGGDKPTGELAAAIDDAFGSFDKFRAQFSAAANGLQGSGWAVLGYDTLGSRLLTFQLYDQQANVPLGIIPLLQVDMWEHAFYLQYKNVKADYVKAFWNVVNWADVQKRYAAATSKAQGLIFG

Structure information PDB ID: 1GN2 X-ray 3.40 A/B/C/D/E/F/G/H 1-207 [»]

1GN3 X-ray 4.00 A/B 1-207 [»] 1GN4 X-ray 2.50 A/B/C/D 1-207 [»] 1GN6 X-ray 2.90 A/B/C/D 1-207 [»] 1IDS X-ray 2.00 A/B/C/D 1-207 [»]

Information about functions



One Function: Superoxide Dismutase: catalyzes the degradation of superoxide anion radicals which are toxic to biological systems. Oxidoreductase rxn: 2 superoxide (O2−) + 2 H(+) → O(2) + H(2)O(2)

E.C. number: 1.15.1.1 Reference(s) for function:

Mycobacterium avium-superoxide dismutase binds to epithelial cell aldolase, glyceraldehyde-3-phosphate dehydrogenase and cyclophilin A. Reddy VM, Suleman FG. Microb Pathog. 2004 Feb;36(2):67-74. PMID: 14687559


Binds to Mn cofactor at residues: 28 His, 76 His, 160 Asp, 164 His, which form the active site. "X-ray structure analysis of the iron-dependent superoxide dismutase from Mycobacterium tuberculosis at 2.0-A resolution reveals novel dimer-dimer interactions." Cooper J.B., McIntyre K., Badasso M.O., Wood S.P., Zhang Y., Garbe T.R., Young D. J. Mol. Biol. 246:531-544(1995) [PubMed: 7877174] An analysis of structural similarity in the iron and manganese superoxide dismutases based on known structures and sequences. Jackson SM, Cooper JB. Biometals. 1998 Apr;11(2):159-73. PMID: 9542069


Excreted into the extracellular space

Comments: The active site and the residues responsible for tetramerization of the enzyme are highly conserved across species and with both Fe and Mn binding SODs.



Another Function: Adhesin - Superoxide dismutase binds to the epithelial cell surface proteins fructose-1,6-bisphosphate aldolase B (aldolase), glyceraldehyde-3-phosphate dehydrogenase (GAPDH) and cyclophilin A (CypA), which are all found in both the cytosol and in the membrane.

References for function: Mycobacterium avium-superoxide dismutase binds to epithelial cell aldolase, glyceraldehyde-3-phosphate dehydrogenase and cyclophilin A. Reddy VM, Suleman FG. Microb Pathog. 2004 Feb;36(2):67-74. PMID: 14687559 Mycobacterium avium binds to mouse intestinal mucus aldolase. Reddy VM, Suleman FG, Hayworth DA. Tuberculosis (Edinb). 2004;84(5):303-10. PMID: 15207805

E.C. number: -- Location of functional site(s) and reference(s) for that site:


Extracellular surface of the MAC (mycobacterium avium complex) membrane

Comments: SOD binds to epithelial cells and stimulates endocytosis of the MAC.





ID and Title Hsp60


Name(s): 60 kDa chaperonin, GroEL protein, Heat shock protein 60, Protein Cpn60
UniProt: P42383
SwissProt: CH60_HELPY
EMBL: CAA52062.1
PIR: S36237
GO terms: protein refolding; response to stress; cytoplasm; ATP binding
Organism(s) for which both functions have been demonstrated: Helicobacter pylori
Sequence length: 546
Quaternary structure: double heptameric ring structure (14 subunits, 2 rings of 7 subunits each) that provides a protected cavity for proteins to fold

FASTA sequence: >sp|P42383|CH60_HELPY 60 kDa chaperonin OS=Helicobacter pylori (strain ATCC 700392 / 26695) GN=groL PE=3 SV=2 MAKEIKFSDSARNLLFEGVRQLHDAVKVTMGPRGRNVLIQKSYGAPSITKDGVSVAKEIELSCPVANMGAQLVKEVASKTADAAGDGTTTATVLAYSIFKEGLRNITAGANPIEVKRGMDKAAEAIINELKKASKKVGGKEEITQVATISANSDHNIGKLIADAMEKVGKDGVITVEEAKGIEDELDVVEGMQFDRGYLSPYFVTNAEKMTAQLDNAYILLTDKKISSMKDILPLLEKTMKEGKPLLIIAEDIEGEALTTLVVNKLRGVLNIAAVKAPGFGDRRKEMLKDIAILTGGQVISEELGLSLENAEVEFLGKAGRIVIDKDNTTIVDGKGHSHDVKDRVAQIKTQIASTTSDYDKEKLQERLAKLSGGVAVIKVGAASEVEMKEKKDRVDDALSATKAAVEEGIVIGGGAALIRAAQKVHLNLHDDEKVGYEIIMRAIKAPLAQIAINAGYDGGVVVNEVEKHEGHFGFNASNGKYVDMFKEGIIDPLKVERIALQNAVSVSSLLLTTEATVHEIKEEKAAPAMPDMGGMGGMGGMGGMM


http://pir.georgetown.edu/cgi-bin/nbrfget?uid=S36237


Structure information PDB ID: Information about functions One Function: Heat shock protein - facilitates folding, unfolding, and

translocation of polypeptides, as well as the assembly and disassembly of oligomeric protein complexes

E.C. number: --- Reference(s) for function:

A virulence factor of Helicobacter pylori: role of heat shock protein in mucosal inflammation after H. pylori infection. Kamiya S, Yamaguchi H, Osaki T, Taguchi H. J Clin Gastroenterol. 1998;27 Suppl 1:S35-9. PMID: 9872496



Cytosol

Comments: Another Function: Adhesin - acts as a virulence factor to facilitate binding to host cell

(human gastric cells) in H. pylori infection



References for function:

A virulence factor of Helicobacter pylori: role of heat shock protein in mucosal inflammation after H. pylori infection. Kamiya S, Yamaguchi H, Osaki T, Taguchi H. J Clin Gastroenterol. 1998;27 Suppl 1:S35-9. PMID: 9872496 Molecular chaperones in pathogen virulence: emerging new targets for therapy. Neckers L, Tatu U Cell Host Microbe. 2008 Dec 11;4(6):519-27. PMID: 19064253 The Hsp60 protein of Helicobacter pylori: structure and immune response in patients with gastroduodenal diseases. Macchia G, Massone A, Burroni D, Covacci A, Censini S, Rappuoli R. Mol Microbiol. 1993 Aug;9(3):645-52. PMID: 8105364



Surface protein

Comments: Another function Reference for function:



Comments:


Appendix 2 – Programmable Code for .php Pages

URL: http://tigger.uic.edu/htbin/codewrap/~mkmani2/moonlighting.php

Home Page

Untitled Document Home Proteins People FAQ's Articles Other References



Introduction

A moonlighting protein is a single protein that has multiple functions that are not due to gene fusions, multiple RNA splice variants or multiple proteolytic fragments. Moonlighting proteins do not include families of homologous proteins if the different functions are performed by different proteins in the protein family. They also do not include proteins that have multiple cellular roles but use the same biochemical function in each role. A single protein with multiple functions might seem surprising, but there are actually more than 100 examples of proteins that 'moonlight'. The variety of functions, methods to switch between functions, ways they might benefit the organism, and proposed methods of evolving a second function suggest that many more moonlighting proteins are likely to be found.

Characterization of a novel protein generally involves finding a function for a protein, but it does not necessarily include a search for all possible additional functions of a protein. Most moonlighting functions have been found by chance, and there is not currently a good method for predicting which proteins moonlight. In addition, moonlighting functions are often not conserved among protein homologues. This database provides a centralized, web-based location to organize information about moonlighting proteins for which there is biochemical and/or biophysical evidence of both functions being performed by the same protein.



900 S. Ashland Ave. (MC567), Chicago IL 60607 | ph: 312-996-3168, | Fax: 312-413-2691 | Copyright © 2008 The Board of Trustees of the University of Illinois

Jeffery Lab University of Illinois at Chicago MBRB Rm. 4260

Proteins Page



Results Home Proteins People FAQ's Articles Other References
ID Protein Name Function 1 Function 2 Species Name

57


1DegP (HtrA) chaperonePeptidase (Heat Shock Protein): degrades transiently denatured and unfolded proteins which accumulate in the periplasm following heat shock or other stress conditions.
Efficient with Val-Xaa and Ile-Xaa peptide bonds, suggesting a preference for beta-branched side chain amino acids
It can degrade IciA, ada, casein, globin and PapA. | Chaperones in the biogenesis of partially folded outer-membrane proteins (OMP) | E. coli

2 | SMC-3 (Structural maintenance chromosome 3) | SMC3 interacts with SMC1 and other non-Smc subunits like Scc3 and Scc1 (also called Rad21) to form a cohesion complex, called "cohesin," that maintains proper sister chromatid cohesion throughout the cell cycle and during mitosis to ensure accurate chromosome segregation. Each Smc heterodimer associates with non-Smc subunits to form functional Smc complexes. | In humans - SMC3 and SMC1 are phosphorylated as a part of an intra-S-phase checkpoint activation mechanism to block DNA synthesis in response to DNA damage. | Mus musculus (Mouse)

3 | 1-Cys Peroxiredoxin | Acidic calcium-independent Phospholipase A2 (aiPLA2): regulation of phospholipid turnover | Non-selenium Glutathione Peroxidase: Can reduce H2O2 and short chain organic, fatty acid, and phospholipid hydroperoxides | Human

4 | Beta-Subunit Pyruvate Dehydrogenase | Pyruvate dehydrogenase (acetyl-transferring): Catalyses the conversion of pyruvate to acetyl-CoA and CO2. | Fibrinogen binding: Fibrinogen is a structural protein of the extracellular matrix, to which PDH-B binds when Mycoplasma pneumoniae

5 | Superoxide Dismutase | Superoxide Dismutase: catalyzes the degradation of superoxide anion radicals, which are toxic to biological systems. Oxidoreductase rxn: 2 superoxide (O2(-)) + 2 H(+) → O(2) + H(2)O(2) | Adhesin - Superoxide dismutase binds to the epithelial cell surface proteins fructose-1-6-bisphosphate aldolase B (aldolase), glyceraldehyde-3-phosphate dehydrogenase (GAPDH) and cyclophilin A (CypA), which are all found in both the cytosol and in the membrane. | Mycobacterium avium

6 | Hsp60 | Heat shock protein - facilitates folding, unfolding, and translocation of polypeptides, as well as the assembly and disassembly of oligomeric protein complexes | Adhesin - acts as a virulence factor to facilitate binding to host cell (human gastric cells) in H. pylori infection | Helicobacter Pyl

7 | PGI1 Phosphoglucose Isomerase | Catalyzes interconversion of glucose-6-phosphate and fructose-6-phosphate in glycolysis and gluconeogenesis | Cytokine/growth factor, binds to target cells and causes pre-B cells to mature into antibody secreting cells, supports survival of embryonal neurons, causes differentiation of some leukemia cell lines | Mouse, Human
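The rows of the table above each pair a protein with two functions and one or more organisms. As a minimal sketch of how such rows might be held and queried in a program, the following Python example uses a small record type. The class name, field names, and helper function are hypothetical and chosen only for illustration; they are not taken from the MoonProt schema, and the function descriptions are abridged from the table.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MoonlightingEntry:
    """One table row. Field names are hypothetical, not the MoonProt schema."""
    name: str
    function_1: str
    function_2: str
    organisms: Tuple[str, ...]


# Three rows transcribed (abridged) from the table above.
ENTRIES: List[MoonlightingEntry] = [
    MoonlightingEntry(
        name="1-Cys Peroxiredoxin",
        function_1="Acidic calcium-independent phospholipase A2",
        function_2="Non-selenium glutathione peroxidase",
        organisms=("Human",),
    ),
    MoonlightingEntry(
        name="Superoxide Dismutase",
        function_1="Degrades superoxide anion radicals",
        function_2="Adhesin",
        organisms=("Mycobacterium avium",),
    ),
    MoonlightingEntry(
        name="PGI1 Phosphoglucose Isomerase",
        function_1="Interconverts glucose-6-phosphate and fructose-6-phosphate",
        function_2="Cytokine/growth factor",
        organisms=("Mouse", "Human"),
    ),
]


def entries_for_organism(entries, organism):
    """Return the entries listed for an organism (case-insensitive exact match)."""
    wanted = organism.strip().lower()
    return [e for e in entries if any(o.lower() == wanted for o in e.organisms)]
```

Calling `entries_for_organism(ENTRIES, "human")` selects the rows whose organism column includes Human. Exact matching is a deliberate simplification: the table mixes common and scientific names, so a real lookup would need a controlled organism vocabulary.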

People Page

Home Proteins People FAQ's Articles Other References

Summary of Jeffery Lab Research

The genome projects yielded the sequences of tens of thousands of proteins. Elucidating the roles these proteins play in health and disease, and also how they can be used and/or modified for the development of novel therapeutics, biomaterials, biosensors, methods for energy production and methods for environmental remediation, will be aided by a better understanding of how a protein's amino acid sequence determines its structure and how its structure determines its function. In the Jeffery lab we are using biophysical and biochemical methods along with computer-based structure analysis in several projects to study the connections between protein sequences, structures, and functions.

Analysis of protein sequences and structures to elucidate the connections between sequence, structure and function. This information might help in the future in developing better methods to predict a protein's function(s) from its sequence or structure. Two current projects in this area include an analysis of ligand binding sites in protein crystal structures and a study of the sequences and structures of "moonlighting proteins". Many protein functions can be inferred from the known functions of homologous proteins, but determining protein functions is complicated by an increasing number of "moonlighting proteins", proteins that have more than one function where the multiple functions are not a result of splice variants, gene fusions, or multiple isoforms (Jeffery, C. J. Moonlighting Proteins. (1999) Trends in Biochemical Sciences. 24: 8-11). We are preparing a database of the known moonlighting proteins and performing an analysis of their sequences and structures. Knowing more about moonlighting proteins could help in predicting which additional proteins might also have a second function, which would be useful in determining the function(s) of the thousands of proteins identified through the genome projects and the functions of the "unknown" proteins whose structures were solved as part of the Protein Structure Initiative. In addition, since the ability of proteins to moonlight can complicate interpretation of the results of proteomics projects, identifying the roles of proteins in disease, and the selection of biomarkers, understanding which proteins moonlight can be important for both basic research and medicine.

The development of novel approaches to increase the expression of transmembrane proteins for biochemical analysis and structure determination. Membrane proteins play key roles in health and disease and are the targets of the majority of pharmaceuticals in use today, but much less is known about their structures and mechanisms of function than for soluble proteins because of the challenges in their expression, purification, and structure determination. The goal of our new approaches is to alleviate the bottleneck in protein expression.

In a previous project, we elucidated the reaction mechanism of a glycolytic enzyme that moonlights as a tumor cell motility factor in breast cancer cells: phosphoglucose isomerase/autocrine motility factor (PGI/AMF). By solving six structures of PGI/AMF with different ligands bound, we developed a model of the multistep catalytic mechanism for this multifunctional enzyme/growth factor.
Biography of Prof. Jeffery
Constance (Connie) Jeffery obtained her B.S. degree at the Massachusetts Institute of Technology. She obtained her Ph.D. with Dan Koshland, Jr., at the University of California at Berkeley, where she studied transmembrane signaling by the E. coli aspartate receptor. During postdoctoral research with Greg Petsko and Dagmar Ringe at Brandeis University, she solved the X-ray crystal structures of phosphoglucose isomerase and S. cerevisiae cytoplasmic aspartate aminotransferase and coined the term "moonlighting proteins". She joined the faculty of the University of Illinois at Chicago in 1999. She continues to study the connections between a protein's sequence, structure, and function, with an emphasis on moonlighting proteins and transmembrane proteins.
Biography of Mathew Mani
Mathew Mani attended the University of Illinois at Urbana-Champaign and graduated with a bachelor's degree in Molecular and Cellular Biology. He also completed minors in both Chemistry and Computer Science before attending graduate school at the University of Illinois at Chicago for his Master of Science in Bioengineering (Bioinformatics). Under the guidance and leadership of Dr. Constance Jeffery, he completed his research and thesis on developing a database for moonlighting proteins. He continues to work closely with Dr. Jeffery in the development of her moonlighting proteins research. He currently works in the pharmaceutical and healthcare industry in Northern Illinois.


900 S. Ashland Ave. (MC567), Chicago IL 60607 | ph: 312-996-3168 | Fax: 312-413-2691 | Copyright © 2008 The Board of Trustees of the University of Illinois


FAQ’s Page

FAQs about Moonlighting Proteins

What are moonlighting proteins?

A moonlighting protein is a single protein that has multiple functions that are not due to gene fusions, multiple RNA splice variants or multiple proteolytic fragments. Moonlighting proteins do not include families of homologous proteins if the different functions are performed by different proteins in the protein family, or proteins that have multiple cellular roles that involve the same biochemical function in different locations.

A classic example is phosphoglucose isomerase, which is also known as autocrine motility factor, neuroleukin, and differentiation and maturation mediator. It is both a cytosolic enzyme in glycolysis and an extracellular cytokine and growth factor.

[Figure panels: Cytosolic Enzyme; Extracellular Growth Factor]

How common are moonlighting proteins?

It's not clear how common moonlighting is. The literature contains approximately 200 proteins for which there is biochemical and/or biophysical evidence of two different functions performed by one polypeptide chain. The known examples of moonlighting proteins include many types of proteins, including receptors, enzymes, transcription factors, adhesins and scaffolds. Moonlighting proteins are found in mammals, yeast, bacteria, plants, and many other organisms. Different combinations of biochemical functions are found, for example, an enzymatic function and a receptor-binding function in phosphoglucose isomerase/autocrine motility factor. Diverse methods are used to switch between functions: binding a small molecule, joining a multiprotein complex, binding to DNA, etc. Two proposed mechanisms/models for evolution of a moonlighting function make use of general features of protein structure and could apply to many protein types, potentially resulting in a large number of proteins that moonlight.

Are any moonlighting proteins involved in disease?

Moonlighting proteins have already been found to be involved in cancer cell motility, angiogenesis, DNA synthesis or repair, and chromatin and cytoskeleton structure. The ability of a protein to moonlight can complicate the elucidation of molecular mechanisms of disease, the identification of biomarkers of disease progression, and the development of novel therapeutics.

How did moonlighting proteins evolve?

Two general methods have been suggested for evolving a moonlighting function. Some proteins appear to have been recruited for a new function when new cell types or organs evolved, but without significant modification of the protein structure, for example, the taxon-specific crystallins. In these cases, the new function probably makes use of overall structural or physical features of the protein. A second method is illustrated by the case of PGI/AMF, an ancient enzyme that evolved a protein–protein interaction surface in addition to its active site pocket. The PGI active site pocket is a relatively small part of the protein structure, and there is a lot of solvent-exposed surface area that can undergo changes without adversely affecting catalysis. Because PGI is found in almost all species, it apparently first evolved over three billion years ago. A comparison of the rabbit and bacterial PGI structures showed that the active site has been conserved, but many surface features have changed during evolution. It is possible that the random accumulation of mutations on the surface might have resulted in an additional binding site that enables PGI to bind to a receptor.

I have a protein that is a moonlighting protein. How can I have it added to the MoonProt database?

Please contact us at [email protected] if you have another protein to suggest for inclusion in the MoonProt database.

I would like to use the MoonProt database in a project to analyze the amino acid sequences or structures using bioinformatics.

Please contact us at [email protected] if you are interested in using the MoonProt database for analysis of sequences and/or structures of moonlighting proteins.

Publications on Moonlighting Proteins from the Jeffery Lab:

Proteins with neomorphic moonlighting functions in disease.
Jeffery CJ. IUBMB Life. 2011 Jul;63(7):489-94. doi: 10.1002/iub.504. PMID: 21698752

Moonlighting proteins--an update.
Jeffery CJ. Mol Biosyst. 2009 Apr;5(4):345-50. PMID: 19396370

Moonlighting Proteins: Proteins with Multiple Functions.
Jeffery, C. J. (2005) In The Extracellular Biology of Molecular Chaperones. Edited by Brian Henderson and A. Graham Pockley. Cambridge University Press. New York. Pp. 61-77.

Mass spectrometry and the search for moonlighting proteins.
Jeffery CJ. Mass Spectrom Rev. 2005 Nov-Dec;24(6):772-82. PMID: 15605385

Molecular mechanisms for multitasking: recent crystal structures of moonlighting proteins.
Jeffery CJ. Curr Opin Struct Biol. 2004 Dec;14(6):663-8. PMID: 15582389

Moonlighting proteins: old proteins learning new tricks.
Jeffery CJ. Trends Genet. 2003 Aug;19(8):415-7. PMID: 12902157

Multifunctional proteins: examples of gene sharing.
Jeffery CJ. Ann Med. 2003;35(1):28-35. PMID: 12693610

Crystal structure of rabbit phosphoglucose isomerase, a glycolytic enzyme that moonlights as neuroleukin, autocrine motility factor, and differentiation mediator.
Jeffery CJ, Bahnson BJ, Chien W, Ringe D, Petsko GA. Biochemistry. 2000 Feb 8;39(5):955-64. PMID: 10653639

Moonlighting proteins.
Jeffery CJ. Trends Biochem Sci. 1999 Jan;24(1):8-11. PMID: 10087914
Jeffery Lab, University of Illinois at Chicago, MBRB Rm. 4260

Other Reference Page

Publications from the Jeffery Lab about Moonlighting Proteins

Jeffery, C. J. (2011) Neomorphic Moonlighting Functions in Disease. IUBMB Life (Special issue with the topic Moonlighting Proteins in Neurological Disorders). 63(7): 489-94.
*** Moonlighting protein
