Project No. SB16-429.001

SMART Analysis of XPHARMX Investigational Compounds (FTC1, FTC2)

Final Report
Submitted to Dr. Shane XXX
May 2016

AVMBioMed, LLC
33 W. King St. #181
Malvern, PA 19355

Contacts:

Christian Loch, M.P.H, Ph.D
[email protected]
Phone: (610) 846.286.2466 x707
Fax: (610) 846.286.2466
Director of Research & Development

Maura Buckley
[email protected]
Phone: (610) 846.286.2466 x700
Fax: (610) 846.286.2466
Business Manager

Names have been obscured to protect the scientific integrity and property (i.e. information) of the client. Copyright © 2017 AVMBioMed, LLC

AVMBioMed, LLC • 33 W. King Street #181, Malvern, PA 19355 • www.avmbiomed.com
Phone & Fax: 844.286.2466 • [email protected]


1. Summary

To better understand the mechanisms of action and predict the safety and efficacy of Compound 1 and Compound 2 (hereafter FTC1 and FTC2), Snapshot Proteomics was employed for small molecule characterization (SMART). The effects of the drugs on cellular ubiquitylation were examined across ~20,000 human proteins. Each compound effected changes in the ubiquitylation of between 100 and 160 proteins, and the two sets were distinct but slightly overlapping. Changes in ubiquitylation were observed in both directions, i.e. some proteins increased their ubiquitylation and others decreased it in response to drug. Both drugs appear to work primarily in the nucleus, affecting transcription, possibly through the observed connections to chromatin, with effects on cell cycle and/or apoptosis.

2. Results


In the current study, the cellular effects of two different compounds (FTC1 and FTC2) were revealed at two concentrations each (1 µM and 10 µM) by Snapshot Proteomics. To be classified as a real change (i.e. a “hit”), a change to protein ubiquitylation in response to drug had to conform to the following rules: it was statistically significant at p<0.05; it was observed at BOTH doses of compound; it was in the same direction at both doses (increased or decreased); and it displayed dose-dependency or consistency. Consistency of observed ubiquitylation of a protein at both drug concentrations would indicate events for which the low dose was saturating, and was here defined as the ubiquitylation change at the low drug dose (1 µM) being less than 2-fold greater than that observed at the high drug concentration (10 µM). All of these rules were employed to limit false discovery and provide the best sense of what each drug was doing in the cell. Under these assumptions, there were roughly 150 proteins whose ubiquitylation was altered by FTC1 and roughly 100 by FTC2. The distribution of data (Figure 1) shows that most of these changes were modest and approximated zero. While these changes could certainly reflect real events (chain trimming or extension, for example), the shape of the data suggested certain inflection points (+1.0 and -0.5) where the magnitude of change deviated dramatically from the horizontal. The proteins beyond these points (Mvalue >1.0 or <-0.5) were taken to represent the highest-confidence hits for each compound, and were analysed for ontologies (below).

It is worth noting that both compounds increased ubiquitylation at some proteins and decreased ubiquitylation at others, as expected given cellular crosstalk and compensation mechanisms (see Discussion for more). However, both compounds predominantly effected an increase in cellular ubiquitylation, consistent with expectations of DUB inhibitors: FTC1 increased ubiquitylation of 30 proteins and decreased that of 26, while FTC2 increased 27 and decreased 12. While we do not know whether both compounds target the same or different DUBs, we expected some commonality regardless and examined the data for changes common to both drugs. Table 1 shows the twelve proteins affected by both FTC1 and FTC2, with their corresponding Mvalues.
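The hit rules above can be sketched in a few lines of Python. This is an illustration only, not the analysis code used for the report: the names `DoseResult`, `is_hit`, and `is_high_confidence` are hypothetical, dose-dependency is assumed to mean a larger magnitude of change at 10 µM than at 1 µM, and the inflection-point test is assumed to apply to the 10 µM Mvalue.

```python
from dataclasses import dataclass

P_CUTOFF = 0.05          # significance threshold stated in the report
CONSISTENCY_FOLD = 2.0   # low-dose change must be < 2-fold the high-dose change


@dataclass
class DoseResult:
    """Mvalue and p-value for one protein at one dose (hypothetical container)."""
    m_value: float
    p_value: float


def is_hit(low: DoseResult, high: DoseResult) -> bool:
    """Apply the four rules: significant at both doses, same direction,
    and either dose-dependent or consistent."""
    significant = low.p_value < P_CUTOFF and high.p_value < P_CUTOFF
    same_direction = low.m_value * high.m_value > 0
    if not (significant and same_direction):
        return False
    low_mag, high_mag = abs(low.m_value), abs(high.m_value)
    # Assumption: dose-dependency means a larger change at the higher dose.
    dose_dependent = high_mag > low_mag
    # Consistency as defined in the text: 1 uM change < 2-fold the 10 uM change.
    consistent = low_mag < CONSISTENCY_FOLD * high_mag
    return dose_dependent or consistent


def is_high_confidence(high: DoseResult) -> bool:
    """Beyond the inflection points (+1.0 / -0.5); using the 10 uM value
    here is an assumption."""
    return high.m_value > 1.0 or high.m_value < -0.5


# Hypothetical protein: modest increase at 1 uM, larger increase at 10 uM.
low, high = DoseResult(0.6, 0.01), DoseResult(1.4, 0.02)
print(is_hit(low, high), is_high_confidence(high))
```

Under this reading the consistency test subsumes the dose-dependency test whenever the high-dose change is non-zero; both are kept separate so that either definition can be adjusted independently.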


Figure 1. Distribution of data for A) FTC1 and B) FTC2, where blue dots represent the Mvalue of individual proteins observed at 10 µM and red dots the corresponding Mvalue at 1 µM treatment. Data were ordered from the greatest increase in ubiquitylation to the greatest decrease using the 10 µM data.

Table 1. List of the twelve proteins whose ubiquitylation was affected by both FTC1 and FTC2. Note the consistency among the data for direction: six proteins were increased in ubiquitylation by both compounds, while six were decreased. (No proteins were affected by both compounds in opposing direction.)


Gene Ontology (GO) clustering can be useful for filtering meaningful changes from false positives and providing insight into the biological roles of each compound. This process involves taking a specified set of data and asking whether any biological processes, molecular functions, or cellular locations are over-represented in the data relative to what chance alone would predict. One major caveat, of course, is the incompleteness of the ontology database; information about those three categories is simply not known for all proteins (and is sometimes poorly annotated when it is). Nevertheless, it can provide a useful tool to aid in prioritizing hits for follow-up study. It is up to the individual investigator whether or not to assign higher priority to individual proteins comprising an ontology, based on her or his own insight and intuition.

Ontology analysis was undertaken for each dataset individually, employing a common cutoff: putative hits with pValue <0.05 and Mvalue beyond the above-stated inflection points (i.e., greater than +1.0 or less than -0.5) were compared against the goa_human database with Benjamini correction for multiple testing (Beissbarth and Speed 2004). Several ontologies were revealed in each case, and are presented below (Tables 2-4). Where possible, members of each ontology are listed. Complete lists of proteins comprising the larger (n>10) ontologies can be provided upon request.
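The over-representation question described above can be illustrated with a short Python sketch. It is not the GOstat implementation: it assumes a one-sided hypergeometric test for each ontology and assumes that "Benjamini correction" refers to the Benjamini-Hochberg false discovery rate procedure. The function names and the example counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(found: int, n_hits: int, annotated: int, population: int) -> float:
    """P(X >= found) when n_hits proteins are drawn at random from a
    population in which `annotated` proteins belong to the ontology."""
    upper = min(n_hits, annotated)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, n_hits - k)
        for k in range(found, upper + 1)
    )
    return favourable / comb(population, n_hits)


def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical example: 56 hits out of 20,000 arrayed proteins, tested
# against three made-up ontologies given as (found, annotated) counts.
ontologies = [(6, 300), (2, 150), (1, 900)]
raw = [hypergeom_upper_tail(f, 56, a, 20000) for f, a in ontologies]
for (f, a), p, q in zip(ontologies, raw, benjamini_hochberg(raw)):
    print(f"found={f} annotated={a} p={p:.3g} adjusted={q:.3g}")
```

The Found and Annotated counts correspond to the table columns described under Table 2; the background population of 20,000 is taken from the approximate number of arrayed proteins and would need to match whatever background the database actually uses.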

Table 2. Ontologies within FTC1 data

GO is the unique identifier assigned by the Gene Ontology Consortium to each category, which is named under Description. Found is the number of proteins present in the list of hits, while Annotated is the total number in that ontology according to the consortium. pValue is the probability that the ontology would be represented to this extent in the data by chance alone. Members (where few enough to list) are the exact proteins found within the dataset.

Table 3. Ontologies within FTC2 data


Headers as described in Table 2.

Table 4. Ontologies within FTC1 and FTC2 datasets

Headers as described in Table 2.


3. Discussion

Observing decreased ubiquitylation of certain substrates is not intuitive given cellular treatment with DUB inhibitors (i.e., preventing deubiquitylation would be expected to cause increased ubiquitylation). However, this cell-based assay does not report the activity of a compound against a single DUB, but rather the cell’s complete response to it; in our experience, this response always includes compensatory adjustments and even changes within other forms of post-translational modification (PTM). Thus, other DUB(s) were likely up-regulated in response to compound, which in turn resulted in decreased ubiquitylation of certain proteins. Similarly, we would expect to have found changes (increased and decreased) to protein phosphorylation, for example, had we also looked there.

“Chromatin” was identified as significantly altered in ubiquitylation in all four individual experiments (IDs CDI-HTB22 and CDI-CCL247 for FTC1_10uM and FTC2_10uM; CDI-HTB22 for FTC1_1uM; CDI-CCL240 and CDI-CCL247 for FTC2_1uM). These entries were not included in the ontology analysis since they are not proper Gene IDs recognizable by the database, although they were consistent with many ontologies that were identified (e.g. “histone modification” in FTC1). Based on these data, both compounds appear to work primarily in the nucleus to alter gene expression, likely through remodelling chromatin. FTC2 appears particularly oncology-focused given the cell-cycle and apoptotic ontologies revealed there. FTC2 increased protein ubiquitylation with minimal compensatory cellular decrease, whereas FTC1 appeared more balanced in this regard. The ontologies revealed were remarkably consistent, and did not reveal obvious toxicities or side-effects; however, it is often within the phospho-proteome that such things are revealed.

The order of hits within a list is always less important than the determination of where to draw the cutoff in defining those hits. Therefore, the rank-order of proteins should be de-emphasized, with all proteins within a given list being considered equally likely to be real (statistically speaking). Any additional knowledge that can be brought to bear to shorten (or lengthen) lists or prioritize hits is, of course, highly encouraged.

In summary, Snapshot Proteomics of FTC1 and FTC2 appears to have identified distinct but slightly overlapping changes in cellular ubiquitylation effected by each, and a likely cellular profile involving strong alteration of transcription. Without further knowledge of these compounds, there is little more we can do to provide insight, but the cellular effects revealed here can be used to prioritize compounds, suggest companion therapy, and/or adjust the medicinal chemistry of either.

Methods

Dr. Shane XXXX prepared and sent to AVMBioMed a pellet of cultured human cells and aliquots of FTC1 and FTC2. Cells were lysed and clarified by AVMBioMed, aliquoted, and then treated at room temperature (RT) with either 10 µM FTC1, 1 µM FTC1, 10 µM FTC2, 1 µM FTC2, or DMSO. After 60 minutes, treated lysates were incubated individually with five pre-blocked protein microarrays for 90 minutes at RT.

Protein microarrays were removed from -20 °C storage and placed at room temperature (RT) for 15 minutes before opening, to avoid formation of condensation. Arrays were then blocked for 1 hour at RT in PBS containing 0.05% Tween-20, 20 mM reduced glutathione, 1 mM DTT, 3% BSA, and 25% glycerol. Three PBS washes preceded the 90-minute (RT) incubation with samples, described above. Tandem Ubiquitin Binding Entities (TUBEs; Boston Biochem) were then employed for detection of pan-ubiquitin chains across all five arrays. Three PBS washes preceded incubation with the secondary detection reagent, Alexa647-conjugated streptavidin. A final wash of two changes of PBST, two of PBS, then two of water preceded centrifugal drying (1000 RPM for 5 minutes at RT) and scanning (GenePix 4100A by Molecular Devices) of the arrays.

Data Analysis

Microarray images were gridded and quantitated using GenePix Pro (v7) software. Median intensities (features and local backgrounds) were utilized, and the signal-to-noise ratio (SNR) calculated. Values were then normalized to biological controls within each array. Duplicate features (representing identical protein) were summarized by average and standard deviation. These values were compared between arrays (compound-treated minus DMSO-treated control), then Loess transformed by print tip and location to remove technical sources of error (Smyth and Speed 2003), resulting in the final estimate of magnitude change (M-value). A t-test (paired, 2-tailed) was used to assess the statistical significance (p-value) of each estimate (under the null hypothesis that M = 0). A threshold of 95% confidence (p < 0.05) was employed to filter data. Gene Ontology (GO) clustering was performed (Beissbarth and Speed 2004) to identify categories (biological processes, cellular components, or molecular functions) over-represented within this data set relative to what chance alone would predict. N.b., M-value is a twice-normalized (biologically and for technical sources of error) difference between mean signal-to-noise ratios generated from relative fluorescence units (RFU); as such it has no units, and can be simply reported or graphed as “M-value”. “RFU” could also be used.

References

Smyth, G. K. and T. Speed (2003). "Normalization of cDNA microarray data." Methods 31(4): 265-273.

Beissbarth, T. and T. P. Speed (2004). "GOstat: find statistically overrepresented Gene Ontologies within a group of genes." Bioinformatics 20(9): 1464-1465.

Explanation of column headers:

Block, row, column = the physical location of the protein listed, i.e. the block, column, and row of the microarray (there are 48 blocks, numbered left to right and top to bottom, each with 32 columns and 31 rows).
Name = protein name
ID = accession number
Mvalue = duplicate-summarized, Loess-normalized signal-to-noise ratio from the protein; values obtained from the experimental array minus those obtained from the negative control array.
Stdev = standard deviation of the difference above (the standard deviation of the difference between two numbers is the square root of the sum of the squares of the standard deviations associated with each of the two numbers). Note that a standard deviation estimated from only two values is a rough estimate; Excel will calculate and report one from just two values, which is what we are using here (each protein was printed only in duplicate because of spatial limitations).
Pval = p value from the 2-tailed, paired t-test for significance of change (with cutoff set at p<0.05).
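As an illustration of how the Mvalue and Stdev columns relate to the duplicate spots, here is a minimal Python sketch. It assumes already-normalized signal-to-noise values and omits the Loess step entirely; `summarize_feature` is a hypothetical name, and the input numbers are the second hypothetical scenario from FAQ answer A3.

```python
from math import sqrt
from statistics import mean, stdev


def summarize_feature(case_snr: list[float], control_snr: list[float]) -> tuple[float, float]:
    """Return (difference of means, propagated standard deviation).

    The difference is case minus control; its standard deviation is the
    square root of the sum of squares of the two per-array standard
    deviations, as described for the Stdev column.
    """
    difference = mean(case_snr) - mean(control_snr)
    combined_sd = sqrt(stdev(case_snr) ** 2 + stdev(control_snr) ** 2)
    return difference, combined_sd


# Duplicate spots on the experimental array and on the control array.
difference, combined_sd = summarize_feature([100.0, 20.0], [3.0, 9.0])
print(f"difference = {difference:.1f}, stdev = {combined_sd:.2f}")
```

`statistics.stdev` computes the sample standard deviation (n - 1 in the denominator), which is defined for two values and matches what a spreadsheet's sample standard deviation function reports.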

Files attached:

1. Significant (p<0.05) data for each of the four individual studies.

2. By compound, significant (p<0.05) changes at both doses also exhibiting dose-response (as defined above).

3. All data on >20,000 human proteins collected for each of the four individual studies, especially useful for examining what occurred at any proteins of particular interest beyond just the significant lists.

FAQ

Q1. If positive Mvalues indicate biologically meaningful events (interaction with my POI, or substrates of my POI), do negative Mvalues imply the impossible (e.g., higher interaction with my POI in a sample that did not even contain my POI)?
A1. Negative M-values do indicate that the signal observed at those arrayed proteins was weaker on the case array than on the control. It is (obviously) difficult to conjure any biologically meaningful explanation for the phenomenon, so these signals likely represent noise on the arrays (perhaps there was a scratch on one array, or insoluble material settled on an array at that location, etc.). The first thing to consider is that the Loess transformation performed in this study is statistically conservative in this regard. It is based on the assumption that true hits should be scattered at random across the array, as opposed to showing bias for certain locations. To achieve this, it forces the values within each of the 48 blocks present on the array to have an average signal of zero and a standard deviation of one. In practice, when there are plenty of good (and positive) signal differences between arrays, this has the effect of exaggerating some of the lesser, or slightly negative, differences, making them appear "more negative" than they actually were. The second consideration is that in any screen of 20,000 questions with a cutoff of p<0.05, we expect to see 1000 changes due to chance alone (by definition). In practice, this number has been shown to be more commonly about 400, indicating good reproducibility of the screening method.
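The centering and scaling effect described in A1 can be illustrated with a minimal Python sketch. It is a deliberate simplification: a real Loess transformation fits a local regression, whereas this only forces one block to a mean of zero and a standard deviation of one. The function name and the block values are hypothetical.

```python
from statistics import mean, pstdev


def standardize_block(values: list[float]) -> list[float]:
    """Center a block on zero and scale it to unit standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A block with no spread carries no information; return zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical block: four spots with essentially no change and two
# strongly positive differences between case and control.
block = [0.0, 0.1, -0.1, 0.0, 3.0, 2.5]
scaled = standardize_block(block)
for before, after in zip(block, scaled):
    print(f"{before:+.2f} -> {after:+.2f}")
```

Because the two strong positives pull the block mean upward, the spots that were essentially unchanged come out negative after centering, which is the "more negative" effect described in the answer.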

Q2. Does a lower Mvalue indicate a weaker interaction (protein-protein or enzyme-substrate)?
A2. It could certainly be the case that the larger Mvalue represents a stronger interaction, or that the large one is a biologically real interaction and the small one a spurious one. However, it could also mean that both interactions were biologically real, but the large one a structural and stable interaction while the small one represented an enzymatic and transient one (protein-protein) or a biologically critical but rare one (enzyme-substrate). Finally, it is also possible that both interactions were biologically meaningful, stable, and interesting, but the one represented by the large Mvalue was for a protein on the array that was easily made recombinantly (expressed at 20 µM and purified to 99% homogeneity, for example) while the small one involved a protein that was difficult to make (expressed at 10 nM and purified to 50% homogeneity, for example).


Q3. So what do I believe, Mvalue or Pvalue?
A3. Neither one alone, probably. Mvalue captures the "Magnitude" of change between the two arrays, and pValue captures the likelihood that that change (however small or large) was due to chance alone (p<0.05 was used here, which means that there is a 5% or less chance that any given one of these changes was due to chance alone). If two spots on the case array had values of 1.001 and 1.002, and the corresponding values on the control array were 0.975 and 0.974, a reasonable interpretation would be that two slightly different estimates of the same number (1.0) were reported on both arrays (as captured by the simplified and untransformed Mvalue of 0.027, essentially zero). Given the hyper-reproducible intra-array estimates (stdev of 0.0007 in each case), however, we would see a pValue of about 0.024, suggesting something real (less than a 3% chance that the change was due to chance alone). On the other hand, if we had values of 100 and 20 on the case array, and values of 3 and 9 on the control, a reasonable interpretation would be that the change was meaningful and real (hypothetical and simplified Mvalue of 54), while the poor reproducibility of intra-array signals on each array (stdev 56.6 and 4.24, respectively) would drive our pValue to 0.43 in this case (a 43% chance that this difference was due to chance alone). The bottom line is that either estimate alone can easily mislead, and both should be used together. In practice, criteria are chosen arbitrarily for each (there is nothing “real” about p<0.05, and a protein with p=0.051 is probably just as “real” as one with p=0.049). In all cases, p<0.05 is utilized per common convention, while the Mvalue cutoff is chosen individually based on the graphical distribution of hits unique to each project (some proteins simply interact with a LOT more than others, rendering any a priori assumption difficult to justify).

To look at either criterion alone (for example, to capture or “cherry-pick” events like the second hypothetical described above), simply start with the provided spreadsheet whose name ends in “all”, and sort by either column (P or M). For purposes of drawing a cutoff of “real” hits, a client may feel free to tighten or loosen the Mvalue chosen by AVMBioMed. Whatever the pValue and Mvalue chosen as cutoff criteria, all hits above that line should be considered equally likely to be real (as borne out by statistical calculation and previous study using independent datasets). The order of hits above the cutoff is ALWAYS less important to consider than where that cutoff is drawn.
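The two hypothetical scenarios in A3 can be reproduced with a short Python sketch of a paired, two-tailed t-test on duplicate spots. It uses the simplified, untransformed values from the answer, not the Loess-normalized data. With only two pairs the test has one degree of freedom, for which the t-distribution is the Cauchy distribution, so the p-value has a closed form and no statistics library is needed. The function name is hypothetical, and the pairing of spots in the order listed is an assumption.

```python
from math import atan, pi, sqrt
from statistics import mean, stdev


def paired_duplicate_test(case: list[float], control: list[float]) -> tuple[float, float]:
    """Return (simplified Mvalue, two-tailed p-value) for duplicate spots.

    Valid only for exactly two pairs (one degree of freedom), where the
    t-distribution reduces to the Cauchy distribution.
    """
    if len(case) != 2 or len(control) != 2:
        raise ValueError("this closed form assumes exactly two paired spots")
    diffs = [c - k for c, k in zip(case, control)]
    m_value = mean(diffs)
    std_err = stdev(diffs) / sqrt(len(diffs))
    if std_err == 0:
        # Identical differences: the t statistic is unbounded.
        return m_value, 0.0
    t_stat = m_value / std_err
    p_value = 1.0 - (2.0 / pi) * atan(abs(t_stat))
    return m_value, p_value


# Scenario 1: tiny but highly reproducible difference.
m1, p1 = paired_duplicate_test([1.001, 1.002], [0.975, 0.974])
# Scenario 2: large but poorly reproducible difference.
m2, p2 = paired_duplicate_test([100.0, 20.0], [3.0, 9.0])
print(f"scenario 1: M = {m1:.3f}, p = {p1:.3f}")
print(f"scenario 2: M = {m2:.1f}, p = {p2:.2f}")
```

The first scenario passes the p<0.05 cutoff despite a negligible Mvalue, while the second fails it despite a large one, which is why the answer recommends using both criteria together.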

Q4. This screen failed to identify a positive control protein for my POI (that was indeed present on the array). What gives?
A4. No screen is perfect; certainly we could have just missed it. Perhaps the pH was slightly off here, or the local concentration of a metabolic cofactor is higher within the cell, etc. Alternatively, we may have missed it given the conservative assumptions with which we analyze the results. Our screens are tailored towards identification of novel interactions with the goal of minimizing false discovery (to make the most of your own resources spent on follow-up). As such, we sometimes find that known interactions are not captured. Usually, when we look at such individual proteins, we find data that are indeed consistent with interaction, but that simply missed our cutoff criteria for one reason or another (see the second scenario in answer #3 for an example). While initially disappointing, such a phenomenon generally increases confidence in the hits that did reach criteria and were reported.

Q5. There is a lot of discussion about “real” hits. You admit that your screen returned some garbage to me?
A5. Yes, we do. There is ultimately no perfect way to order hits or zoom in on EXACTLY which events were biologically real, as is simply the nature of any screen of any size. At p<0.05 in a screen of 20,000 yes/no questions (i.e., “interaction with my POI, yes or no?”), we expect to see, by definition of pValue, 1000 “yes” answers by chance alone. In practice, the number associated with Snapshot Proteomics is usually about 400, indicating high reproducibility for this method. Furthermore, many or most of these are easily identified and removed from further consideration (for example, the slight but reproducible changes resulting in Mvalues of what is essentially zero, as described by the first scenario in answer #3).
