Comparison of Mathematical Models to Determine Molecular Weight of Proteins: A Statistical Analysis

Jennifer Wright (1), Edward J. Carroll, Jr. (2), Lawrence Clevenson (1)
Departments of (1) Mathematics and (2) Biology, California State University Northridge
NASA/PAIR Program


Abstract

Accurate determination of the molecular weight (MW) of a protein is necessary for its isolation, purification and identification. One-dimensional Sodium Dodecyl Sulfate Polyacrylamide Gel Electrophoresis (SDS-PAGE) with single-percentage gels is traditionally used for this purpose. Gradient gels, which incorporate a range of percentages, have been considered less accurate, in part because reliable mathematical models for them are lacking. The purpose of this project was to develop statistical models that accurately predict protein MWs on gradient gels. Six mathematical models were applied to protein standards of previously identified MWs to determine the best fitting model. To make this determination, the relative mobility (Rm) of each protein standard was calculated and compared with its actual MW. The "Cubic Model" was determined to be the best fitting and will be tested on unknown proteins suspected to play a role in amphibian fertilization.
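Every model on this poster takes relative mobility as its input. A minimal sketch, assuming the usual SDS-PAGE definition (distance migrated by a band divided by the distance migrated by the tracking-dye front, both measured from the same origin); the function name and the distances below are hypothetical:

```python
def relative_mobility(band_distance: float, dye_front_distance: float) -> float:
    """Relative mobility Rm = band migration distance / dye-front migration distance.

    Assumes both distances are measured from the same origin (e.g. the top of
    the resolving gel) in the same units, so that Rm lies between 0 and 1.
    """
    if dye_front_distance <= 0:
        raise ValueError("dye-front distance must be positive")
    if not 0 <= band_distance <= dye_front_distance:
        raise ValueError("band must lie between the origin and the dye front")
    return band_distance / dye_front_distance


# Hypothetical distances in mm; real values would be measured on the gel.
dye_front = 80.0
band_distances = {"band A": 12.0, "band B": 40.0, "band C": 68.0}
rm = {name: relative_mobility(d, dye_front) for name, d in band_distances.items()}
print(rm)  # {'band A': 0.15, 'band B': 0.5, 'band C': 0.85}
```

The range check rejects bands reported beyond the dye front, which would otherwise give Rm > 1 and break the logarithmic models.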

Question

Which model best fits the known protein standards?

[Figure: Relative Mobility vs. Log Molecular Weight for Gel #2 VE & SE, with Log Linear, Cubic and Quad. trendlines. x-axis: log molecular weight (3.8 to 5.4); y-axis: relative mobility (0.1 to 0.9).]

Trendlines, with $y$ = relative mobility and $x$ = log molecular weight:
Log Linear: $y = -0.4775x + 2.705$, $R^2 = 0.9644$
Cubic: $y = 0.2922x^3 - 4.1443x^2 + 18.975x - 27.493$, $R^2 = 0.9977$
Quad.: $y = -0.154x^2 + 0.9288x - 0.474$, $R^2 = 0.985$

Residuals for Cubic Model, $\log(\mathrm{MW}) = a + b R_m + c R_m^2 + d R_m^3$

Gel ID                 | R Square | Residuals (rows 1 to 5)
Male Frog #1: 7.5%     | 0.9981   | 0.001  0.012  0.017  0.007  0.001
Female Frog #2: 7.5%   | 0.9990   | 0.001  0.009  0.012  0.005  0.000
Sea Urchin #3: 10%     | 0.9996   | 0.001  0.003  0.009  0.008  0.003
Sea Urchin #4: 10%     | 0.9999   | 0.001  0.001  0.004  0.004  0.002
Male Frog #3: 12%      | 0.9991   | 0.006  0.018  0.014  0.005  0.005
Female Frog #3: 12%    | 0.9995   | 0.007  0.009  0.001  0.002  0.012
Sea Urchin #1: 12%     | 0.9993   | 0.004  0.002  0.016  0.007  0.008
Sea Urchin #2: 12%     | 0.9988   | 0.003  0.008  0.023  0.010  0.008
Tris Supernatant #4    | 0.9995   | 0.007  0.004  0.010  0.001  0.011
Semin. Plasma #5       | 0.9990   | 0.012  0.017  0.004  0.003  0.012
Tris Pellet #6         | 0.9998   | 0.005  0.008  0.001  0.001  0.006
Semin. Plasma #7       | 0.9986   | 0.013  0.024  0.007  0.000  0.010
Jelly & Semin. #1      | 0.9833   | 0.024  0.041  0.038  0.029  0.054
Jelly & Semin. #2      | 0.9871   | 0.025  0.046  0.035  0.030  0.049
Jelly & Semin. #3      | 0.9844   | 0.024  0.037  0.036  0.024  0.041

Further residuals (gel columns not identified):
0.001 0.000 0.003 0.008 0.009 0.009 0.007 0.008 0.004 0.007 0.024 0.028 0.032
0.001 0.002 0.004 0.004 0.002 0.002 0.001 0.002 0.053 0.046 0.039
0.098 0.084 0.105
0.098 0.078 0.095
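The per-gel values above can be summarized with a short script. This sketch uses only the fully aligned residual rows 1 to 5. It averages the R-square column (the result, about 0.996, agrees with the Cubic entry of Table 1) and converts mean residuals into approximate MW errors, assuming the residuals are unsigned differences in log10(MW) units, so that a residual r corresponds to a factor of 10^r in MW. Grouping the Jelly & Semin. gels separately follows the Fig. 2 caption and legend, which suggest these are the gradient gels; `table2` and `mw_error_percent` are illustrative names:

```python
from statistics import mean

# Cubic-model R-square and residual rows 1-5 per gel, transcribed from Table 2.
# Residuals are assumed to be unsigned values in log10(MW) units.
table2 = {
    "Male Frog #1: 7.5%":   (0.9981, [0.001, 0.012, 0.017, 0.007, 0.001]),
    "Female Frog #2: 7.5%": (0.9990, [0.001, 0.009, 0.012, 0.005, 0.000]),
    "Sea Urchin #3: 10%":   (0.9996, [0.001, 0.003, 0.009, 0.008, 0.003]),
    "Sea Urchin #4: 10%":   (0.9999, [0.001, 0.001, 0.004, 0.004, 0.002]),
    "Male Frog #3: 12%":    (0.9991, [0.006, 0.018, 0.014, 0.005, 0.005]),
    "Female Frog #3: 12%":  (0.9995, [0.007, 0.009, 0.001, 0.002, 0.012]),
    "Sea Urchin #1: 12%":   (0.9993, [0.004, 0.002, 0.016, 0.007, 0.008]),
    "Sea Urchin #2: 12%":   (0.9988, [0.003, 0.008, 0.023, 0.010, 0.008]),
    "Tris Supernatant #4":  (0.9995, [0.007, 0.004, 0.010, 0.001, 0.011]),
    "Semin. Plasma #5":     (0.9990, [0.012, 0.017, 0.004, 0.003, 0.012]),
    "Tris Pellet #6":       (0.9998, [0.005, 0.008, 0.001, 0.001, 0.006]),
    "Semin. Plasma #7":     (0.9986, [0.013, 0.024, 0.007, 0.000, 0.010]),
    "Jelly & Semin. #1":    (0.9833, [0.024, 0.041, 0.038, 0.029, 0.054]),
    "Jelly & Semin. #2":    (0.9871, [0.025, 0.046, 0.035, 0.030, 0.049]),
    "Jelly & Semin. #3":    (0.9844, [0.024, 0.037, 0.036, 0.024, 0.041]),
}


def mw_error_percent(log10_residual: float) -> float:
    """Relative MW error implied by a residual r in log10(MW): (10**r - 1) * 100."""
    return (10 ** log10_residual - 1) * 100


r_square_ave = mean(r2 for r2, _ in table2.values())
print(f"Average R-square over {len(table2)} gels: {r_square_ave:.3f}")

for gel, (_, res) in table2.items():
    m = mean(res)
    print(f"{gel:22s} mean residual {m:.4f} (about {mw_error_percent(m):.1f}% in MW)")

jelly = [mean(res) for gel, (_, res) in table2.items() if gel.startswith("Jelly")]
other = [mean(res) for gel, (_, res) in table2.items() if not gel.startswith("Jelly")]
print(f"Largest mean residual, other gels: {max(other):.4f}")
print(f"Smallest mean residual, Jelly & Semin. gels: {min(jelly):.4f}")
```

On these rows every Jelly & Semin. gel has a larger mean residual than any other gel, consistent with its lower R-square.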

Final Predicted Weights of Unknown Proteins Using Cubic Model (MW in Daltons)

Band   | Gel 2 VE | Gel 2 Sperm Enzyme | Gel 4 VE & Tris Supernatant | Gel 6 VE & Tris Pellet
Band 1 | 108,622  | 113,908            | 146,583                     | 139,726
Band 2 | 76,190   | 77,864             | 77,679                      | 78,369
Band 3 | 52,517   | 54,656             | 43,360                      | 43,963
Band 4 | 43,542   | 47,431             | 37,798                      | 37,990
Band 5 (gel columns not identified): 35,003; 36,159; 36,319
Band 6 (gel columns not identified): 34,117; 34,428
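Applying the Cubic model to an unknown band amounts to fitting the four coefficients on the standards and then back-transforming the predicted log(MW) to Daltons. A minimal sketch, assuming "log" means log10 and that ordinary least squares is the fitting method; `fit_cubic`, `predict_mw` and all Rm values are hypothetical, while the standard MWs come from Table 1:

```python
import numpy as np


def fit_cubic(rm, mw):
    """Least-squares fit of log10(MW) = a + b*Rm + c*Rm**2 + d*Rm**3.

    Returns the coefficients (a, b, c, d)."""
    rm = np.asarray(rm, dtype=float)
    y = np.log10(np.asarray(mw, dtype=float))
    X = np.column_stack([np.ones_like(rm), rm, rm**2, rm**3])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def predict_mw(coef, rm):
    """Evaluate the cubic at Rm and back-transform from log10(MW) to Daltons."""
    a, b, c, d = coef
    rm = np.asarray(rm, dtype=float)
    return 10 ** (a + b * rm + c * rm**2 + d * rm**3)


# Standard MWs from Table 1 paired with HYPOTHETICAL Rm values
# (real Rm values would be measured on each gel).
standard_mw = [200_000, 116_250, 97_400, 66_200, 45_000, 31_000, 21_500, 14_400, 6_500]
standard_rm = [0.10, 0.21, 0.26, 0.37, 0.50, 0.61, 0.72, 0.83, 0.93]

coef = fit_cubic(standard_rm, standard_mw)
unknown_rm = [0.30, 0.45, 0.55]  # hypothetical bands from an unknown sample
for r, mw in zip(unknown_rm, predict_mw(coef, unknown_rm)):
    print(f"Rm = {r:.2f} -> predicted MW about {mw:,.0f} Da")
```

Because the fit is done on log10(MW), an error of a given size in the fitted curve becomes a proportional (not absolute) error in the predicted Daltons.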

Conclusions

The Cubic model was the best fitting of the six models we evaluated for estimating unknown molecular weights. This was determined by comparing the predicted weights, residuals, and R-square values of each model.

Future Work

Work will continue on gradient gels and on other models that could be used.

Determinations

1.) The R-square value is good for most of the models; the exception is the SLIC model, whose average R-square (0.863) is noticeably lower. R-square is the ratio of the predicted variation, $\sum_i (\hat{u}_i - \bar{u})^2$, to the total variation, $\sum_i (u_i - \bar{u})^2$, where $\hat{u}_i$ is the value of $u_i$ predicted by a particular model and $\bar{u}$ is the mean of the $u_i$. The Cubic model produces the highest average R-square (0.996) of the six models. Ideally, R-square equals 1, meaning that the predicted variation equals the total variation, i.e. the model accounts for all of the variation in the data.

2.) Most of the models predict the MWs of the standards well, but the Cubic model's predictions are, overall, closest to the actual values.

3.) The residuals of a model are the differences between the actual data points and the predicted points. Looking over the residuals (see example below), the Cubic model produces smaller residual values than the other 5 models.
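The R-square ratio and the residuals described above can be computed directly. A minimal sketch with hypothetical data; note that the ratio form (predicted variation over total variation) equals 1 minus (sum of squared residuals over total variation) only for a least-squares fit with an intercept, evaluated on the data it was fitted to, which the example checks:

```python
import numpy as np


def r_squared(observed, predicted):
    """R-square as predicted variation / total variation:
    sum((u_hat - u_bar)**2) / sum((u - u_bar)**2)."""
    u = np.asarray(observed, dtype=float)
    u_hat = np.asarray(predicted, dtype=float)
    u_bar = u.mean()
    return np.sum((u_hat - u_bar) ** 2) / np.sum((u - u_bar) ** 2)


def residuals(observed, predicted):
    """Residuals: actual data points minus predicted points."""
    return np.asarray(observed, dtype=float) - np.asarray(predicted, dtype=float)


# Small hypothetical data set, fitted with a least-squares straight line.
x = np.array([0.0, 1.0, 2.0, 3.0])
u = np.array([1.0, 3.0, 2.0, 5.0])
slope, intercept = np.polyfit(x, u, 1)  # polyfit returns the highest power first
u_hat = intercept + slope * x

res = residuals(u, u_hat)
r2 = r_squared(u, u_hat)
r2_alt = 1 - np.sum(res**2) / np.sum((u - u.mean()) ** 2)
print(res.round(3))
print(round(r2, 4), round(r2_alt, 4))  # both print 0.6914
```

For predictions that do not come from such a fit (for example, a model applied to new gels), the two forms can disagree, so it is worth stating which one is reported.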

Molecular weights in Daltons; the model columns give the predicted MW.

Actual MW     | Cubic   | -LN²    | Log-Log | Quad.   | Log Linear | SLIC
200,000       | 201,028 | 197,751 | 197,683 | 183,306 | 164,210    | 144,849
116,250       | 115,949 | 118,022 | 117,919 | 120,246 | 117,416    | 107,140
97,400        | 94,775  | 96,991  | 97,280  | 101,049 | 102,609    | 97,197
66,200        | 67,689  | 70,111  | 69,995  | 72,876  | 78,955     | 81,441
45,000        | 45,519  | 44,271  | 44,248  | 44,679  | 50,028     | 57,331
31,000        | 31,647  | 28,822  | 28,736  | 28,090  | 29,506     | 36,594
21,500        | 20,933  | 20,514  | 20,518  | 21,197  | 17,963     | 17,906
14,400        | 12,277  | 13,618  | 13,438  | 12,789  | 12,820     | 11,494
6,500         | 8,007   | 10,267  | 10,635  | 9,374   | 10,472     | 9,211
R-Square Ave. | 0.996   | 0.990   | 0.989   | 0.985   | 0.949      | 0.863
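As a complementary check on item 2, the table above can be reduced to one error figure per model. This sketch uses mean absolute percentage error, a summary metric chosen here for illustration rather than one reported on the poster; the values are transcribed from Table 1 and the function name is hypothetical:

```python
# Actual and predicted MWs (Daltons) of the standards, transcribed from Table 1.
actual = [200_000, 116_250, 97_400, 66_200, 45_000, 31_000, 21_500, 14_400, 6_500]
predicted = {
    "Cubic":      [201_028, 115_949, 94_775, 67_689, 45_519, 31_647, 20_933, 12_277, 8_007],
    "-LN^2":      [197_751, 118_022, 96_991, 70_111, 44_271, 28_822, 20_514, 13_618, 10_267],
    "Log-Log":    [197_683, 117_919, 97_280, 69_995, 44_248, 28_736, 20_518, 13_438, 10_635],
    "Quad.":      [183_306, 120_246, 101_049, 72_876, 44_679, 28_090, 21_197, 12_789, 9_374],
    "Log Linear": [164_210, 117_416, 102_609, 78_955, 50_028, 29_506, 17_963, 12_820, 10_472],
    "SLIC":       [144_849, 107_140, 97_197, 81_441, 57_331, 36_594, 17_906, 11_494, 9_211],
}


def abs_percent_errors(pred, true):
    """Absolute percentage error of each prediction relative to the actual MW."""
    return [abs(p - t) / t * 100 for p, t in zip(pred, true)]


mape = {model: sum(abs_percent_errors(p, actual)) / len(actual)
        for model, p in predicted.items()}
for model, err in sorted(mape.items(), key=lambda kv: kv[1]):
    print(f"{model:10s} mean absolute % error: {err:5.1f}")

# (error %, actual MW) pair with the largest error for the Cubic model
worst = max(zip(abs_percent_errors(predicted["Cubic"], actual), actual))
print(f"Largest Cubic error: {worst[0]:.1f}% for the {worst[1]:,} Da standard")
```

On these numbers the Cubic model has the smallest mean error (about 5.5%), although its largest percentage errors occur at the two smallest standards, 14,400 and 6,500 Daltons.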

Procedure

• Measure standards in gels.
• Test models on the measured known protein standards.
• Decide on the best fitting model.
• Receive and measure unknown proteins.
• Begin analyzing the unknowns by applying our model.


Models Tested

Cubic: $\log(\mathrm{MW}) = a + b R_m + c R_m^2 + d R_m^3$
-LN²: $\log(\mathrm{MW}) = a + b(-\ln R_m) + c(-\ln R_m)^2$
Log-Log: $\log(\mathrm{MW}) = a + b \log R_m + c (\log R_m)^2$
Quad: $\log(\mathrm{MW}) = a + b R_m + c R_m^2$
Log Linear: $\log(\mathrm{MW}) = a + b R_m$
SLIC: $\log(\ln \mathrm{MW}) = a + b \ln(-\ln R_m)$
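Each model above is linear in its coefficients once MW and Rm are transformed, so all six can be fitted by ordinary least squares. A minimal sketch, assuming Log is base 10, Ln is the natural log, and the Rm values are hypothetical; `MODELS` and `fit_model` are illustrative names. Two caveats: R-square is computed on each model's own response scale, so the SLIC value (on log(ln MW)) is not strictly comparable with the others; and since -ln Rm is a constant multiple of log Rm, the -LN² and Log-Log models span the same functions and give identical least-squares fits in this sketch.

```python
import numpy as np

# Each model: (transform of MW giving the response, feature functions of Rm).
# "Log" is taken as log10 and "Ln" as the natural log (assumption).
MODELS = {
    "Cubic":      (np.log10, [lambda r: r, lambda r: r**2, lambda r: r**3]),
    "-LN^2":      (np.log10, [lambda r: -np.log(r), lambda r: np.log(r) ** 2]),
    "Log-Log":    (np.log10, [np.log10, lambda r: np.log10(r) ** 2]),
    "Quad":       (np.log10, [lambda r: r, lambda r: r**2]),
    "Log Linear": (np.log10, [lambda r: r]),
    "SLIC":       (lambda mw: np.log10(np.log(mw)), [lambda r: np.log(-np.log(r))]),
}


def fit_model(name, rm, mw):
    """Ordinary least squares on the model's transformed scale.

    Returns (coefficients a, b, ..., R-square on the transformed scale)."""
    transform, features = MODELS[name]
    rm = np.asarray(rm, dtype=float)
    if np.any((rm <= 0) | (rm >= 1)):
        raise ValueError("Rm must lie strictly between 0 and 1 for the log models")
    y = transform(np.asarray(mw, dtype=float))
    X = np.column_stack([np.ones_like(rm)] + [f(rm) for f in features])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ coef
    r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
    return coef, r2


# Table 1 standard MWs with HYPOTHETICAL Rm values, for illustration only.
standard_mw = [200_000, 116_250, 97_400, 66_200, 45_000, 31_000, 21_500, 14_400, 6_500]
standard_rm = [0.10, 0.21, 0.26, 0.37, 0.50, 0.61, 0.72, 0.83, 0.93]

for name in MODELS:
    coef, r2 = fit_model(name, standard_rm, standard_mw)
    print(f"{name:10s} R-square = {r2:.4f}  coefficients = {np.round(coef, 3)}")
```

Fitting each gel separately, as the per-gel R-square values in Table 2 imply, would mean calling `fit_model` once per gel with that gel's measured Rm values.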

[Figure: Relative Mobility vs. Molecular Weight. x-axis: molecular weight (log scale, ticks 3.8 to 5.2); y-axis: relative mobility (0 to 1). Series: Male Frog 7.5%; Female Frog #2: 7.5%; Sea Urchin #3: 10%; Sea Urchin #4: 10% rep.; Sea Urchin #1: 12%; Sea Urchin #2: 12% rep.; Jelly & Seminal #1; Jelly & Seminal #2.]

Fig. 1 Raw Data: graphs of the raw data used in deciding the best models.

Fig. 2 – Graph of relative mobility of raw data vs. log molecular weight for two 7.5% gels, two 10% gels, two 12% gels and two gradient gels.

Fig. 3 Raw Standards: Actual Molecular Weights vs. Predicted Molecular Weights

Table 1: Comparison of the 6 models and the R-square values produced by each model.

Table 2: Residuals and R-square values for the Cubic model.

Comparison of 3 models with a Standard

Fig. 4 One set of raw data (Gel #2 VE) plotted against 3 of the models tested (Log Linear, Quad., Cubic).

Fig. 5 Raw data

Table 3: The Cubic model was applied to unknown proteins to predict their molecular weights.

This work was supported by NASA CSUN/JPL PAIR.

Many thanks go to: Carol Shubin, Virginia Latham, Larry Clevenson, Edward Carroll, Gregory Frye, Jennifer Rosales and John Handy.