
Page 1: Workflow. The software “CORALSEA“ is a tool to build up the quantitative structure – property / activity relationships (QSPRs/QSARs) The representation.

Workflow

Page 2:

The software “CORALSEA” is a tool for building quantitative structure–property/activity relationships (QSPRs/QSARs).

CORALSEA represents the molecular structure as SMILES (simplified molecular-input line-entry system).

For details, please see http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html
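CORALSEA builds its models from SMILES attributes, i.e. symbols read off the SMILES string. The slides do not show how the string is split, so the Python sketch below is only an illustration under our own assumptions: the regular expression and the function name `tokenize_smiles` are hypothetical, and CORALSEA's actual attribute definitions may differ.

```python
import re

# Hypothetical attribute pattern: order matters, so the two-letter symbols
# (Br, Cl) are tried before the one-letter atoms B and C.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"              # bracket atom, e.g. [nH] or [C@@H]
    r"|Br|Cl"                  # two-letter organic-subset elements
    r"|%\d{2}"                 # two-digit ring closure, e.g. %10
    r"|[BCNOPSFI]"             # aliphatic organic-subset atoms
    r"|[bcnops]"               # aromatic atoms
    r"|[=#\\/()\.+\-:~@*$]"    # bonds, branches and other single symbols
    r"|\d"                     # one-digit ring closure
)


def tokenize_smiles(smiles):
    """Split a SMILES string into single-symbol tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # findall silently skips characters it cannot match,
    # so check that the tokens rebuild the input exactly
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens
```

For example, `tokenize_smiles("O=C1c2ccccc2NC(C)=C1Br")` ends with the token `"Br"` rather than `"B"` followed by `"r"`.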

Page 3:

For this demo of CORALSEA we use our model from the article “THE DEFINITION OF THE MOLECULAR STRUCTURE FOR POTENTIAL ANTI-MALARIA AGENTS BY THE MONTE CARLO METHOD”, Struct. Chem. 2013; 24:1369–1381.

You can develop a better model yourself, but for now please follow our suggestions.

Page 4:

The first step is to prepare the SMILES file that serves as the input for CORALSEA:

+1 COc1ccc2c(c1)NC(C)=C(CCCCCCC)C2=O 7.332
+2 COc1ccc2c(c1)NC(C)=CC2=O 4.903
+3 O=C1c2ccccc2NC(C)=C1CCCCCCC 6.979
+4 O=C1c2ccccc2NC(C)=C1CCCCCCCCC 7.400
#5 O=C1c3ccccc3NC(C)=C1C2CCCCC2 5.652
-6 O=C1c3ccccc3NC(C)=C1c2ccccc2 6.270
+7 O=C2c3ccccc3NC(C)=C2Cc1ccccc1 5.207
+8 O=C1c2ccccc2NC(C)=C1Br 7.110
-9 O=C1c2ccccc2NC(C)=C1\C=C\CCCCCCC 7.824
+10 C=C(CCCCCCC)C=1C(=O)c2ccccc2NC=1C 7.472
+12 O=C2c3ccccc3NC(C)=C2/C=C/c1ccccc1 5.827
+13 COc1ccc2NC(C)=C(Br)C(=O)c2c1 5.934
-14 Cc1ccc2NC(C)=C(Br)C(=O)c2c1 6.583
#15 Brc1ccc2NC(C)=C(Br)C(=O)c2c1 6.470
+17 Fc1ccc2NC(C)=C(Br)C(=O)c2c1 6.903
+18 Clc1ccc2NC(C)=C(C#CCCCC)C(=O)c2c1 4.336
#19 COc2cccc3NC(C)=C(Cc1ccccc1)C(=O)c23 5.675
-21 COc1ccc3c(c1)NC(C)=C(Cc2ccccc2)C3=O 5.859
-22 COc1cccc2NC(C)=C(C(=O)c12)c3ccccc3 5.295
-23 COc1ccc2c(c1)NC(C)=C(C2=O)c3ccccc3 6.570
+24 COc3cccc1c3NC(C)=C(C1=O)c2ccccc2 5.779
-25 Clc2cccc3NC(C)=C(Cc1ccccc1)C(=O)c23 5.279
#26 Clc2ccc3NC(C)=C(Cc1ccccc1)C(=O)c3c2 5.485
#28 Clc1cccc2NC(C)=C(C(=O)c12)c3ccccc3 5.324
-29 Clc1ccc2NC(C)=C(C(=O)c2c1)c3ccccc3 6.110
-30 Clc1ccc2c(c1)NC(C)=C(C2=O)c3ccccc3 5.731
-31 Clc1ccc2NC(C)=C(C(=O)c2c1Cl)c3ccccc3 5.493
#33 Clc1cc2NC(C)=C(C(=O)c2c(Cl)c1)c3ccccc3 5.464
#34 COc1ccc3c(c1)C(=O)C(Cc2ccccc2)=C(C)N3C 5.094
+35 COc1ccc3c(c1)N(C)C(C)=C(Cc2ccccc2)C3=O 5.106
+36 Fc1cc2c(cc1OC)NC(C)=C(C2=O)c3ccccc3 7.081
+37 Clc1cc2c(cc1OC)NC(C)=C(C2=O)c3ccccc3 7.815
+38 Brc1cc2c(cc1OC)NC(C)=C(C2=O)c3ccccc3 7.602
#39 Fc1cc2c(cc1OC)NC(C)=C(CC)C2=O 6.793
+41 Brc1cc2c(cc1OC)NC(C)=C(CC)C2=O 7.440
-44 Clc1cc2c(cc1OC)NC(C)=C(C2=O)C3CCCCC3 6.401
+45 Clc1cc3c(cc1OC)NC(C)=C(Cc2ccccc2)C3=O 7.164
-46 Clc1cc2c(cc1OC)NC(C)=C(C)C2=O 7.564
#47 CC(C)C=1C(=O)c2cc(Cl)c(cc2NC=1C)OC 6.712
+48 CC(CC)C=1C(=O)c2cc(Cl)c(cc2NC=1C)OC 7.199
+49 Clc1cc2c(cc1OC)NC(C)=CC2=O 5.731
-50 Clc1cc2c(cc1OC)NC(C)=C(C#CCCCC)C2=O 5.376
#53 CC(C)(C)OC(=O)/C=C/C=1C(=O)c2cc(Cl)c(cc2NC=1C)OC 7.271

Each compound is represented by (1) its type (+, -, or #); (2) its ID, which can be a CAS (Chemical Abstracts Service) number or a plain number; (3) its SMILES; and (4) its endpoint value.

“+” marks the sub-training set, “-” marks the calibration set, and “#” marks the test set.

The sub-training set plays the role of model developer, the calibration set the role of model critic, and the test set the role of model estimator.

MyFile.txt
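A file in this format is easy to check before loading it. The following Python sketch assumes one compound per line, the type marker written directly in front of the ID (as in the listing above), and whitespace between the fields; it groups the lines by set. The names `parse_line`, `split_sets` and `Record` are hypothetical; CORALSEA does its own parsing.

```python
from collections import defaultdict
from typing import NamedTuple

# Set markers from the slides; "*" is the marker used in the validation file.
SET_NAMES = {"+": "sub-training", "-": "calibration", "#": "test", "*": "validation"}


class Record(NamedTuple):
    kind: str          # one of + - # *
    compound_id: str   # CAS number or plain number
    smiles: str
    endpoint: float


def parse_line(line):
    """Parse '<type><ID> <SMILES> <endpoint>' (assumed whitespace-separated)."""
    head, smiles, value = line.split()
    kind, compound_id = head[0], head[1:]
    if kind not in SET_NAMES:
        raise ValueError(f"unknown set marker {kind!r} in line {line!r}")
    return Record(kind, compound_id, smiles, float(value))


def split_sets(lines):
    """Group the non-empty lines by set name."""
    sets = defaultdict(list)
    for line in lines:
        if line.strip():
            record = parse_line(line)
            sets[SET_NAMES[record.kind]].append(record)
    return dict(sets)
```

Because SMILES contain no spaces, a triple bond such as `C#CCCCC` inside the SMILES field does not interfere with the `#` test-set marker at the start of the line.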

Page 5:

It is a good idea to reserve some substances as an "invisible" validation set for the final evaluation of the model:

10
*11 O=C1c2ccccc2NC(C)=C1C\C=C\CCCCCC 6.728
*16 Clc1ccc2NC(C)=C(Br)C(=O)c2c1 6.900
*20 COc2ccc3NC(C)=C(Cc1ccccc1)C(=O)c3c2 4.624
*27 Clc1ccc3c(c1)NC(C)=C(Cc2ccccc2)C3=O 4.805
*32 Clc1cc2c(cc1Cl)NC(C)=C(C2=O)c3ccccc3 6.456
*40 Clc1cc2c(cc1OC)NC(C)=C(CC)C2=O 7.559
*42 Clc1cc2c(cc1OC)NC(C)=C(CCCCCCC)C2=O 8.530
*43 Clc1cc2c(cc1OC)NC(C)=C(CCCCCCCCC)C2=O 8.779
*51 C=C(CCCCC)C=1C(=O)c2cc(Cl)c(cc2NC=1C)OC 7.830
*52 Clc1cc2c(cc1OC)NC(C)=C(\C=C\CCCCC)C2=O 7.975

The format of the file for this validation is as follows: (1) the number of compounds; (2) the list of compounds in the format described above (type, ID, SMILES, endpoint value), here with “*” as the type marker.

MyInput.txt
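The validation file differs from the training file only by its leading count line. A minimal Python sketch, assuming the layout shown above (count on the first line, then one `*`-marked compound per line, endpoint written with three decimals), that writes such a file and reads it back with a consistency check; `write_validation_file` and `read_validation_file` are hypothetical helper names.

```python
def write_validation_file(path, records):
    """records: iterable of (compound_id, smiles, endpoint) tuples."""
    records = list(records)
    with open(path, "w", encoding="ascii") as fh:
        fh.write(f"{len(records)}\n")                  # (1) number of compounds
        for cid, smiles, endpoint in records:          # (2) one compound per line
            fh.write(f"*{cid} {smiles} {endpoint:.3f}\n")


def read_validation_file(path):
    """Return a list of (compound_id, smiles, endpoint); checks the count line."""
    with open(path, encoding="ascii") as fh:
        lines = [ln.strip() for ln in fh if ln.strip()]
    expected, body = int(lines[0]), lines[1:]
    if len(body) != expected:
        raise ValueError(f"header says {expected} compounds, found {len(body)}")
    out = []
    for ln in body:
        head, smiles, value = ln.split()
        if not head.startswith("*"):
            raise ValueError(f"expected '*' marker in line {ln!r}")
        out.append((head[1:], smiles, float(value)))
    return out
```

Checking the count line catches the common mistake of adding or deleting compounds without updating the first line.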

Page 6:

To start your work, download CORALSEA.zip from www.insilico.eu/coral. When this is done, place the folder “CORALSEA” on your computer:

Page 7:

…and place your data (i.e. “MyTRNCLBTST.txt”) in the folder “MyCORALSEA”:

Page 8:

The contents of MyCORALSEA are as follows:

Page 9:

To carry out a QSPR/QSAR analysis of data prepared for a CLASSIFICATION MODEL, do the following:

(i) Place “#TRNCLBTST-1.txt” in the folder;

(ii) Place “#Input-1.txt” in the folder;

(iii) Run CORALSEA.exe.

“#TRNCLBTST.txt” is the file that contains the training (TRN), calibration (CLB), and test (TST) sets; “#Input.txt” contains the data that are not visible while the model is being built.
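Before starting CORALSEA.exe it can help to confirm that the expected files really are in the folder. The Python sketch below only checks for the file names given in steps (i) to (iii); `missing_inputs` is a hypothetical helper, and for other runs you would substitute the corresponding file names.

```python
from pathlib import Path

# File names as given in the slides for the classification run.
REQUIRED_FILES = ("CORALSEA.exe", "#TRNCLBTST-1.txt", "#Input-1.txt")


def missing_inputs(folder):
    """Return the required files that are not present in `folder`."""
    folder = Path(folder)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]
```

For example, `missing_inputs("MyCORALSEA")` returns an empty list when the folder is ready.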

Page 10:

The following appears on your screen:

Click the “Load method” button…

Page 11:

The following appears on your screen:

Enter the name “#TRNCLBTST-1.txt” in the text box.


Page 12:

The following appears on your screen:

Click “SAVE SYSTEM”.

Page 13:

The following appears on your screen:

Restart the program and click “Load system”.

Page 14:

The following appears on your screen:

Click “OK”

Page 15:

The following appears on your screen:

This plot refers to the external “invisible” validation set.

Page 16:

The following appears on your screen:

The file “#Output-1.txt” contains the statistical characteristics for the validation set (“#Output-1.txt” is located in the folder “Model”).
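The slides do not list which statistics “#Output-1.txt” reports. As an assumption, the Python sketch below computes three quantities commonly used to judge a QSPR/QSAR validation set: the squared correlation coefficient r2 between observed and predicted endpoints, the root-mean-square error (RMSE), and the mean absolute error (MAE). It is meant for cross-checking predictions, not as a description of CORALSEA's output.

```python
from math import sqrt


def validation_stats(observed, predicted):
    """Squared Pearson correlation (r2), RMSE and MAE for one compound set."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    mo, mp = sum(observed) / n, sum(predicted) / n
    sxy = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    sxx = sum((o - mo) ** 2 for o in observed)
    spp = sum((p - mp) ** 2 for p in predicted)
    residuals = [o - p for o, p in zip(observed, predicted)]
    return {
        "n": n,
        "r2": sxy ** 2 / (sxx * spp),   # fails if either list is constant
        "RMSE": sqrt(sum(r * r for r in residuals) / n),
        "MAE": sum(abs(r) for r in residuals) / n,
    }
```

Note that r2 alone can be misleading: predictions shifted by a constant still give r2 = 1, which is why an error measure such as RMSE is reported alongside it.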

Page 17:

To carry out a QSPR/QSAR analysis of data prepared for a REGRESSION MODEL, do the following:

(i) Place “#TRNCLBTST.txt” in the folder;

(ii) Place “#Input-1.txt” in the folder;

(iii) Run CORALSEA.exe.

“#TRNCLBTST.txt” is the file that contains the training (TRN), calibration (CLB), and test (TST) sets; “#Input.txt” contains the data that are not visible while the model is being built.

Page 18:

The following appears on your screen:

Enter the name “#TRNCLBTST-1.txt” in the text box. Then select “Classic Scheme” or “Balance of Correlation” for your QSPR/QSAR investigation.


Page 19:

The following appears on your screen:

Two actions: (1) define the method and (2) save the method.


Page 20:

The following appears on your screen:

You can include graph invariants in addition to SMILES attributes.


Page 21:

The following appears on your screen:

You can use the “classic scheme”, the balance of correlations, and the ideal slopes C1 and C1’.
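The slides name the schemes without defining them. In the CORAL literature the classic scheme is usually described as maximizing the correlation between the descriptor and the endpoint on the sub-training set, while the balance of correlations also takes the calibration set into account. The Python sketch below encodes a form we believe appears in CORAL publications, TF = r_TRN + r_CLB - |r_TRN - r_CLB| × weight; treat both this formula and the default `weight=0.1` as assumptions rather than as CORALSEA's exact target function.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient; 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def target_classic(dcw_trn, y_trn):
    """Classic scheme (assumed): correlation on the sub-training set only."""
    return pearson_r(dcw_trn, y_trn)


def target_balance(dcw_trn, y_trn, dcw_clb, y_clb, weight=0.1):
    """Balance of correlations (assumed form): reward high correlations on the
    sub-training and calibration sets, penalize the gap between them."""
    r_trn = pearson_r(dcw_trn, y_trn)
    r_clb = pearson_r(dcw_clb, y_clb)
    return r_trn + r_clb - abs(r_trn - r_clb) * weight
```

The penalty term is the design point: a model that fits the sub-training set well but the calibration set poorly scores lower than one that does moderately well on both, which is how the calibration set acts as the "critic" of the model.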

Page 22:

The following appears on your screen:

You can choose your settings, e.g. (1) define Dstart = 0.25 and (2) Nepoch = 20; then (3) click “Save method”, otherwise the method remains unchanged.
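To give an idea of what Dstart and Nepoch control, here is a toy Python version of a Monte Carlo optimization of correlation weights. It assumes that the descriptor DCW is the sum of the correlation weights of a structure's SMILES attributes, that attributes found in fewer than T training structures are blocked (weight 0), and that Dstart is the initial size of a random weight change. The shrinking step schedule and all function names are our own choices, not CORALSEA's algorithm.

```python
import random
from collections import Counter
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def dcw(tokens, cw):
    """Descriptor = sum of correlation weights; blocked tokens count as 0."""
    return sum(cw.get(t, 0.0) for t in tokens)


def monte_carlo_cw(train, threshold=1, n_epoch=20, d_start=0.25, seed=0):
    """Toy Monte Carlo optimization of correlation weights.

    train: list of (tokens, endpoint) pairs for the sub-training set.
    Returns (weights, correlation between DCW and endpoint)."""
    rng = random.Random(seed)
    # number of training structures containing each attribute
    counts = Counter(t for toks, _ in train for t in set(toks))
    active = sorted(t for t, c in counts.items() if c >= threshold)
    cw = {t: rng.random() for t in active}
    y = [e for _, e in train]

    def score():
        return pearson_r([dcw(toks, cw) for toks, _ in train], y)

    best = score()
    for epoch in range(n_epoch):
        step = d_start * (1 - epoch / n_epoch)  # hypothetical shrinking step
        for t in active:
            old = cw[t]
            cw[t] = old + rng.choice((-1.0, 1.0)) * step
            new = score()
            if new > best:
                best = new      # keep the modification
            else:
                cw[t] = old     # reject it
    return cw, best
```

Because only improvements are accepted, more epochs can never lower the sub-training correlation, which is exactly why an independent calibration set is needed to decide when to stop.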


Page 23:

The following appears on your screen:

Click “Search for preferable model (T*,N*)”

Page 24:

The following appears on your screen:

The program carries out the Monte Carlo optimization with various thresholds and numbers of epochs. When the calculation is complete, the preferable threshold and number of epochs can be found in the file “Search/BestMDL.txt”.
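The search can be pictured as a loop over candidate thresholds and epoch counts that keeps the best-scoring pair. In this Python sketch, `build_and_score` is a hypothetical placeholder for "build the model with threshold T and N epochs and return its score on the calibration set"; the candidate ranges and the scoring rule CORALSEA actually uses are not shown in the slides.

```python
from itertools import product


def search_preferable(thresholds, epochs, build_and_score):
    """Return (T*, N*, score) maximizing build_and_score(T, N), or None if
    no candidates are given. Ties keep the first pair encountered."""
    best = None
    for t, n in product(thresholds, epochs):
        s = build_and_score(t, n)
        if best is None or s > best[2]:
            best = (t, n, s)
    return best
```

With a toy scoring function that peaks at T = 2 and N = 15, `search_preferable(range(1, 6), range(5, 31, 5), score)` returns `(2, 15, ...)`, matching the kind of result reported in “BestMDL.txt”.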

Page 25:

The contents of the file “search/BestMDL.txt” will be approximately as follows:

Here the preferable threshold (T*) is 2 and the preferable number of epochs (N*) is 15. This information can be used to build a robust model.

Page 26:

An attempt to build a robust model…

Create the folder “MyCORALSEA-T2-N15” (a copy of “MyCORALSEA”).

Run CORALSEA.exe in the folder “MyCORALSEA-T2-N15”.

Click “Load method”
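The folder copy in the first step can also be scripted. A minimal Python sketch using the standard library's `shutil.copytree`; the helper name `make_run_folder` is hypothetical, and it assumes the folder “MyCORALSEA” sits inside `base`.

```python
import shutil
from pathlib import Path


def make_run_folder(base, src_name="MyCORALSEA", threshold=2, n_epoch=15):
    """Copy <base>/MyCORALSEA to <base>/MyCORALSEA-T<threshold>-N<n_epoch>.

    copytree raises FileExistsError if the target folder already exists,
    so an earlier run is never overwritten silently."""
    base = Path(base)
    dst = base / f"{src_name}-T{threshold}-N{n_epoch}"
    shutil.copytree(base / src_name, dst)
    return dst
```

Encoding T* and N* in the folder name, as the slides do, keeps runs with different settings side by side.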

Page 27:

The following appears on your screen:

(1) Enter Nepoch = 15; (2) click “Building up preferable model (T*,N*)”;

(T* = 2, N* = 15)

(3) enter Threshold = 2; and (4) click “Continue”.


Page 28:

The following appears on your screen:

Click “Yes”

Page 29:

The program will gradually calculate the model:

Page 30:

When the model is ready, the screen looks like this:

Click “Save system”

Page 31:

The folder “Model” contains the parameters of the QSPR/QSAR model.

The file “#Output-1.txt” contains the statistics for the invisible validation set.

Page 32:

When the model is ready, the screen looks like this:

Click “Load system”

Page 33:

The following appears on the screen:

(1) Enter the name “MyInput.txt” instead of “#Input-1.txt”;

(2) Click “Start of DCW and Endpoint calculation for SMILES input file”.


Page 34:

The following appears on the screen:

After these actions, the file “model/Output.txt” will contain the results calculated for the compounds in “MyInput.txt”.

Click “OK”

Page 35:

The following appears on the screen:

You will see a graphical representation of the sub-training, calibration, test, and validation sets.

Page 36:

The contents of “model/Output.txt” will be as follows:

Last, but not least…

Page 37:

The model can also be applied to an individual SMILES:

(1) Enter the SMILES in the indicated box; (2) click “Start of DCW and Endpoint Calculation for Inserted SMILES”.
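Assuming the usual CORAL model form, Endpoint = C0 + C1 × DCW(T*,N*), where DCW is the sum of the correlation weights of the SMILES attributes, the calculation for one SMILES reduces to a sum and a linear equation. The Python sketch below fits C0 and C1 by least squares on toy data and applies them to a new structure; the weights, token lists and helper names are hypothetical, and the slides do not state the model equation.

```python
def dcw(tokens, cw):
    """DCW = sum of correlation weights; attributes without a weight add 0."""
    return sum(cw.get(t, 0.0) for t in tokens)


def fit_line(x, y):
    """Least-squares C0, C1 for y = C0 + C1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    c1 = sxy / sxx
    return my - c1 * mx, c1


def predict(tokens, cw, c0, c1):
    """Endpoint = C0 + C1 * DCW for one tokenized structure."""
    return c0 + c1 * dcw(tokens, cw)


# Toy example: hypothetical weights and endpoints, not the anti-malaria model
cw = {"C": 0.5, "Cl": 1.0}
train = [(["C"], 2.0), (["C", "Cl"], 4.0), (["C", "Cl", "Cl"], 6.0)]
c0, c1 = fit_line([dcw(t, cw) for t, _ in train], [y for _, y in train])
```

Here the toy fit gives C0 = 1 and C1 = 2, so `predict(["C", "Cl", "Cl", "O"], cw, c0, c1)` returns 6.0: the attribute "O" has no weight and contributes nothing, which is one reason such a prediction may be unreliable.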


Page 38:

The following appears on your screen:

See the file “Model/DemoDesc.txt”.

Page 39:

The contents of “Model/DemoDesc.txt” are as follows:

The descriptor is DCW(2,15) for NC(CCCNC(N)=N)C(O)=O; Endpoint=2.9412. This example is only a demo: NC(CCCNC(N)=N)C(O)=O is apparently outside the domain of applicability.
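Why a structure can fall outside the domain of applicability can be illustrated with a simplified rule: a query containing SMILES attributes that were rare or absent in the training structures cannot be described by well-trained correlation weights. The Python sketch below flags such attributes. It is a stand-in under our own assumptions (a threshold on the number of training structures containing each attribute), not CORALSEA's own applicability-domain criterion; `attribute_counts` and `rare_attributes` are hypothetical names.

```python
from collections import Counter


def attribute_counts(training_tokens):
    """Number of training structures that contain each attribute."""
    return Counter(t for tokens in training_tokens for t in set(tokens))


def rare_attributes(query_tokens, counts, threshold):
    """Attributes of the query seen in fewer than `threshold` training structures."""
    return sorted({t for t in query_tokens if counts[t] < threshold})
```

An empty result suggests the query is built from familiar attributes; a non-empty one, as one would expect for an amino acid scored with a quinolone model, is a warning that the predicted endpoint should not be trusted.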

Page 40:

These slides have shown the "technology"; to understand the "philosophy", please read the file "ReadMe.pdf".

Page 41:

Some definitions