SYLVIA V1.4, Program Manual

SYLVIA

Estimation of the Synthetic Accessibility

of Organic Compounds

Version 1.4

Program Manual and Description

Molecular Networks GmbH April 2016

www.mn-am.com

Molecular Networks GmbH

Neumeyerstr. 28

90411 Nuremberg

Germany

Altamira LLC

1455 Candlewood Drive

Columbus, OH 43235-1623

USA

mn-am.com

This document is copyright © 2008-2018 by Molecular Networks GmbH Computerchemie and Altamira LLC. All rights reserved. Except as permitted under the terms of the Software Licensing Agreement of Molecular Networks GmbH Computerchemie or Altamira LLC, no part of this publication may be reproduced or distributed in any form or by any means or stored in a database retrieval system without the prior written permission of Molecular Networks GmbH Computerchemie or Altamira LLC.

The software described in this document is furnished under a license and this document may be used and copied only in accordance with the terms of such license. (Doc version: 1.0-2016-04-25)

Contents

1 Introducing SYLVIA
2 SYLVIA GUI Version
2.1 The Main Window
2.2 The Main Menus
2.3 The Wizard
2.4 Synthetic Accessibility Settings
2.5 Database Management
2.6 Total Synthetic Accessibility Score and Components Contributions
3 SYLVIA Batch Version and Associated Tools
3.1 SYLVIA Batch Version
3.2 extractSM
3.3 extractRCSS
4 Understanding SYLVIA – The Scientific Method
4.1 Overview
4.2 Starting Material Similarity Score
4.3 Reaction Center Substructure Score
4.4 Generation of Product Reaction Center Substructure Database
5 Technical Requirements
5.1 System Requirements
5.2 Program Scope and Known Limitations
6 Program Installation
6.1 Download from the Web Server of Molecular Networks
6.2 New Installation
6.3 Program Updates
7 Problems and Help!
8 Acknowledgements
9 References
10 Report Form

1 Introducing SYLVIA

SYLVIA is a program to estimate the synthetic accessibility (the ease of synthesis) of organic compounds. SYLVIA is available in the following three versions.

• Graphical user interface (GUI) application for interactive usage
• Batch version for automatic batch processing of large files
• Daemon version for running calculations as background processes

All three versions perform the same calculation and give the same results. This document is the manual for the GUI and batch versions. The estimate of synthetic accessibility is a number between 1, for compounds that are very easy to synthesize, and 10, for compounds that are very difficult to synthesize. The method for calculating synthetic accessibility takes account of a variety of criteria, such as the complexity of the molecular structure, the complexity of the ring system, the number of stereo centers, the similarity to commercially available compounds, and the potential for using powerful synthesis reactions. These criteria have been individually weighted to provide a single value for synthetic accessibility. Results of a survey of several medicinal chemists have been considered in this weighting process. Major points of the calculation method have been published in the Journal of Computer-Aided Molecular Design and are briefly summarized in this manual.

Low synthetic accessibility scores, i.e., scores of compounds that are easier to synthesize than a given numerical threshold, are shown with a green background. Medium scores are shown with a yellow background, and scores of compounds that are very difficult to synthesize are shown with a red background. These thresholds can be configured.

2 Release Notes

2.1 Version 1.0, October 2007

Version 1.0 is the first commercial release of SYLVIA. It is based on the software technology platform MOSES (MOlecular Structure Encoding System, see www.molecular-networks.com/moses, [1]).

Version 1.0 includes the GUI, batch and daemon versions. It reads SD and SMILES files and reports the synthetic accessibility scores on screen in the GUI version, as SD data fields in SD output files, or as system-wide available values in the daemon version.
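As an illustration of how scores written as SD data fields could be read back from an SD output file, the following minimal Python sketch collects the data fields of each SD record. It relies only on the general SD file layout (data headers of the form `> <NAME>` after the `M  END` line, records closed by `$$$$`). The field name `SA_SCORE`, the field `ID` and the sample record are hypothetical; the field names actually written by SYLVIA are not specified in this section.

```python
def read_sd_fields(text):
    """Return one dict of data fields per record of an SD file.

    Records end with a '$$$$' line; data fields follow the 'M  END' line
    as a '> <NAME>' header, one or more value lines, and a blank line.
    """
    records, fields = [], {}
    in_data = False   # True once the connection table of a record has ended
    name = None       # data field currently being read, if any
    for line in text.splitlines():
        if line.strip() == "$$$$":                 # end of record
            records.append(fields)
            fields, in_data, name = {}, False, None
        elif not in_data:
            in_data = line.startswith("M  END")    # skip the connection table
        elif name is None:
            if line.startswith(">") and "<" in line:
                # header line such as ">  <SA_SCORE>"
                name = line[line.index("<") + 1 : line.rindex(">")]
                fields[name] = []
        elif line.strip() == "":                   # blank line closes the field
            name = None
        else:
            fields[name].append(line)              # values are kept as strings
    return [{k: "\n".join(v) for k, v in rec.items()} for rec in records]


# Hypothetical record with an empty connection table and two data fields.
SAMPLE = """\
placeholder
  hypothetical sample

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
>  <SA_SCORE>
2.31

>  <ID>
cmpd-1

$$$$
"""

for record in read_sd_fields(SAMPLE):
    print(record["ID"], float(record["SA_SCORE"]))
```

All values are returned as character strings, so a score has to be converted explicitly (here with `float`) before it can be compared numerically.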

2.2 Version 1.2, December 2012

Besides several improvements and extensions, version 1.2 can output the individual structure- and reaction-based scores that contribute to the total synthetic accessibility score. These components are the following.

• Molecular graph complexity
• Ring complexity
• Stereochemical complexity
• Starting material similarity
• Reaction center substructure similarity

2.3 Version 1.4, November 2014

The following specific improvements, changes and new features have been implemented in version 1.4.

• New Database Manager in the GUI version to create and administer customized starting material databases (SM-DB) and reaction center substructure databases (RCSS-DB)
  o Creation of new customized databases and extension of existing databases
  o Access to all options available
  o Import of customized databases
  o Renaming and deletion of databases
  o Export of databases
  o Information about content in databases
• Improved command line tool "extractSM" in the XT version to generate customized SM-DBs
• New command line tool "extractRCSS" in the XT version to generate customized RCSS-DBs for reaction fitness scores
• Improved statistics about the generation of RCSS-DBs and SM-DBs, displayed after databases are generated with the Database Manager in the GUI version or with the command line tools "extractSM" and "extractRCSS" in the XT version
• Improved configuration side bar in the GUI version with a field to change the applied SM-DB and RCSS-DB while compounds are loaded

In addition, the following general improvements have been implemented.

• Record numbers in input files are now counted one-based instead of zero-based
• Automatic storing of all user settings of a session (GUI version)
• Fixed: connection table information in SMILES codes and SD records no longer changes from input to output file if the input and output file formats are identical
• Improved handling of failed structures in the GUI and XT versions

3 SYLVIA GUI Version

3.1 The Main Window

In this section, the main program window is described. Help on the individual configuration options can be found by pressing the "Help" button of the configuration dialog or in the navigation tab on the left-hand side of this window.

When the program is started, it usually opens a wizard that asks the user how to proceed. However, if the name of an input file is provided as a command line argument, this file is loaded and no wizard is shown.

Figure 1 The main window of SYLVIA.

3.2 The Main Menus

In the following section, the main menus (main menu bar) of SYLVIA are described (see Figure 2).

Figure 2 The main menu bar of SYLVIA with menus File, View, Data, Extra and Help.

3.2.1 The File Menu

The file menu provides a number of typical options for accessing data in files.

Command             Description
New                 With the "New" command, the list of already loaded compounds is cleared.
Start Wizard...     This command restarts the Wizard, even if it has been deactivated on the last page of the Wizard. If applicable, the program will ask whether new structures should be appended to the list of already existing (or loaded) molecules.
Input Structure...  Opens a molecule editor for input of a new molecule. While the editor runs, the application is otherwise blocked. If applicable, the program will ask whether the new structure should be appended to the list of already existing (or loaded) molecules.
Open...             Opens and loads new structure files. If structures are already loaded, the program will ask whether the new structures should be appended to the list of already existing (or loaded) molecules.
Save                The "Save" command saves the current view (structures as sorted and filtered) together with the properties that are present in the input files and those shown on the screen, i.e., the properties that are selected for viewing.
Save As...          Saves the structures as with the "Save" command but provides the possibility to select a new file name first.
Exit                The "Exit" command closes the program.

3.2.2 The View Menu

Using the View menu, the visual appearance of the structures can be changed.

Command          Description
Structure Size   With "Structure Size" the size of the structures in the table can be changed. This value is retained in subsequent program uses.
View First Row   Jumps to the very first record in the file.
View Page Up     Goes up one page.
View Page Down   Goes down one page.
View Last Row    Jumps to the last record of the file. If not yet done, the file will then be completely scanned. Thus, for large files, delays may be noticeable.
Goto...          Allows the specification of a record number to jump to. If not yet done, the file will then be fully scanned. For large files, delays may be noticeable.

3.2.3 The Data Menu

In the Data menu, properties (read in and calculated) can be selected for display and used for filtering and sorting.

Command      Description
Properties   Opens a submenu where all calculable properties and all properties from the input file can be selected for display. The visible properties and the properties from the input file will be written to the output file upon "File → Save". This menu is detachable by clicking on the dotted line at the top of the menu strip; the menu then becomes a window of its own, stays on the screen and allows enabling and disabling of properties by a single mouse click.
Filter…      Use this function to filter out structures from the list. When this command is selected, a dialog box opens.
             • In the uppermost part, the logic for connecting filter conditions can be selected if there is more than one filter condition. Filters can be connected so that all filter conditions have to be matched (logical AND, which is the default) or so that matching any filter condition (logical OR) suffices.
             • In the next line, the property to be used for filtering can be selected. All calculable properties and the properties read in from the input file are available. Once a property has been selected, suitable comparison operators become available. In the third column, the value (or threshold) for the comparison has to be entered.
             • Pressing the More button adds a new filter condition line to the dialog. Pressing the Less button removes the last filter condition again. All rows set to "none (please select property)" are automatically ignored in the filtering process.
             • To deactivate filters, open this dialog again and either remove all filters using the Less button or set each property selector to "none (please select property)".
             • Please be aware that properties read in from SMILES and SD files are always treated as character strings and, thus, never allow the use of operators like < or >. To allow these numerical comparisons on properties read in from a file, the CTX file format should be used for input, which allows the qualification of properties as integers or real numbers.
Sort…        The "Sort" command opens a dialog to select which property should be used for sorting and to determine whether the records should be sorted in ascending or descending order. The sorting functionality is also available (for visible properties) by clicking on the column headers of the record table. The first click sorts in ascending order, the second click reverses the sorting to descending order, and a third click restores the order of the structures as given in the input file. Comparisons for sorting are done numerically for numeric properties. As mentioned in the description of "Filter…", properties read in from SMILES and SD files are treated as alpha-numeric data. Thus, please be aware that such properties are always sorted in alphabetical order and not in numerical order.
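The filter logic and the sorting caveat described for the Data menu can be illustrated with a short Python sketch. It is not SYLVIA's implementation: the function `matches`, the operator table, and the sample records with the properties `score` and `mw` are hypothetical and only mimic the behavior described above (AND/OR connection of conditions, ignored "none" rows, and alphabetical sorting of properties that were read in as character strings).

```python
import operator

# Comparison operators offered for a filter condition (illustrative subset).
OPS = {"<": operator.lt, ">": operator.gt, "=": operator.eq}


def matches(record, conditions, mode="AND"):
    """Check one record against (property, operator, value) conditions.

    A condition whose property is None stands for a row left at
    "none (please select property)" and is ignored.
    """
    active = [c for c in conditions if c[0] is not None]
    if not active:                      # no filter set: keep every record
        return True
    results = [OPS[op](record[prop], value) for prop, op, value in active]
    return all(results) if mode == "AND" else any(results)


records = [
    {"id": "a", "score": 2.5, "mw": "98"},     # "mw" was read in as text
    {"id": "b", "score": 7.1, "mw": "120"},
    {"id": "c", "score": 4.0, "mw": "1005"},
]
conditions = [("score", "<", 3.0), (None, "=", None), ("score", ">", 6.0)]

and_hits = [r["id"] for r in records if matches(r, conditions, "AND")]
or_hits = [r["id"] for r in records if matches(r, conditions, "OR")]

# Character strings sort alphabetically, numbers sort by value.
text_order = sorted(r["mw"] for r in records)
numeric_order = sorted((r["mw"] for r in records), key=float)

print(and_hits, or_hits)        # [] ['a', 'b']
print(text_order)               # ['1005', '120', '98']
print(numeric_order)            # ['98', '120', '1005']
```

The alphabetical result `['1005', '120', '98']` shows why a numeric property should be read from a format that declares it as a number if numerical sorting or the operators < and > are needed.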

3.2.4 The Extra Menu

In the Extra menu, tools for customizing and manipulating the starting material and reaction center substructure (reaction fitness) databases can be started.

Command                 Description
Open Database Manager   Starts the Database Manager to customize and manipulate the starting material and reaction center substructure (reaction fitness) databases.

3.2.5 The Help Menu

The Help menu provides additional information about SYLVIA.

Command    Description
Index…     The "Index" command opens the online help.
License…   This function opens an information window with details about the license governing the use of the program. Most notably, it details when the license expires.
About…     This command provides a list of the underlying technology of the program as well as the third-party components employed.

3.3 The Wizard

The Wizard is started if the program is called without any file arguments. Its main purpose is to make the selection of input easy, even for the casual user.

3.3.1 The Welcome Page

The "Welcome" page lets the user select whether a structure should be input by means of a chemical structure editor or whether chemical structures should be read in from a file in a standard chemical file format (e.g., SD file [2], SMILES [3]). It is not possible to continue without making one of these two choices (see Figure 3).

Figure 3 The Welcome page of the SYLVIA wizard.

3.3.2 The Editor Page

The "Editor" page allows the start of an external structure editor, usually the JME

Molecular Editor. After input, the chemical structure is shown for direct visual inspection

(see Figure 4).

Figure 4 The Editor page of the SYLVIA wizard.

3.3.3 The File Selection Page

Standard chemical structure files, e.g., Molfiles, SD files [2], Daylight SMILES [3] files, and other files can be selected here. Once a valid structure file has been selected or entered in the field "Input file name", the next page of the Wizard can be opened (see Figure 5).

Figure 5 The File Selection page of the SYLVIA wizard.

3.3.4 The Property Selection Page

If the chemical structure file contains any properties, one of these properties can be selected to serve as an identifier in the list of structures shown in the main window of the program. Alternatively, all properties that are available in the input file can be chosen to be displayed in the main window. In both cases, the selection of visible properties can be modified later by means of the "Properties" command in the "Data" menu (see Figure 6).

Figure 6 The Property Selection page of the SYLVIA wizard.

3.3.5 The Good Bye Page

The "Good Bye" page summarizes the input made in the Wizard so far. It also allows the further use of the Wizard to be deactivated by checking the box shown. Even if deactivated, the Wizard can be started again at any time with the "Start Wizard..." command of the "File" menu.

Figure 7 The Good bye page of the SYLVIA wizard.

3.4 Synthetic Accessibility Settings

Once a structure or a structure file is loaded, a detachable dialog on the right-hand side of the SYLVIA window can be used to specify options for the calculation and display of the synthetic accessibility values (see Figure 8).

Figure 8 The main window with loaded structures.

3.4.1 Thresholds for Synthetic Accessibility

Threshold for Easy Compounds

Using this slider, the threshold for easy compounds can be set. Compounds

with a synthetic accessibility estimation below this threshold are considered to

be easy to synthesize and are displayed with a green background.

Threshold for Difficult Compounds

With this slider, the threshold for difficult compounds can be set, i.e., the

threshold above which compounds will be classified as difficult to synthesize.

These compounds are displayed with a red background in the synthetic

accessibility column.

Compounds where the synthetic accessibility estimation lies between these two

thresholds are generally shown with a yellow background color in the synthetic

accessibility value cell.
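The color assignment described above can be sketched in a few lines of Python. The function name and the threshold values 4.0 and 7.0 are hypothetical (the factory defaults are not stated in this section), and treating a score that equals a threshold as "yellow" is an assumption, since the text only specifies scores below the first and above the second threshold.

```python
def background_color(score, easy_threshold, difficult_threshold):
    """Return the background color for a synthetic accessibility score.

    Scores range from 1 (very easy) to 10 (very difficult to synthesize).
    Assumption: a score exactly on a threshold is shown in yellow.
    """
    if score < easy_threshold:          # easier than the "easy" threshold
        return "green"
    if score > difficult_threshold:     # harder than the "difficult" threshold
        return "red"
    return "yellow"                     # between the two thresholds


# Hypothetical threshold settings, as they might be chosen with the sliders.
EASY, DIFFICULT = 4.0, 7.0

for score in (2.0, 5.5, 9.1):
    print(score, background_color(score, EASY, DIFFICULT))
```

Moving either slider corresponds to changing `easy_threshold` or `difficult_threshold`; the scores themselves are not recalculated, only their classification changes.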

The dialog window also provides three buttons: "Help" brings up this help, "Reset" resets the parameter values to their initial (factory) values, and "Expert" enables additional options for experts.

3.4.2 Additional Options for Experts

A word of warning before the expert options are described: the expert options can easily lead to sub-optimal and sometimes misleading results and should be used with appropriate care. To stress this point, a dialog window appears and asks for confirmation when the expert mode is entered. This dialog can be suppressed for subsequent program uses by choosing the middle button of the confirmation dialog.

Starting Material Database

Using this option, a customized starting material database can be loaded and

used for the calculation of the "starting material similarity score", e.g., generated

from compound collections or databases available in-house (see Figure 9). This

customized starting material database has to be generated from a structure file

available in a standard file format (SD, SMILES). The generation and import of

such a customized starting material database is described below in the section "Generation and Import of Customized Starting Material Databases".

The starting material database can be reset to the standard (factory) database

by pressing the "Reset" button or by selecting "Built-in Starting Materials" in

the drop-down menu (see Figure 9).

Figure 9 Choosing a starting material database.

Reaction Center Substructure Database

Using this option, a customized reaction center substructure database can be

loaded and used for the calculation of the "reaction center substructure similarity

score", e.g., generated from reaction collections or databases available in-

house (see Figure 10). This customized reaction center substructure database

has to be generated from a reaction file available in the RD file format. The

generation and import of such a customized reaction center substructure

database is described below in the section "Generation and Import of

Customized Reaction Center Substructure Databases".

The reaction center substructure database can be reset to the standard

(factory) database by pressing the "Reset" button or by selecting "Built-in

Reaction Center Substructures" in the drop-down menu (see Figure 10).

Figure 10 Choosing a reaction center sub-structure database.

3.5 Database Management

In the following sections the management of the databases that are used by SYLVIA is

described. These databases can be managed using the Database Manager which is

opened by selecting "Open Database Manager" in the "Extra" menu (see Figure 11).

Figure 11 The Database Manager.

SYLVIA uses two different databases for the evaluation of the synthetic accessibility of organic compounds.

• Starting material database for calculating the starting material similarity score of a query compound.
• Reaction center substructure database for calculating the reaction center (or reaction fitness) score of a query compound.

The underlying scientific methods of the two database-dependent scores are described

in the section "Understanding SYLVIA – The Scientific Method" of this manual.

3.5.1 Generation and Import of Customized Starting Material Databases

SYLVIA can use any structure data to evaluate the starting material similarity of a query compound. However, the data have to be converted from typical structure file formats (SD, SMILES) into the native database format of SYLVIA (SMDB) and then imported into SYLVIA to make them available in the "Expert" section of the UI (see above).

3.5.1.1 Generation of Customized Starting Material Databases

By selecting Open Database Manager from the Extra menu and switching to the tab

"Starting Material Databases" new starting material databases can be created and

managed. By default, the "Built-in Starting Materials" are selected (factory database

shipped with the SYLVIA product). By clicking the button "Create" the dialog to

generate customized starting material databases from any structure files (SD, SMILES)

is opened. The input structure file and the name of the generated (customized) starting

material database can be selected in the file dialog or directly entered into the editable

fields. A valid file name as input and a reasonable file name for a database file have to

be given in order to finish this dialog successfully (see Figure 12).

If an existing starting material database should be extended by new compounds, it can

be selected in the field "Extend Existing Starting Material Database". It is possible to

extend the standard (factory) starting material database (database file Built-in Starting

Materials). However, it is required to enter a new name in the field "New Database

Name" in order to keep a copy of the original standard (factory) starting material

database for back-up purposes (see Figure 12).

If the structure file that was used to create the database is loaded into SYLVIA for synthetic accessibility assessment, scores can be higher than expected. There are two main reasons for this. First, starting material similarity is only one of the five components of the synthetic accessibility score, and even if this score is low, the other scores might not be. Second, molecules that contain mostly "non-standard" organic chemistry elements are excluded during database generation but are not excluded from the ranking. Therefore, such structures are usually predicted to be more difficult to synthesize than others.

Figure 12 Generation of a customized starting material database.

Advanced Options

In the "Advanced" section, additional options for generating a new starting material

database are available (see Figure 12).

Keep Identical Compounds

With this option, compounds that occur multiple times in the original compound

database are counted as often as they appear. This gives a stronger weight to

those compounds. By default, this option is de-activated.

Store Transformed Structures in File

The results of the transformation steps carried out during the conversion step

can be saved to a separate structure file. A file name for this structure file can

be selected or directly entered into the field below this option. However, the file

of transformed structures can be very large with about 10 times more records

(molecules) than in the original structure file. The transformed structures can

provide insight into the conversion process and can answer questions on why a

particular compound receives a synthetic accessibility score higher or lower

than expected.

Read Only Parts of the Input File

The range of records (structures) in the input file that should be used to generate the new starting material database can be specified by record numbers.
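The effect of the options "Keep Identical Compounds" and "Read Only Parts of the Input File" can be illustrated with the following Python sketch. It is a simplified illustration, not the database generation of SYLVIA: the function names are hypothetical, identical compounds are recognized here by exact string comparison of SMILES codes (a real implementation would have to compare the structures themselves), and record numbers are taken as one-based and inclusive.

```python
from collections import Counter


def select_records(records, first=1, last=None):
    """Return records first..last, using one-based, inclusive record numbers."""
    return records[first - 1 : last]


def count_compounds(identifiers, keep_identical=False):
    """Count how often each compound enters the database.

    With keep_identical=True a compound is counted as often as it appears,
    which gives it a stronger weight; otherwise it is counted once.
    """
    counts = Counter(identifiers)
    if not keep_identical:
        counts = Counter(dict.fromkeys(counts, 1))
    return counts


# Hypothetical input file with five records; ethanol ("CCO") occurs three times.
smiles = ["CCO", "CCO", "c1ccccc1", "CC(=O)O", "CCO"]

subset = select_records(smiles, first=2, last=4)
default_counts = count_compounds(smiles)
weighted_counts = count_compounds(smiles, keep_identical=True)

print(subset)                     # ['CCO', 'c1ccccc1', 'CC(=O)O']
print(default_counts["CCO"])      # 1
print(weighted_counts["CCO"])     # 3
```

How the counts influence the starting material similarity score is not covered by this sketch; it only shows which records are read and how often a repeated compound is counted.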

Database Generation Statistics

After the generation of a new (or the extension of an existing) starting material

database, the statistics about the generation process are summarized in a separate

window. The statistics summary lists

• How many structures have been read in
• How many structures have been considered for the database generation
• How many structures have not been considered (have been ignored), and why
• CPU time to build the database

The number of structures that have been considered to build a database is also shown in the lower right part of the "Database Manager" dialog for the database that is currently selected.

Command Line Tool

In the XT version of SYLVIA, customized starting material databases can also be

generated with the command line tool "extractSM" which is described in the section

"extractSM" of this manual.

3.5.1.2 Import of Customized Starting Material Databases

Once a customized starting material database has been created, it has to be imported

into SYLVIA by pressing the button "Import". A dialog box appears where the starting

material database file can be selected or directly entered into the field "New Starting

Material Database File". In addition, a name for the database is suggested, but can be

changed in the field "Name" of the dialog. Pressing the button "Import" in this dialog

completes the import, which is confirmed by a message box (see Figure 13).

Figure 13 Import of a customized starting material database.

3.5.2 Generation and Import of Customized Reaction Center Substructure Databases

SYLVIA can use reaction data to evaluate the reaction fitness of query compounds. However, the data have to be converted from a reaction file in the RD format into the native database format of SYLVIA (RCDB) and then imported into SYLVIA to make them available in the "Expert" section of the UI (see above).

Generation of Customized Reaction Center Substructure Databases

By selecting "Open Database Manager" from the "Extra" menu and switching to the tab "Reaction Center Substructure Databases", new reaction center substructure databases can be created and managed. By default, the "Built-in Reaction Center Substructures" are selected (factory database shipped with the SYLVIA product). Clicking the button "Create" opens the dialog to generate customized reaction center substructure databases from any reaction file (in RD format). The input reaction file and the name of the generated (customized) reaction center substructure database can be selected in the file dialog or directly entered into the editable fields. A valid input file name and a reasonable file name for the database file have to be given in order to finish this dialog successfully (see Figure 14).

If an existing reaction center substructure database should be extended by new

reactions, it can be selected in the field "Extend Existing Reaction Center

Substructure Database". It is possible to extend the standard (factory) reaction center

substructure database (database file "Built-in Reaction Center Substructures").

However, it is required to enter a new name in the field "New Database Name" in order

to keep a copy of the original standard (factory) reaction center substructure database for back-up

purposes (see Figure 14).

Figure 14 Generation of a customized reaction center substructure database.

Advanced Options

In the "Advanced" section, additional options for generating a new reaction center

substructure database are available (see Figure 14).

Keep Identical Reactions

With this option, reactions (or reaction centers) that occur multiple times in the

original reaction database are counted as often as they appear. This gives a

stronger weight to those reactions. By default, this option is de-activated.

Store Transformed Structures in File

The results of the reaction center substructure extraction carried out during the

conversion step can be saved to a separate structure file. A file name for this

structure file can be selected or directly entered into the field below this option.

The extracted reaction center substructure can provide insight into the


conversion process and can answer questions on why a particular compound

receives a synthetic accessibility score higher or lower than expected.

Read Only Parts of the Input File

The range of records (reactions) in the input file that should be used to

generate the new reaction center substructure database can be specified by

record numbers.

Database Generation Statistics

After the generation of a new (or the extension of an existing) reaction center

substructure database, the statistics about the generation process are summarized in a

separate window. The statistics summary lists:

How many reactions have been read in

How many reactions have been considered for the database generation

How many reactions have not been considered (i.e., were ignored), and why

The CPU time needed to build the database

The number of reaction center substructures that have been considered to build

up a database is also shown in the lower right part of the Database Manager dialog for

the database that is currently selected.

Command Line Tool

In the XT version of SYLVIA, customized reaction center sub-structure databases can

also be generated with the command line tool "extractRCSS" which is described in the

section "extractRCSS".

3.5.2.1 Import of Customized Reaction Center Substructure Databases

Once a customized reaction center substructure database has been created, it has to

be imported into SYLVIA by pressing the button "Import". A dialog box appears where

the reaction center substructure database file can be selected or directly entered into the field "New

Reaction Center Substructure Database File". In addition, a name for the database

is suggested, but can be changed in the field "Name" of the dialog. Pressing the button

"Import" in this dialog completes the import, which is confirmed by a message box.

Figure 15 Import of a customized reaction center sub-structure database.


3.6 Total Synthetic Accessibility Score and Components Contributions

The scientific method behind SYLVIA and how the total synthetic accessibility of a

compound is estimated is described in section "Understanding SYLVIA – The Scientific

Method" of this manual.

The individual contributions to the total synthetic accessibility score can be displayed

(and written out to an SD file) via the menu item "Properties" of the "Data" menu.

The following table lists the synthetic accessibility properties that are calculated by

SYLVIA and their name in the Data menu of the GUI as well as the data field under

which they are stored in output SD files.

Name in GUI and in exported SD file  Description
-----------------------------------  ------------------------------------------------------------
M_SYN_ACCESSIBILITY                  Synthetic accessibility: total synthetic accessibility
                                     score of a molecule
M_GRAPH_SCORE                        Molecular graph complexity score: contribution to total
                                     synthetic accessibility score based on molecular graph
                                     complexity
M_RING_SCORE                         Ring complexity score: contribution to total synthetic
                                     accessibility score based on ring complexity
M_STEREO_SCORE                       Stereochemical complexity score: contribution to total
                                     synthetic accessibility score based on stereochemical
                                     complexity
M_STARTING_MATERIAL_SCORE            Starting material similarity score: contribution to total
                                     synthetic accessibility score based on similarity to
                                     available starting materials
M_REACTION_CENTER_SCORE              Reaction center substructure score: contribution to total
                                     synthetic accessibility score based on similarity to known
                                     reaction center substructures
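The data fields listed above are written to the output SD file as ordinary SD data items. As a minimal sketch (not part of SYLVIA), the following Python function collects them per record. The names read_sd_data_fields and as_score, the sample record and its values are invented for illustration. The sketch assumes the usual SD layout: a "> <NAME>" header line, one or more value lines, a blank line, and "$$$$" as record terminator. The value "NULL" is the marker that the batch version documents for properties that could not be calculated.

```python
def read_sd_data_fields(sd_text):
    """Return one {field name: value} dict per record of an SD file string."""
    records = []
    for block in sd_text.split("$$$$"):
        if not block.strip():
            continue  # skip the empty remainder after the last terminator
        fields = {}
        lines = block.splitlines()
        i = 0
        while i < len(lines):
            line = lines[i]
            if line.startswith(">") and "<" in line:
                # Data header line, e.g. "> <M_SYN_ACCESSIBILITY>"
                name = line[line.index("<") + 1:line.rindex(">")]
                i += 1
                values = []
                while i < len(lines) and lines[i].strip():
                    values.append(lines[i].strip())
                    i += 1
                fields[name] = "\n".join(values)
            else:
                i += 1
        records.append(fields)
    return records


def as_score(value):
    """Convert a field value to float; "NULL" marks a property that was not calculable."""
    return None if value == "NULL" else float(value)


# Invented sample record with a placeholder molblock and made-up values.
SAMPLE_SD = """example
  placeholder molblock

M  END
> <M_SYN_ACCESSIBILITY>
3.2

> <M_RING_SCORE>
NULL

$$$$
"""

record = read_sd_data_fields(SAMPLE_SD)[0]
scores = {name: as_score(value) for name, value in record.items()}
print(scores)
```

If a different property name was chosen with the option "-p" of the batch version, the total score is found under that name instead of M_SYN_ACCESSIBILITY.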


4 SYLVIA Batch Version and Associated Tools

4.1 SYLVIA Batch Version

The command line tool "sylvia" calculates the synthetic accessibility estimations for

organic molecules in batch mode by reading in a structure file and writing out a

structure file with the stored scores.

4.1.1 Description

Calculates synthetic accessibility values for chemical structures given in a file (or

as SMILES on standard input).

4.1.2 Synopsis

sylvia [option]... inputfile outputfile

4.1.3 Supported File Formats

File         File format                            File extension
-----------  -------------------------------------  ------------------
Input file   SD file (structure data file), SMILES  sdf, smi or smiles
Output file  SD file (structure data file), SMILES  sdf, smi or smiles

4.1.4 Command Line Options

Short option  Long option                     Description
------------  ------------------------------  ------------------------------------------------------
              --help                          Print this help on the screen
              --rcssdb file                   Use the reaction center substructure database file
                                              (default: use standard/factory reaction center
                                              substructure database)
              --smdb file                     Use the starting material database file (default: use
                                              standard/factory starting material database)
              --discard-on-failure            Remove records from the output file for which the
                                              calculation failed or which are not readable from the
                                              input file (default: write out original record and
                                              "NULL" for not calculable properties; if record is not
                                              readable and input and output format are not equal, an
                                              empty compound record is written)
              --discard-read-properties       Remove properties that were read from the input file
                                              (default: write out all properties (SD data fields and
                                              values) that are present in the input file)
              --trace-area area               Restrict the output of trace messages to defined trace
                                              areas (area: All (default), App)
              --trace-to file                 Write trace output to file (file: cout, cerr, or given
                                              file)
              --version                       Show the program version
-a            --all-terms                     Write out all individual terms contributing to the
                                              total synthetic accessibility score
-c            --config-file file              Read the configuration file file
-d            --daemon                        Enable the daemon mode. In this mode, only SMILES
                                              strings are accepted and read from standard input and
                                              accessibility estimations are written to standard out.
                                              If END is given instead of a valid SMILES, the program
                                              is terminated. If the given SMILES string cannot be
                                              read, the string ERROR is written out.
-e            --errfile file                  Write out structures which cannot be read into separate
                                              file file. If this option is not given, unreadable
                                              records are copied verbatim into the output file.
-m            --molecular-complexity          Write out molecular graph complexity score.
-n            --stereochemical-complexity     Write out stereochemical complexity score.
-o            --ring-complexity               Write out ring complexity score.
-p            --propname name                 Use property name name in output file instead of
                                              M_SYN_ACCESSIBILITY.
-r            --reaction-center-similarity    Write out reaction center similarity score.
-s            --starting-material-similarity  Write out starting material similarity score.
-t            --trace-level level             Set the minimum importance of logged messages to level
                                              (level: crit, fail, warn, or verbose)

4.1.5 Examples

sylvia -e errors.sdf input.sdf output.sdf

sylvia --smdb myStartingMaterials.smdb --daemon
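The daemon mode used in the second example can be driven from a script. The following Python sketch illustrates the exchange described for the option "--daemon": SMILES strings go in line by line, replies come back, ERROR marks an unreadable SMILES, and END terminates the program. The function name query_daemon is hypothetical, and the sketch assumes that the daemon answers every SMILES with exactly one line; the exact reply format is not specified in this manual. The canned replies at the end only stand in for a running daemon.

```python
import io


def query_daemon(smiles_list, to_daemon, from_daemon):
    """Send SMILES lines to a daemon-mode process and collect one reply per SMILES.

    to_daemon and from_daemon are text streams, e.g. the stdin and stdout
    pipes of a process started in daemon mode.
    Assumption: each SMILES is answered with exactly one line.
    """
    replies = {}
    for smiles in smiles_list:
        to_daemon.write(smiles + "\n")
        to_daemon.flush()
        line = from_daemon.readline().strip()
        # The string ERROR is documented for SMILES that cannot be read.
        replies[smiles] = None if line == "ERROR" else line
    to_daemon.write("END\n")  # END terminates the daemon
    to_daemon.flush()
    return replies


# Stand-in streams with invented replies, so the sketch runs without SYLVIA.
fake_stdin = io.StringIO()
fake_stdout = io.StringIO("3.2\nERROR\n")
print(query_daemon(["CCO", "not-a-smiles"], fake_stdin, fake_stdout))
```

With a real installation, the two streams would typically be the pipes of subprocess.Popen(["sylvia", "--daemon"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True); any further options depend on the local setup.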

4.2 extractSM Batch Tool

The command line tool "extractSM" generates synthetic accessibility starting material

databases by reading in a structure file and writing out the transformed structures in a

database file that can be read in by SYLVIA.

4.2.1 Description

Extracts starting material similarity hash codes from a structure file and creates a

database.

4.2.2 Synopsis

extractSM [option]... inputStructures.sdf smDB.smdb

4.2.3 Supported File Formats

File         File format                                File extension
-----------  -----------------------------------------  ------------------
Input file   SD file (structure data file), SMILES      sdf, smi or smiles
Output file  Starting material database (SQLite-based)  smdb

4.2.4 Command Line Options

Short option  Long option               Description
------------  ------------------------  ------------------------------------------------------
              --from no                 Start conversion from record no (inclusive)
              --to no                   Stop conversion at record no (inclusive)
-i            --include                 Include starting materials which are already present
                                        in the database (multiplets)
-s            --output-structure file   Store the derived starting materials in the file file
                                        (SD formatted, usually only used to better understand
                                        database generation)
-t            --trace-level level       Set the importance of logged messages to level
                                        (level: crit, fail, warn, verbose)
              --trace-to file           Write the trace output to file file (file: cout, cerr,
                                        or filename)
              --help                    Print help page to the screen
              --version                 Print the program version to the screen

4.2.5 Examples

extractSM -s transformedStructs.sdf inhouseData.sdf inhouseData.smdb

extractSM --from 0 --to 100 myStructs.smi test.smdb

extractSM -i supplier1.sdf supplier-all.smdb

extractSM -i supplier2.sdf supplier-all.smdb

extractSM -i supplierN.sdf supplier-all.smdb

4.2.6 Remarks

If the output file already exists, entries are added to it. Structures already in the

database are usually ignored, but can be included by specifying the option "-i".
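One way to read this remark is as a counter per stored hash code: by default an entry that is already present is skipped, while "-i" counts it again. The following Python sketch only illustrates that reading. The function add_to_database and the hash code strings are hypothetical, and the actual layout of the SQLite-based database file is not described in this manual.

```python
from collections import Counter


def add_to_database(database, hash_codes, include=False):
    """Add hash codes to a counter-based toy database.

    include=False mimics the default (entries already present are ignored),
    include=True mimics the option "-i" (multiplets are counted again).
    Returns the number of entries that were actually added.
    """
    added = 0
    for code in hash_codes:
        if code in database and not include:
            continue  # already present: ignored by default
        database[code] += 1
        added += 1
    return added


database = Counter()
add_to_database(database, ["h1", "h2", "h1"])     # second "h1" is ignored
add_to_database(database, ["h1"], include=True)   # "h1" is counted again
print(dict(database))
```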

The command line tool "extractSM" is only available in the SYLVIA XT version;


however, the functionality is also available in the SYLVIA GUI program (Database

Manager).

4.3 extractRCSS Batch Tool

The command line tool "extractRCSS" reads an input reaction database and generates

a database of reaction center sub-structures that can be read in by SYLVIA.

4.3.1 Description

Calculates hash codes of reaction center sub-structures (rcss) from a reaction

database, records the occurrences of the reaction center sub-structures in the

reactions and creates a database.

4.3.2 Synopsis

extractRCSS [option]... inputReactions.rdf rcssDB.rcdb

4.3.3 Supported File Formats

File         File format                              File extension
-----------  ---------------------------------------  --------------
Input file   RDfile (reaction data file)              rdf
Output file  Reaction center database (SQLite-based)  rcdb

4.3.4 Command Line Options

Short option  Long option               Description
------------  ------------------------  ------------------------------------------------------
              --from no                 Start conversion from record no (inclusive)
              --to no                   Stop conversion at record no (inclusive)
-i            --include                 Include reaction center substructures which are
                                        already present in the database (multiplets)
-s            --output-structure file   Store the derived reaction center substructures in the
                                        file file (SD formatted, usually only used to better
                                        understand database generation)
-t            --trace-level level       Set the importance of logged messages to level
                                        (level: crit, fail, warn, verbose)
              --trace-to file           Write the trace output to file file (file: cout, cerr,
                                        or filename)
              --help                    Print help page to the screen
              --version                 Print the program version to the screen

4.3.5 Examples

extractRCSS -s rxn_substructures.sdf inhouse_rxns.rdf inhouse_rcss.rcdb

extractRCSS --from 0 --to 100 --include inhouse_rxns.rdf inhouse_rcss.rcdb

4.3.6 Remarks

If the output file (reaction center substructure database, e.g., "rcssDB.rcdb") already

exists, the hash-codes of the new substructures are added to this database file.

Structures already in the database are usually ignored, but can be included by

specifying the option "-i" (or "--include").

The command line tool "extractRCSS" is only available and shipped as part of the

SYLVIA XT version; however, the functionality is also available in the SYLVIA GUI

program (Database Manager).


5 Understanding SYLVIA – The Scientific Method

5.1 Overview

The synthetic accessibility estimation consists of five components. The first three

components are based on structural features of the target structure only [4].

Molecular Graph Complexity Score

This score is based on graph and information theories and takes account of the

size, symmetry, branching, rings, multiple bonds and heteroatoms of the target

molecule.

The corresponding data field in SD output files is named "M_GRAPH_SCORE".

Ring Complexity

This score penalizes bridged and fused ring systems, which tend to be more

difficult to synthesize and thus increase the synthetic accessibility score.

The corresponding data field in SD output files is named "M_RING_SCORE".

Stereochemical Complexity

This score is a simple counter of tetrahedral stereo centers in the target

structure which make the synthesis of the target more difficult.

The corresponding data field in SD output files is named

"M_STEREO_SCORE".

The remaining two components are data-based; they take up a larger portion of the

calculation time, but also provide more meaningful results.

Starting Material Similarity Score

Structures with complex structural motifs can still easily be synthesized if the

complex parts are covered by available starting materials. Therefore,

compounds with a high starting material similarity are searched in a

preprocessed database. The more similar compounds are identified and the

higher the coverage of the target molecule, the easier it is to synthesize a given

target compound.

The corresponding data field in SD output files is named

"M_STARTING_MATERIAL_SCORE".

Reaction Center Substructure Score

Synthesis design programs perform comprehensive retrosynthetic analysis in

order to transform the synthetic target structure to a sequence of progressively

simpler structures along a retrosynthetic pathway, which ultimately leads to

simple or commercially available starting materials. Accordingly, synthetic

accessibility can be approximated by analyzing structural motifs where the

target molecule can be decomposed into smaller components.

The corresponding data field in SD output files is named

"M_REACTION_CENTER_SCORE".


Calculation of the Total Score

The overall synthetic accessibility score of a target structure is calculated by summing

the five weighted individual components.

In order to determine the weights of each component, five medicinal chemists

have manually evaluated a dataset of 100 structures according to their synthetic

accessibility on a scale between 1 (easy to synthesize) and 10 (difficult to

synthesize). The structures of the dataset were taken from the Journal of

Medicinal Chemistry and vary in size and complexity. Based on the average

scores of the medicinal chemists, a linear regression analysis was used to

calculate the weights of each component. Due to the method, a fixed increment

of 0.68 is added to the sum of the weighted components (intercept of y-axis).

The corresponding data field in SD output files is named

"M_SYN_ACCESSIBILITY".

In the following, the components "Starting Material Similarity Score" and "Reaction

Center Substructure Score" are described in more detail. For a full overview of the

method, see [4].
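The calculation of the total score described above, a weighted sum of the five components plus the fixed increment of 0.68, can be sketched in a few lines of Python. The weights used here are placeholders chosen for illustration only; the fitted regression weights are not listed in this manual. The function name total_score and the component values are likewise invented.

```python
INTERCEPT = 0.68  # fixed increment stated in this manual (intercept of the regression)

# Placeholder weights for illustration only; NOT the fitted SYLVIA weights.
EXAMPLE_WEIGHTS = {
    "M_GRAPH_SCORE": 1.0,
    "M_RING_SCORE": 1.0,
    "M_STEREO_SCORE": 1.0,
    "M_STARTING_MATERIAL_SCORE": 1.0,
    "M_REACTION_CENTER_SCORE": 1.0,
}


def total_score(components, weights, intercept=INTERCEPT):
    """Weighted sum of the component scores plus the intercept."""
    missing = set(weights) - set(components)
    if missing:
        raise KeyError("missing component scores: %s" % sorted(missing))
    return intercept + sum(weights[name] * components[name] for name in weights)


# Invented component values for a single hypothetical compound.
components = {
    "M_GRAPH_SCORE": 0.5,
    "M_RING_SCORE": 0.0,
    "M_STEREO_SCORE": 0.0,
    "M_STARTING_MATERIAL_SCORE": 0.25,
    "M_REACTION_CENTER_SCORE": 0.25,
}
print(total_score(components, EXAMPLE_WEIGHTS))
```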

5.2 Starting Material Similarity Score

The synthetic accessibility of a target structure highly depends on the degree of

resemblance between the target structure and available starting materials. However,

the concept of molecular similarity is application dependent - therefore a variety of

similarity measures are known in chemoinformatics. For the purpose required here,

similarity scores useful for the design of syntheses have been developed [5].

5.2.1 Similarity Search

Assessing synthetic proximity between a target structure and a set of starting materials

demands a different approach to similarity than biological activity. SYLVIA applies

transformation-based similarity criteria which have specifically been developed for

synthesis design and reaction planning. By definition, two compounds are considered

similar by a similarity search criterion if their transformed structures are identical. This

is shown in Figure 1.


Figure 1 Concept of transformation-based similarity search; (a) similarity

criterion based on generalized reaction; (b) similarity criterion based on

structural features.

23 such similarity criteria are implemented in SYLVIA. Some similarity search

transformations are based on generalized reactions (such as oxidation, reduction),

others are based on topological characteristics of the structure. The latter ones often

simply yield substructures, for example, taking the largest ring system of a query

structure.

Furthermore, there are similarity definitions that combine reaction type and

substructure characteristics (such as ring system with substitution pattern).

Figure 2 shows examples for such similarity search transformations in order of their

specificity.

Figure 2 Specificity of various similarity search transformations.

Within the similarity search process the target structure is modified according to

transformation rules associated with a certain similarity criterion. The transformed

structure is then compared with each transformed compound from the catalogs of

starting materials, which were derived in advance for each transformation rule. This

process is illustrated in Figure 3 where the aromatic ring system including alpha atoms


criterion is applied. If the transformed target structure and the transformed catalog

compound are identical, then the unchanged compound from the catalog of chemicals

is a potential starting material for the target compound according to this criterion.

Figure 3 Similarity search process applying the "aromatic ring system

including alpha atoms" transformation. The target (query) and all catalog

compounds are transformed by the same criterion.

5.2.2 Generation of Starting Material Databases

The necessary transformations are performed in advance for the entire starting

materials catalog. The transformed structures are stored in a database in the form of

hash codes along with counters that store the frequency of occurrence of hash-codes

for a specific transformation. By this means, only the target structure has to be

subjected to various transformations at run time, achieving rapid identification of

possible precursors.
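The lookup scheme described above (transform every catalog compound in advance, store hash codes with counters, and transform only the target at run time) can be sketched as follows. This is a toy illustration, not SYLVIA's implementation: the two criteria are simplified stand-ins operating on SMILES strings, SHA-1 is an arbitrary choice of hash function, and all names are hypothetical. Comparing intact structures this way would in practice require canonical SMILES.

```python
import hashlib
from collections import Counter


def intact_structure(smiles):
    # Toy criterion: the unchanged structure (assumes canonical SMILES).
    return smiles


def aromatic_atoms_only(smiles):
    # Toy stand-in for a real transformation: keep only lowercase (aromatic)
    # atom symbols. Not valid for two-letter elements such as Cl or Br.
    return "".join(ch for ch in smiles if ch.islower())


CRITERIA = {
    "intact structure": intact_structure,
    "aromatic atoms": aromatic_atoms_only,
}


def hash_code(transformed):
    return hashlib.sha1(transformed.encode("utf-8")).hexdigest()


def build_database(catalog):
    """Precompute hash code counters for every criterion (done once, in advance)."""
    database = {name: Counter() for name in CRITERIA}
    for smiles in set(catalog):  # each starting material is considered once
        for name, transform in CRITERIA.items():
            transformed = transform(smiles)
            if transformed:  # skip empty results
                database[name][hash_code(transformed)] += 1
    return database


def count_precursors(database, target):
    """Transform only the target at run time and look up its hash codes."""
    counts = {}
    for name, transform in CRITERIA.items():
        transformed = transform(target)
        counts[name] = database[name][hash_code(transformed)] if transformed else 0
    return counts


catalog = ["c1ccccc1O", "c1ccccc1N", "CCO", "CCO"]
database = build_database(catalog)
print(count_precursors(database, "c1ccccc1C(=O)O"))
```

In this toy catalog, the two benzene derivatives share the same transformed structure under the "aromatic atoms" criterion, so both are reported as potential precursors of the aromatic target, while the "intact structure" criterion finds none.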

A starting material similarity database was constructed from the combined Fluka, Acros

and Maybridge catalogs. If a starting material occurred more than once in this united

data set, then only one instance was considered. The starting materials were then

subjected to all available similarity search transformations, followed by generating hash

codes and keeping account of the frequency of occurrences.

This database generation from any chemical structure file is directly available in the

SYLVIA program - see the section on "Generation of Customized Starting Material

Databases" in this manual.


Figure 4 Examples of similarity search transformations. Transformation steps:

(1) intact starting material

(2) taking atoms that are part of any aromatic ring system

(3) taking carbon atom side chains of aromatic ring systems

(4a) considering alpha heteroatoms

(4b) considering substitution pattern (heteroatoms are substituted by chlorine

atoms)

(5) selecting the largest fragment

Two examples for similarity search transformation are illustrated in Figure 4. The

transformation of a starting material (1) by similarity search criterion A starts with

identifying the aromatic atoms (2) and then taking carbon atom side chains that are

attached to aromatic ring systems (3). This is followed by taking the alpha

heteroatoms into consideration (4a). Because this process can break the starting material into

smaller unconnected fragments, the largest of these fragments is selected as a result

of the similarity search transformation (5) for which a hash code is generated. The

similarity transformation B differs from A by converting the alpha heteroatoms into

chlorine atoms (4b) thereby marking the possible substitution sites. The distribution of

structures generated by the first transformation is shown in Figure 5.


Figure 5 Distribution of frequency of occurrences of structures resulting from

similarity search transformations: "aromatic ring + carbon skeleton with alpha-atoms"

(shown in Figure 3). The eight most frequent structures occurring are

shown with their frequency.

5.2.3 Evaluating Similarity to Starting Materials

After processing all transformations for the starting material databases, a similarity

score is calculated by mapping transformed structures back onto the target structure.

Each atom of the target structure receives an atom score that is initially set to 1.0.

These atom scores are then reduced by degrees as more and more potential

precursors are identified for the target structure. In this way, the atom scores reflect the

possible coverage of starting materials on the target structure.

If a similarity search transformation identifies potential precursors for the target

structure, then a transformation score is calculated.

The transformation score ranges between 0.0 and 1.0, and is devised to penalize

structural motifs that are infrequent in the similarity database. The larger the set of

potential precursors retrieved by a similarity search criterion, the smaller is the

corresponding transformation score. However, as the similarity criteria differ strongly in

specificity, an additional weight representing this circumstance is stored for each

transformation in the database and applied in the scoring.

The molecular starting material similarity score is finally calculated by summing up the

individual atom scores and normalizing the value.


5.3 Reaction Center Substructure Score

The estimation of retrosynthetic reaction fitness is based on the determination of how

prone each bond in a target compound is to be built. By definition, a bond - or rather, a

structural motif consisting of a specific bond at its center and a neighborhood of varying

detail - is prone to retro-synthesis if it can be found on the product side of reactions in a

database.

To put it into other words: if the same bond that is present in a target structure can also

be found as part of the product reaction center in a database, the bond provides a

potential cutting-point for retro-synthesis. In SYLVIA, no real retro-synthesis is

undertaken; instead, the bond receives a score indicating that it is easier to synthesize.

Figure 6 Extracting product reaction center substructures from a synthetic

reaction.

In reaction databases, transformation characteristics of reactions are automatically

identified by marking the bonds directly involved in a reaction either as "change bond

order" or "make/break" bond (see example in Figure 6). Such bonds are called reaction

centers. This reaction center information (RC) is utilized to define a reaction center

substructure (RCSS) that consists of the atoms belonging to product reaction centers

along with their direct neighbors, i.e., alpha atoms. By considering the alpha atoms, the

influence of the chemical environment of the reaction center can be taken into account.

A hash code is generated for each identified product RCSS and is then inserted into a

database that keeps account of the frequency of occurrence of unique reaction center

substructures.
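The definition of a reaction center substructure given above (the atoms of the product reaction center bonds plus their direct neighbors) translates into a small graph operation. The following Python sketch uses a plain adjacency dictionary as a hypothetical molecule representation; the function name and the atom numbering are illustrative and not taken from SYLVIA.

```python
def reaction_center_substructure(adjacency, center_bonds):
    """Return the atom indices of an RCSS.

    adjacency: {atom index: list of neighbor atom indices}
    center_bonds: bonds marked as reaction center, as (atom, atom) pairs
    """
    center_atoms = {atom for bond in center_bonds for atom in bond}
    # Alpha atoms are the direct neighbors of the reaction center atoms.
    alpha_atoms = {nbr for atom in center_atoms for nbr in adjacency[atom]}
    return center_atoms | alpha_atoms


# Toy product: an ester skeleton C(0)-C(1)(=O(2))-O(3)-C(4)-C(5),
# where the bond between atoms 1 and 3 is marked as formed in the reaction.
ADJACENCY = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1, 4], 4: [3, 5], 5: [4]}
rcss_atoms = reaction_center_substructure(ADJACENCY, [(1, 3)])
print(sorted(rcss_atoms))
```

Atom 5 is two bonds away from the reaction center and is therefore not part of the substructure.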

5.4 Generation of Product Reaction Center Substructure Database

For database generation one has to clean up the reaction data first. Reactions with

inconsistent reaction center information or multiple product reaction centers have to be

removed. The latter is required in order to avoid the retrieval of unconnected reaction

center substructures (see example in Figure 7). Furthermore, reactions are also

eliminated when having undesired atom types at the product reaction center.
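As an illustration of the clean-up step, the check for multiple product reaction centers can be sketched as a count of connected groups of reaction center bonds. This is an interpretation, not SYLVIA's documented procedure: the sketch assumes that "multiple product reaction centers" means reaction center bonds that do not share atoms, directly or through other reaction center bonds. The function name is hypothetical.

```python
def count_reaction_center_sites(center_bonds):
    """Count connected groups of reaction center bonds (bonds as atom index pairs)."""
    neighbors = {}
    for a, b in center_bonds:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen = set()
    sites = 0
    for start in neighbors:
        if start in seen:
            continue
        sites += 1
        stack = [start]  # depth-first walk over one connected group
        while stack:
            atom = stack.pop()
            if atom in seen:
                continue
            seen.add(atom)
            stack.extend(neighbors[atom] - seen)
    return sites


def keep_reaction(center_bonds):
    # Under the assumption above, a reaction is kept only if it has
    # exactly one connected product reaction center.
    return count_reaction_center_sites(center_bonds) == 1


print(keep_reaction([(1, 2), (2, 3)]))  # bonds share atom 2
print(keep_reaction([(1, 2), (7, 8)]))  # two separate sites
```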


Figure 7 Example for multiple product reaction center sites.

For each RCSS a hash code is generated by considering the topology of the bonds at

the reaction center (along with atom type, atom connection and bond order). The

topology of a bond is either acyclic, aromatic or cyclic non-aromatic.

The distribution of the 14,112 unique RCSS retrieved from the Theilheimer reaction

database is shown in Figure 8. It reveals that a significant number of RCSS occur very

infrequently. Three quarters of the unique RCSS are present only once and only 2.9%

of the unique RCSS occur more than 10 times.
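For a customized database, comparable figures can be derived from the frequency counters. The short Python sketch below computes the share of unique substructures that occur exactly once and the share that occur more than 10 times; the function name and the input list are made up for illustration and do not reproduce the Theilheimer statistics.

```python
def frequency_summary(frequencies):
    """Return (share occurring once, share occurring more than 10 times)."""
    total = len(frequencies)
    singletons = sum(1 for f in frequencies if f == 1)
    frequent = sum(1 for f in frequencies if f > 10)
    return singletons / total, frequent / total


# Invented frequencies of eight unique substructures.
print(frequency_summary([1, 1, 1, 5, 12, 40, 1, 2]))
```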

Figure 8 Distribution of the frequency of unique reaction center substructures

retrieved from the Theilheimer reaction database. The eight most frequent

reaction center substructures with their frequency of occurrence are displayed.

Figure 9 illustrates the distribution of the number of heavy atoms of the retrieved

substructures along with the maximum frequency of occurrences for the given size.

The majority of reaction center substructures fell into the range of four to twelve heavy

atoms. The line representing the maximum frequency of occurrence peaks at a size of

four heavy atoms, and reaction center substructures exceeding a size of seven heavy

atoms have a low frequency of occurrence.


Figure 9 Distribution of the number of heavy atoms of unique reaction center

substructures retrieved from the Theilheimer database. The green line

represents the maximum frequency of occurrence for the given heavy atom

size.

5.4.1 Evaluating Retrosynthetic Reaction Fitness

The process of calculating the retrosynthetic reaction fitness is analogous to the

calculation of the starting material similarity score.

Each atom of the target structure is associated with an atom score that is initialized to

1.0 and gradually reduced as substructures from the RCSS database are mapped onto

the target structure.

For this, substructures, which can coincide with real RCSS, are exhaustively

enumerated in the target structure. By definition, an RCSS consists of at least one

reacting bond with its alpha atoms. The size of the enumerated RCSS is restricted to 4

in order to avoid the identification of those RCSS which have a negligible effect on the

overall retrosynthetic reaction fitness score due to their low frequency of occurrence.

After the enumeration process, for each enumerated substructure it has to be

confirmed whether it corresponds to a genuine product RCSS stored in the database.

Therefore, a hash code is generated for each potential RCSS. If the hash code is

present in the database, a score is calculated based on the frequency of

occurrence of the matched RCSS and that of the most frequent RCSS in the database. By

considering the frequency of occurrence of the verified RCSS, privilege is given to

structural motifs that correspond to common retrosynthetic retrons.

At the end of the process, the overall retrosynthetic reaction fitness score is calculated

by totaling up the individual atom scores and normalizing the sum with the number of

heavy atoms. The overall retrosynthetic reaction fitness ranges between 0.0 and 1.0. A

smaller value indicates that the target structure can be synthesized more easily,

because more fitting retrosynthetic reactions with a high frequency of occurrence

can be found for the target structure.
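The scoring procedure of this section can be sketched as follows. The manual does not give the exact formulas, so two parts of this Python sketch are assumptions made purely for illustration: the score of a matched RCSS is taken as one minus its frequency relative to the most frequent RCSS, and an atom keeps the smallest score of all substructures that cover it. Only the initialization of the atom scores to 1.0, the database lookup by hash code and the normalization by the number of heavy atoms follow the text. All names and numbers are hypothetical.

```python
def rcss_score(frequency, max_frequency):
    # Assumption: linear in the relative frequency, so that frequent
    # reaction center substructures receive small scores.
    return 1.0 - frequency / max_frequency


def reaction_fitness(n_heavy_atoms, candidates, database):
    """candidates: (atom indices, hash code) pairs of enumerated substructures.
    database: {hash code: frequency of occurrence}."""
    max_frequency = max(database.values())
    atom_scores = [1.0] * n_heavy_atoms  # every atom starts at 1.0
    for atom_indices, code in candidates:
        frequency = database.get(code, 0)
        if frequency == 0:
            continue  # not a genuine product RCSS
        score = rcss_score(frequency, max_frequency)
        for i in atom_indices:
            # Assumption: keep the best (smallest) score per atom.
            atom_scores[i] = min(atom_scores[i], score)
    # Normalize the sum of atom scores by the number of heavy atoms.
    return sum(atom_scores) / n_heavy_atoms


# Invented toy database and candidate substructures for a 6-atom target.
database = {"amide": 100, "ester": 50}
candidates = [((0, 1, 2, 3), "amide"), ((2, 3, 4), "ester"), ((4, 5), "unknown")]
print(reaction_fitness(6, candidates, database))
```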


6 Technical Requirements

6.1 System Requirements

SYLVIA and its associated command line tools support the following hardware

platforms and operating systems.

x86 platforms running Microsoft® Windows® XP/7 (win32, it is recommended to have installed the latest service pack)

x86 platforms (32bit, 64bit) running Linux®, Kernel 2.4/2.6

6.2 Program Scope and Known Limitations

SYLVIA has been designed to process a broad range of organic chemistry.

There are no limitations concerning the number of atoms or bonds of a molecule. Note:

some of the supported structure file formats might have such limitations.

Metal atoms, especially transition metal atoms, can be processed but might cause

problems in certain atom descriptor calculation routines due to the lack of

parameterization.

For multi-fragment records (e.g., salts) only the largest fragment is taken into account

and smaller fragments (e.g., counter ions) are discarded.
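To preview which part of a multi-fragment record would be kept, the selection can be approximated outside of SYLVIA. The Python sketch below splits a SMILES string at the dots and keeps the fragment with the most atoms. How SYLVIA measures fragment size is not stated in this manual, so counting atoms is an assumption. The regular expression covers bracket atoms and the organic subset of SMILES only, counts explicit bracket hydrogens like any other atom, and does not handle ring closures that connect dot-separated parts. The function names are hypothetical.

```python
import re

# Bracket atoms, two-letter halogens, then single-letter aliphatic and aromatic atoms.
ATOM_PATTERN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")


def atom_count(fragment):
    return len(ATOM_PATTERN.findall(fragment))


def largest_fragment(smiles):
    """Return the dot-separated fragment with the most atoms (first one on ties)."""
    return max(smiles.split("."), key=atom_count)


# Sodium acetate: the acetate ion is kept, the counter ion is dropped.
print(largest_fragment("CC(=O)[O-].[Na+]"))
```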

The conversion from SD into SMILES format (and vice versa) may result in unexpected atom types or resonance structures and in missing stereo information (if available in the input file). However, the output structures are formally correct.

In special cases, the perception of aromaticity of charged, aromatic compounds may behave non-deterministically if a charged atomic center is conjugated to an aromatic system. In case of doubt, the aromaticity perception and realization by SYLVIA can be checked interactively by browsing the structure depictions (diagrams) in the GUI version.

7 Program Installation

7.1 Download from the Web Server of Molecular Networks

SYLVIA is available for electronic download via the Internet on the web server of

Molecular Networks (Download Area). At

www.mn-am.com/php/profile.php

an account can be created that provides access to licensed software, evaluation

copies, program manuals, example files, and tutorials as well as to test copies of a

variety of chemoinformatics applications offered by Molecular Networks.

The software packages are provided to the user electronically as compressed files in order to reduce the download time. The downloaded files can easily be uncompressed with standard software tools for file compression and archiving, such as WinZip, FileZip (www.filezip.com), or gzip (www.gzip.org).

7.2 New Installation

7.2.1 Installation on x86 Linux Platforms

Please note. Administrator rights are required to install SYLVIA.

1) Download the distribution file of SYLVIA

sylvia_<version>_<OS>.tar.gz

and copy it into a temporary sub directory (e.g., /home/myName/sylviaTemp).

Please do not forget to save a copy of the distribution file for backup purposes.

2) Uncompress and de-archive the distribution file by using the following commands:

gunzip sylvia_<version>_<OS>.tar.gz

tar xvf sylvia_<version>_<OS>.tar

3) A sub directory sylvia-<version>-installer will be created automatically. Change to

this directory and run the installation shell script install.sh. Follow the instructions

printed to the screen and specify a proper installation directory (installDir) for

SYLVIA.

4) After the installation is finished, a desktop link to the GUI version of SYLVIA should

have been created. Open the link. A dialog box for the installation of the license key

file (licenses.xml) will be opened. Click on the button "Install license file…" and

open your license key file which will then be installed on your system.

5) If no desktop link was created, open a new shell and start the GUI version with the

command

installDir/bin/Sylvia

and follow the instructions as described in (4).
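
On systems where a Python interpreter is at hand, the uncompress and de-archive step (2) can also be scripted. The following is a minimal sketch using only the Python standard library; the function name is hypothetical and the archive path has to be replaced by the actual name of the downloaded distribution file.

```python
import tarfile
from pathlib import Path


def unpack_distribution(archive_path, target_dir):
    """Uncompress and de-archive a .tar.gz file into target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        try:
            # Newer Python versions support a safety filter for extraction.
            archive.extractall(target, filter="data")
        except TypeError:
            # Older Python versions do not know the filter argument.
            archive.extractall(target)
    # Return the names of the top-level entries that were created.
    return sorted(p.name for p in target.iterdir())
```

Calling unpack_distribution with the path of the distribution file and the temporary sub directory from step (1) should list the created installer sub directory, from which install.sh is then run as described in step (3).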

For XT version (GUI and batch version)

1) The batch mode version of SYLVIA can be started with the command

installDir/bin/sylvia --help

that prints the command line options on your screen (all command line options are

also described in the online help of the GUI version).

For the batch version, it is also recommended to copy the license key file

(licenses.xml) into the sub-directory "etc" of the installation directory of SYLVIA

(e.g., "installDir/etc").

2) You may add the SYLVIA installation directory to the environment variable "PATH"

in the ".login" or ".cshrc" file (".profile" or ".bashrc") or create a symbolic link in

the standard installation directory for binary executable files (local or system wide,

e.g., "/usr/local/bin").

3) The command line tools to generate customized starting material databases

("extractSM") and reaction center substructure databases ("extractRCSS") are

also located in the directory "installDir/bin". Both tools can be started with the

commands

installDir/bin/extractSM --help

installDir/bin/extractRCSS --help

that print the command line options on your screen (all command line options are

also described in the online help of the GUI version).

7.2.2 Microsoft Windows Platforms (win32, 7/8/10)

Please note. Administrator rights are required to install SYLVIA.

1) Download the Microsoft setup program of SYLVIA

sylvia_<version>_Win32_Setup.exe

and copy it into a temporary directory. Please do not forget to save a copy of the

setup program for backup purposes.

2) Double-click the setup program to start the installation of the program and follow

the instructions on the screen.

3) After the installation is finished, a desktop icon with a link to the GUI version of

SYLVIA should have been created. Double-click the icon. A dialog box for the

installation of the license key file (licenses.xml) will be opened. Click on the button

"Install license file…" and open your license key file which will then be installed on

your system.

For XT version only (GUI and batch version)

1) Double-click the desktop link for the batch mode version of SYLVIA. A console with

a command line prompt will be opened. To access SYLVIA from any directory on

your computer, it is recommended to add the installation directory of SYLVIA

(absolute path) to the system variable Path of your Windows system.

2) The batch mode version of SYLVIA can be started in the console (Windows DOS

prompt or PowerShell) with the command

sylvia.exe --help

that prints the command line options on your screen (all command line options are

also described in the online help of the GUI version).

3) The command line tools to generate customized starting material databases

(extractSM) and reaction center substructure databases (extractRCSS) are also

located in the directory installDir/bin. Both tools can be started with the commands

installDir/bin/extractSM --help

installDir/bin/extractRCSS --help

that print the command line options on your screen (all command line options are

also described in the online help of the GUI version).

4) In order to execute SYLVIA and the associated tools from any directory on a

Windows machine, add the sub-directory "bin" of the installation directory of

SYLVIA (e.g., "C:\Program Files (x86)\sylvia\bin") to the environment variable

"Path" of the system settings as follows.

a) Open the "Start" menu of the Windows system, then select "Control Panel" →

"System and Security" → "System" and click on the link "Advanced System

Settings" in the upper left part of the control panel. The "System Properties"

dialog appears (see Figure 10).

b) Select the tab "Advanced" in the "System Properties" dialog and press the

button "Environment Variables…" (see Figure 10).

Figure 10 The "System Properties" dialog.

c) The "Environment Variables" dialog appears. Select "Path" in the list of

"System variables" and click on the button "Edit" (see Figure 11 left).

d) Add the full path of the sub-directory "bin" (e.g., "C:\Program Files (x86)\sylvia\bin") at the end of the field "Variable value" (see Figure 11 right).

Note. The newly added path has to be separated from the existing paths by the character ";" (semicolon).

Figure 11 Specifying the "Environment Variable" for SYLVIA.

e) Confirm all changes by clicking the button "Ok" and close the "Control Panel".
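
The separator rule from step (d) can be illustrated with a short Python sketch that only builds the new value of the field "Variable value" as a string; it does not change any system setting. The function name and the example value of the existing Path variable are hypothetical.

```python
def append_to_path(path_value, new_dir, separator=";"):
    """Return path_value with new_dir appended, separated by `separator`."""
    # Split the existing value and drop empty entries (e.g., a trailing ";").
    entries = [e for e in path_value.split(separator) if e.strip()]
    # Windows paths are case-insensitive, so compare in lower case.
    known = {e.strip().rstrip("\\").lower() for e in entries}
    if new_dir.rstrip("\\").lower() not in known:
        entries.append(new_dir)
    return separator.join(entries)


# Hypothetical existing value of the Path variable.
existing = r"C:\Windows\system32;C:\Windows"
print(append_to_path(existing, r"C:\Program Files (x86)\sylvia\bin"))
```

Checking for an existing entry first avoids adding the SYLVIA directory twice when the steps are repeated.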

7.3 Program Updates

For program updates, it is recommended to completely uninstall the current installation

before installing the update version.

8 Problems and Help!

If you have any difficulties with the installation of SYLVIA or if you encounter any

problems when running SYLVIA, please send all your inquiries to the following

address:

Molecular Networks GmbH Computerchemie
Neumeyerstr. 28
90411 Nuremberg
Germany

or contact us by email [email protected],

or by Fax +49 911 597 424 09.

Please send us the input file, the output file, and any error messages displayed by your system by email. These files will help us to analyze your problem. Thank you!

You can also use the report form in section 11 on page 44 of this manual.

9 Acknowledgements

SYLVIA is developed at Molecular Networks GmbH, Erlangen, Germany.

Some of the methods incorporated have been developed at the research group of Prof.

Dr. Johann Gasteiger at the University of Erlangen-Nuremberg, Erlangen, Germany.

The authors would like to thank all people involved in this software project.

The graphical user interface of SYLVIA has been developed using Qt Designer,

Version 3.3.3 (Copyright 2002-2003, Trolltech AS, Norway. All rights reserved.

http://www.trolltech.com/).

Furthermore, the following components and/or component libraries are used and

acknowledged:

Xerces-C library (http://xml.apache.org/xerces-c/), copyright 1999-2005, The

Apache Software Foundation, MD, USA (http://www.apache.org/).

MinGW collection (http://www.mingw.org), copyright 2004.

10 References

[1] MOSES is a C++ software library for Chemoinformatics applications that is owned,

developed and maintained by Molecular Networks GmbH, Nuremberg, Germany.

[2] a) Dalby, A.; Nourse, J. G.; Hounshell, W. D.; Gushurst, A. K. I.; Grier, D. L.; Leland, B.

A.; Laufer, J. Description of Several Chemical Structure File Formats Used by

Computer Programs Developed at Molecular Design Limited. J. Chem. Inf. Comput.

Sci. 1992, 32, 244-255. b) A detailed description of the file formats Mol, SD, and RD is

available on the Internet for download as a PDF document at

http://accelrys.com/products/informatics/cheminformatics/ctfile-formats/no-fee.php.

[3] a) Weininger, D. SMILES, a Chemical Language and Information System. 1.

Introduction to Methodology and Encoding Rules. J. Chem. Inf. Comput. Sci. 1988, 28,

31-36. b) Daylight Software Manual. Daylight Chemical Information Systems: Santa Fe,

NM, USA, 1993, http://www.daylight.com.

[4] Boda, K.; Seidel, T.; Gasteiger, J. Structure and reaction based evaluation of synthetic accessibility. J. Comput.-Aided Mol. Des. 2007, 21, 311-325 (DOI 10.1007/s10822-006-9099-2).

[5] Gasteiger, J.; Ihlenfeldt, W.-D.; Fick, R.; Rose, J. R. Similarity Concepts for the

Planning of Organic Reactions and Syntheses. J. Chem. Inf. Comput. Sci. 1992, 32,

700-712 (DOI 10.1021/ci00010a018).

11 Report Form

If problems occur during the installation or while running SYLVIA, please complete the following form and send or fax it to

Molecular Networks GmbH Computerchemie
Neumeyerstr. 28
90411 Nuremberg, Germany
FAX: +49 911 597 424 09

____________________________________________________________________

User:

____________________________________________________________________

SYLVIA program and version number:

Command line to run SYLVIA:

Error and warning messages by SYLVIA:

____________________________________________________________________

System messages:

____________________________________________________________________

Short description:

Please include the input file, the output file, and any log files generated by SYLVIA and forward them by email to [email protected]. These files will help us to analyze your problems. All data will be treated confidentially.