RICT User Guide PredictandClassify


RICT user guide. Predictions and classification processing


***DRAFT*** ***DRAFT***

River Invertebrates Classification Tool (RICT)

User Documentation

1. Guide to Predict and Classify (Interactive Mode)

1. Introduction

This document provides a guide to the interactive Predict and Classify process within RICT.

It contains:

a) a summary of the data that needs to be provided to RICT
b) options for how this data can be provided
c) an overview of how to carry out a P&C run
d) references to examples of actual runs carried out for common scenarios, including screenshots

Note that it is focused on how the tool operates rather than the science/rationale behind it.

2. Summary of Data Required by RICT for a Predict & Classify Run

In order to carry out a Predict & Classify run, the following data needs to be provided/specified:

- Environmental Variable (EV) data for each relevant site
- Observed Index values for each relevant site/index
- Settings for the Run
- Bias data for each Index/Season
- Limits for each Index to be classified

a) Environmental Variable (EV) data for each relevant site

EV data needs to be provided in order to feed into the Prediction process which calculates ‘expected’ index values.

Currently, the following data is required for each site (although this may change in future if new sets of Predictive Environmental Variables (PEVs) are created):

- Grid Reference (NGR Letters/Easting/Northing)
- Altitude
- Slope
- Discharge Category
- Velocity Category (if no Discharge Category)
- Distance from Source
- Mean Width
- Mean Depth
- Alkalinity
- Total Hardness (if no Alkalinity)
- Calcium (if no Alkalinity or Hardness)
- Conductivity (if no Alkalinity or Hardness or Calcium)
- % cover of boulders & cobbles
- % cover of pebbles & gravel
- % cover of sand
- % cover of silt & clay
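The "if no ..." entries in the list above describe a fallback order. The following is a minimal Python sketch of that order, not RICT's own logic. The keys ALKALINITY and DISCHARGE match the IDs used in Appendix 1; TOTAL_HARDNESS, CALCIUM, CONDUCTIVITY and VELOCITY, and the function first_available, are hypothetical names chosen for illustration.

```python
from typing import Optional

# Fallback orders taken from the list above. Only ALKALINITY and DISCHARGE
# appear as IDs in Appendix 1; the other key names are assumptions.
CHEMISTRY_FALLBACK = ["ALKALINITY", "TOTAL_HARDNESS", "CALCIUM", "CONDUCTIVITY"]
FLOW_FALLBACK = ["DISCHARGE", "VELOCITY"]


def first_available(site: dict, keys: list) -> Optional[str]:
    """Return the first key in `keys` that has a value for this site, or None."""
    for key in keys:
        if site.get(key) is not None:
            return key
    return None


# Illustrative site record (values invented for the example).
site = {"ALTITUDE": 190, "SLOPE": 1.1, "DISCHARGE": None, "VELOCITY": 2,
        "ALKALINITY": None, "TOTAL_HARDNESS": 95.0}

print(first_available(site, CHEMISTRY_FALLBACK))  # TOTAL_HARDNESS
print(first_available(site, FLOW_FALLBACK))       # VELOCITY
```

A result of None would indicate that the site record is missing every acceptable alternative for that variable.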


Note that, since RICT caters for multi-year runs, a Year needs to be provided for each Site entry.

b) Observed Index values for each relevant site/index

Sites are classified for each relevant index by dividing Observed Value by Expected Value to obtain an Environmental Quality Index (EQI). This is then compared against limits to obtain a classification status (e.g. High).
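As a worked illustration of the division described above, the following Python sketch computes an EQI. The function name eqi is hypothetical, the Observed Value of 29 is the NTAXA value for site 9875 in Appendix 2, and the Expected Value of 25.0 is invented for the example; in a real run it would come from the Prediction process.

```python
def eqi(observed: float, expected: float) -> float:
    """Environmental Quality Index: Observed Value divided by Expected Value."""
    if expected <= 0:
        raise ValueError("expected value must be positive")
    return observed / expected


# Observed NTAXA = 29 (Appendix 2); expected = 25.0 is an assumed value.
ntaxa_eqi = eqi(29, 25.0)
print(round(ntaxa_eqi, 4))  # 1.16
```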

Therefore, Observed Values need to be provided for each Site for each Index that is to be classified.

As for EVs, a Year needs to be provided for each Site/Index.

c) Settings for the Run

In order for the Prediction and Classification run to be carried out in the way the user wishes, a number of settings need to be provided. The key ones are:

- End Group Set Id (e.g. 3 = GB New Model – 43 End Group Set)
- Season Id (e.g. 5 = Spring and Autumn)
- Indices Set (e.g. 2 = Original BMWP + MINTA)
- PEV Set (e.g. 1 = GB)
- Multi Year Flag (Y/N)
- Reference Adjustment Flag (Y/N)
- Taxonomic Prediction Flag (Y/N)
- Taxonomic Prediction Level (e.g. TL1 = BMWP Family Level)
- Output File Prefix
- Run Name

Note that Number of Iterations can also be provided as a setting, but if not provided, it will be set to the default value held within the administration section.

d) Bias data for each Index/Season

The classification process takes account of Bias when varying the Observed Values and so Bias Values need to be provided for each Index for the Season Code relevant to the Run.

If no Bias value is provided for an Index, then zero is used. (Draft query, still to be resolved: what happens if there is no Bias file at all, e.g. if the defaults file does not exist?)
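The zero default for a missing Bias value can be sketched in Python as follows. This is an illustration only, not RICT's implementation: the element and attribute names follow the sample bias file in Appendix 4, while the function names load_bias and bias_for are hypothetical.

```python
import xml.etree.ElementTree as ET

# A cut-down bias file using the structure and values shown in Appendix 4.
BIAS_XML = """<Datasets>
  <Dataset Index_Name="NTAXA" Season_ID="5"><Value>1.6524</Value></Dataset>
  <Dataset Index_Name="ASPT" Season_ID="5"><Value>0</Value></Dataset>
</Datasets>"""


def load_bias(xml_text):
    """Return {(index_name, season_id): bias_value} from a bias XML string."""
    return {(d.get("Index_Name"), d.get("Season_ID")): float(d.findtext("Value"))
            for d in ET.fromstring(xml_text).iter("Dataset")}


def bias_for(bias, index_name, season_id):
    """Look up a bias value, falling back to zero when none is provided."""
    return bias.get((index_name, str(season_id)), 0.0)


bias = load_bias(BIAS_XML)
print(bias_for(bias, "NTAXA", 5))  # 1.6524
print(bias_for(bias, "BMWP", 5))   # 0.0 (no entry, so zero is used)
```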

e) Limits for each Index to be classified

The classification process compares EQIs against limits and so the limits to be used for the run need to be provided for each index.
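The comparison of an EQI against limits can be sketched in Python as below. The lower bounds are the NTAXA values from the sample limits file in Appendix 5; the function name classify is hypothetical, and the sketch deliberately ignores the upper bound on the top class and the simulation that RICT performs around the Observed Values.

```python
# Lower bounds for NTAXA copied from Appendix 5, ordered from rank 1 (H) to
# rank 5 (B). Each class applies when the EQI is >= its lower bound and
# below the lower bound of the class ranked above it.
NTAXA_LOWER_BOUNDS = [("H", 0.8879), ("G", 0.7417), ("M", 0.5954),
                      ("P", 0.491), ("B", 0.0)]


def classify(eqi_value, lower_bounds):
    """Return the first (best) class whose lower bound the EQI reaches."""
    for status, lower in lower_bounds:
        if eqi_value >= lower:
            return status
    raise ValueError("EQI is below every lower bound")


print(classify(1.16, NTAXA_LOWER_BOUNDS))  # H
print(classify(0.60, NTAXA_LOWER_BOUNDS))  # M
```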


3. Options for Providing the Data

The input/output formats for RICT are XML. However, there are a number of options for providing/specifying the required data.

3.1 Environmental Variable Data

a) Create an XML File(s) and Upload

The XML file(s) must conform to the specified XML schema and an example of a valid file for one site is provided in Appendix 1.
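For users who prefer to generate the file programmatically, the following Python sketch builds a file with the same element structure as the Appendix 1 example. It is an assumption-laden illustration: the function name build_ev_xml is hypothetical, the schema (ev.xsd) has not been consulted, and the output should be validated against that schema before upload. The namespace prefix (xsi here, NS1 in Appendix 1) does not affect the meaning of the XML.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"


def build_ev_xml(creator, creation_date, name, site_id, year, evs):
    """Build an EV XML string mirroring the layout of the Appendix 1 sample."""
    ET.register_namespace("xsi", XSI)
    root = ET.Element("Datasets",
                      {f"{{{XSI}}}noNamespaceSchemaLocation": "ev.xsd"})
    ET.SubElement(root, "Creator").text = creator
    ET.SubElement(root, "Creation_Date").text = creation_date
    ET.SubElement(root, "Name").text = name
    dataset = ET.SubElement(root, "Dataset", {"ID": site_id, "Year": year})
    ET.SubElement(dataset, "Name").text = site_id
    for ev_id, value in evs.items():
        ev = ET.SubElement(dataset, "EV", {"ID": ev_id})
        ET.SubElement(ev, "Description").text = ev_id
        ET.SubElement(ev, "Value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Two of the EVs from Appendix 1; a real file would list all required EVs.
xml_text = build_ev_xml("WQMASTER", "2008-02-14", "Single Site",
                        "9875", "2007", {"ALTITUDE": 190, "SLOPE": 1.1})
print(xml_text)
```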

The user can either create a new XML file(s) or amend an existing file. Information about how this can be done is being provided separately but, for example, ‘Notepad++’ can easily be used to open, amend and save an existing XML file.

Once created the file(s) can be uploaded to RICT during the run – see Section 4.

b) Load in an EV file in Existing RIVPACS Format

An EV file in the existing RIVPACS format (with extension of .asc) can be uploaded to RICT during the run – see Section 4. RICT then automatically converts the RIVPACS format file to the required RICT XML format.

Note that, as this facility is specific to the existing RIVPACS EV format, it will not be usable for any new EVs that may be introduced in future.

c) Create XML file(s) from Excel

It is expected that many users will maintain their EV data in Excel, and so a special RICT Data Entry spreadsheet has been created that enables XML files to be generated from Excel that can be processed by RICT.

More details are being provided on this separately but, briefly, the EV data has to be entered/copied into the appropriate columns in the ‘Environmental Variables’ worksheet and then the ‘Start Here’ worksheet is used to generate the XML format file.

The generated file(s) can then be uploaded to RICT during the run – see Section 4.

Note that the RICT Data Entry Spreadsheet has been set up so that data from an existing RIVPACS EV file can be cut and pasted into the spreadsheet if required.

Note also that the RICT Data Entry Spreadsheet is only applicable for existing EVs. If new EVs are introduced in future, then a new Data Entry Spreadsheet will be required and a change made to RICT to recognise the new data.

d) Manually Enter Data

Data can be manually entered during the run – see Section 4.

Note that manually entered data is subsequently saved as an XML format file. Therefore, this functionality could be used to create an initial XML file which could then be amended as required for future runs.


3.2 Observed Index Values

a) Create an XML File(s) and Upload

As for 3.1 a). An example file is provided in Appendix 2.
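Before uploading, it can be useful to check what an Observed Index file contains. The Python sketch below reads a file laid out like the Appendix 2 example; the function name read_observed is hypothetical and the sample string is a shortened copy of one site from that appendix.

```python
import xml.etree.ElementTree as ET

# Shortened sample using the structure and values of Appendix 2.
SAMPLE = """<Datasets>
  <Dataset Site_ID="9875" Year="2007">
    <Index ID="ASPT" Name="ASPT"><Observed_Value>5.655172413793103448</Observed_Value></Index>
    <Index ID="NTAXA" Name="NTAXA"><Observed_Value>29</Observed_Value></Index>
  </Dataset>
</Datasets>"""


def read_observed(xml_text):
    """Return {(site_id, year, index_id): observed_value}."""
    values = {}
    for dataset in ET.fromstring(xml_text).iter("Dataset"):
        key = (dataset.get("Site_ID"), dataset.get("Year"))
        for index in dataset.iter("Index"):
            values[key + (index.get("ID"),)] = float(index.findtext("Observed_Value"))
    return values


observed = read_observed(SAMPLE)
print(observed[("9875", "2007", "NTAXA")])  # 29.0
```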

b) Load in an Observed Index file in Existing RIVPACS Format

An Observed Index file in the existing RIVPACS format (with extension of .oe1) can be uploaded to RICT during the run – see Section 4.

Note that, as this facility is specific to the existing RIVPACS Observed Index format, it will not be usable for any new Indices that may be introduced in future.

c) Create XML file(s) from Excel

As for 3.1 c) except for entering/copying the data into the ‘Observed and Expected’ worksheet.

d) Manually Enter Data

As for 3.1 d).

3.3 Settings for the Run

a) Use the Settings in the Default Settings File

The system has a default settings file defined which contains the settings that will be used for a run if no other settings file is provided – see Appendix 3 for example.

Before scheduling the run it is then possible to amend individual settings as required. Therefore, if there are only a couple of settings that are different from the defaults then it is easiest to use the default settings and then amend them prior to scheduling a run. Note that any changes made are applicable for that run only and do not change the underlying default settings file.

b) Change the Default Settings File

It is possible to change the file that is defined as the Default Settings file via the Administration function (see separate guide). This might be useful if a number of future runs are to have the same settings.

Note that it will still be possible to amend individual settings prior to scheduling a run.

c) Upload a Settings File

Rather than change the Default Settings file, it is possible to upload a Settings File for use during the particular run. This would be useful if multiple settings for the run are different from the defaults.

The user can either create a new Settings file or amend an existing file. Information about how this can be done is being provided separately but, for example, ‘Notepad++’ can easily be used to open, amend and save an existing XML file.


Note that it will still be possible to amend individual settings prior to scheduling a run.

Note also that the Settings for a particular run are subsequently saved as an XML format file. Therefore, a new Settings file could be created by using the Default Settings file and then amending the required settings prior to scheduling the run.
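Outside RICT, the same "start from the defaults and amend" approach can be sketched in Python. The Setting names and values follow the sample settings file in Appendix 3; the function name override_settings is hypothetical, and the result would still need to conform to the settings schema before upload.

```python
import xml.etree.ElementTree as ET

# Cut-down defaults using the structure of Appendix 3.
DEFAULTS = """<Datasets><Dataset ID="0" Name="Rict">
  <Setting Name="Season"><Value>5</Value></Setting>
  <Setting Name="Multi-Year"><Value>N</Value></Setting>
</Dataset></Datasets>"""


def override_settings(xml_text, overrides):
    """Return a new settings XML string with the named settings replaced."""
    root = ET.fromstring(xml_text)
    remaining = dict(overrides)
    for setting in root.iter("Setting"):
        name = setting.get("Name")
        if name in remaining:
            setting.find("Value").text = str(remaining.pop(name))
    if remaining:
        # Refuse silently ignored typos such as "Multi_Year".
        raise KeyError(f"unknown settings: {sorted(remaining)}")
    return ET.tostring(root, encoding="unicode")


new_settings = override_settings(DEFAULTS, {"Multi-Year": "Y"})
print(new_settings)
```

The defaults string itself is left unchanged, which mirrors the behaviour described above where amendments apply to one run only.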

3.4 Bias data for each Index/Season

a) Use the Default Bias File

The system has a default bias file defined which contains the bias values that will be used for a run if no other bias file is provided – see Appendix 4 for example.

Before scheduling the run it is then possible to amend bias values as required. Note that any changes made are applicable for that run only and do not change the underlying default bias file.

b) Change the Default Bias File

It is possible to change the file that is defined as the Default Bias file via the Administration function (see separate guide). This might be useful if a number of future runs are to have the same bias values.

Note that it will still be possible to amend bias values prior to scheduling a run.

c) Upload a Bias File

Rather than change the Default Bias file, it is possible to upload a Bias File for use during the particular run. This would be useful if the required bias values are significantly different from the defaults.

The user can either create a new Bias file or amend an existing file. Information about how this can be done is being provided separately but, for example, ‘Notepad++’ can easily be used to open, amend and save an existing XML file.

Note that it will still be possible to amend bias values prior to scheduling a run.

Note also that the bias values for a particular run are subsequently saved as an XML format file. Therefore, a new Bias file could be created by using the Default Bias file and then amending the required values prior to scheduling the run.

3.5 Limits for each Index to be classified

a) Use the Default Limits File

The system has a default limits file defined which contains the limits that will be used for a run if no other limits file is provided – see Appendix 5 for example.

Before scheduling the run it is then possible to amend limits as required. Note that any changes made are applicable for that run only and do not change the underlying default limits file.


b) Change the Default Limits File

It is possible to change the file that is defined as the Default Limits file via the Administration function (see separate guide). This might be useful if a number of future runs are to have the same limits.

Note that it will still be possible to amend limits prior to scheduling a run.

c) Upload a Limits File

Rather than change the Default Limits file, it is possible to upload a Limits File for use during the particular run. This would be useful if the required limits are significantly different from the defaults.

The user can either create a new Limits file or amend an existing file. Information about how this can be done is being provided separately but, for example, ‘Notepad++’ can easily be used to open, amend and save an existing XML file.

Note that it will still be possible to amend limits prior to scheduling a run.

Note also that the limits for a particular run are subsequently saved as an XML format file. Therefore, a new Limits file could be created by using the Default Limits file and then amending the required values prior to scheduling the run.


4. Overview of Carrying out a Predict and Classify Run

The stages for carrying out a Predict and Classify Run are as follows:

a) Access and Log In to RICT (if not already done)

Full details of how to access and log in to RICT are provided in a separate document. In brief, it involves entering the relevant URL in your browser and then entering your Username and Password.

This will display the Home screen:


b) Navigate to the Run Menu

Click on Run Menu which will result in a page similar to the following being displayed:

c) Create a New Run

Click on ‘Create a New Run’ which will result in the following page being displayed:


d) Select Run Type

Click on ‘Predict and Classify’ which will result in the following page being displayed:

e) Upload Files (if required)

If any files are to be uploaded then click on Browse, which will result in a page similar to the following being displayed:


Then navigate to the required file using normal Windows functionality and either double-click on the filename or select the filename and click on Open.

The file will then be uploaded to the RICT input area and a page similar to the following will be displayed. As part of the upload RICT will check to see if the format is recognised and, if so, the type of file will be displayed.
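This guide does not describe how RICT recognises a file's format. One plausible approach for the XML inputs, shown here purely as an assumption, is to read the schema named on the root element, since each sample file in Appendices 1 to 5 names a different schema (ev.xsd, oei.xsd, settings.xsd, bias.xsd, limits.xsd). The function name detect_file_type and the type labels are hypothetical.

```python
import xml.etree.ElementTree as ET

# Attribute name in Clark notation, so any namespace prefix (xsi, NS0, NS1) matches.
SCHEMA_ATTR = "{http://www.w3.org/2001/XMLSchema-instance}noNamespaceSchemaLocation"

# Schema file names as they appear in Appendices 1 to 5; labels are illustrative.
FILE_TYPES = {
    "ev.xsd": "Environmental Variables",
    "oei.xsd": "Observed Index values",
    "settings.xsd": "Settings",
    "bias.xsd": "Bias",
    "limits.xsd": "Limits",
}


def detect_file_type(xml_text):
    """Return a label for the file type, or None if it is not recognised."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None
    return FILE_TYPES.get(root.get(SCHEMA_ATTR))


sample = ('<Datasets xmlns:NS1="http://www.w3.org/2001/XMLSchema-instance" '
          'NS1:noNamespaceSchemaLocation="ev.xsd"/>')
print(detect_file_type(sample))     # Environmental Variables
print(detect_file_type("not xml"))  # None
```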

The above process should be repeated for all files that are to be uploaded.

Once all files have been uploaded then click on Continue. This will result in the files being processed and validated. A page similar to the following is then displayed:


f) Amend any Data

At this point there are options to:

- Amend the Data provided to the run
- Amend the Settings applicable to the run
- Amend the Limits applicable to the run
- Amend the Bias data applicable to the run

i) Amend the Data provided to the run

This option will normally be used to manually enter data but can also be used to amend any data that has been loaded or add more data files to the run.

ii) Amend the Settings applicable to the run

The settings for the run will either have been taken from an uploaded settings file or the default settings file if no file has been uploaded.

This option can be used to amend the settings if required. Note that these will only be applicable for the current run. Also note that the settings used will be saved in a settings file that can then be used for future runs if required.

iii) Amend the Limits applicable to the run

As for ii) above

iv) Amend the Bias data applicable to the run

As for ii) above

g) Schedule the Run

Once any required data has been amended, then click on Schedule Run.

This will result in the run being scheduled and the Run Menu being displayed with the new run at the top – see below.

Note that there is an option to delay the run for a specified period if necessary (e.g. if it is a large run that is best scheduled outwith normal hours).

The run will initially be displayed with an ‘in progress’ icon and the page will refresh automatically until the run is complete, when a ‘complete’ icon will be displayed.


Run In progress:

Run complete:


h) View/Extract Results

Once the job is complete then the results can be viewed/extracted as follows:

- Reports

Access to the reports is via the shortcuts menu:

Detail to be added…

- Visualise

If there is an internet connection, then the results can be viewed (using a Google Maps interface) via the shortcuts menu:


This will result in a page similar to the following being displayed:

Note that a more detailed guide of the Run Menu is provided in a separate document.

5. Examples of Predict & Classify Runs

A number of Predict & Classify examples are being prepared:

Example 1 - Upload XML EVs and OE files (use defaults for rest)

Example 2 - Manual Input of EVs and OEs (use defaults for rest)


Appendix 1 – Sample XML Environmental Variable File

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Datasets NS1:noNamespaceSchemaLocation="ev.xsd" xmlns:NS1="http://www.w3.org/2001/XMLSchema-instance">
  <Creator>WQMASTER</Creator>
  <Creation_Date>2008-02-14</Creation_Date>
  <Name>Single Site</Name>
  <Dataset ID="9875" Year="2007">
    <Name>9875</Name>
    <EV ID="NGR_LETTERS">
      <Description>NGR_LETTERS</Description>
      <Value>NT</Value>
    </EV>
    <EV ID="NGR_EAST">
      <Description>NGR_EAST</Description>
      <Value>08192</Value>
    </EV>
    <EV ID="NGR_NORTH">
      <Description>NGR_NORTH</Description>
      <Value>36934</Value>
    </EV>
    <EV ID="ALTITUDE">
      <Description>ALTITUDE</Description>
      <Value>190</Value>
    </EV>
    <EV ID="SLOPE">
      <Description>SLOPE</Description>
      <Value>1.1</Value>
    </EV>
    <EV ID="DISCHARGE">
      <Description>DISCHARGE</Description>
      <Value>3</Value>
    </EV>
    <EV ID="DIST_FROM_SOURCE">
      <Description>DIST_FROM_SOURCE</Description>
      <Value>11.4</Value>
    </EV>
    <EV ID="MEAN_WIDTH">
      <Description>MEAN_WIDTH</Description>
      <Value>2.625</Value>
    </EV>
    <EV ID="MEAN_DEPTH">
      <Description>MEAN_DEPTH</Description>
      <Value>40</Value>
    </EV>
    <EV ID="ALKALINITY">
      <Description>ALKALINITY</Description>
      <Value>80.9581</Value>
    </EV>
    <EV ID="BOULDER_COBBLES">
      <Description>BOULDER_COBBLES</Description>
      <Value>12.6667</Value>
    </EV>
    <EV ID="PEBBLES_GRAVEL">
      <Description>PEBBLES_GRAVEL</Description>
      <Value>46.6667</Value>
    </EV>
    <EV ID="SAND">
      <Description>SAND</Description>
      <Value>30.8333</Value>
    </EV>
    <EV ID="SILT_CLAY">
      <Description>SILT_CLAY</Description>
      <Value>9.8333</Value>
    </EV>
  </Dataset>
</Datasets>


Appendix 2 – Sample XML Observed/Expected Index File

<Datasets xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="oei.xsd">
  <Creator>WQMASTER</Creator>
  <Creation_Date>2008-03-06</Creation_Date>
  <Name>SEPA_3</Name>
  <Dataset Site_ID="9875" Year="2007">
    <Index ID="ASPT" Name="ASPT">
      <Observed_Value>5.655172413793103448</Observed_Value>
    </Index>
    <Index ID="BMWP" Name="BMWP">
      <Observed_Value>164</Observed_Value>
    </Index>
    <Index ID="NTAXA" Name="NTAXA">
      <Observed_Value>29</Observed_Value>
    </Index>
  </Dataset>
  <Dataset Site_ID="10480" Year="2007">
    <Index ID="ASPT" Name="ASPT">
      <Observed_Value>4.25</Observed_Value>
    </Index>
    <Index ID="BMWP" Name="BMWP">
      <Observed_Value>51</Observed_Value>
    </Index>
    <Index ID="NTAXA" Name="NTAXA">
      <Observed_Value>12</Observed_Value>
    </Index>
  </Dataset>
  <Dataset Site_ID="11030" Year="2007">
    <Index ID="ASPT" Name="ASPT">
      <Observed_Value>6.739130434782608696</Observed_Value>
    </Index>
    <Index ID="BMWP" Name="BMWP">
      <Observed_Value>155</Observed_Value>
    </Index>
    <Index ID="NTAXA" Name="NTAXA">
      <Observed_Value>23</Observed_Value>
    </Index>
  </Dataset>
</Datasets>


Appendix 3 – Sample Settings File

<Datasets xmlns:xdb="http://xmlns.oracle.com/xdb" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="settings.xsd">
  <Dataset ID="0" Name="Rict">
    <Setting Name="End_Group_Set">
      <Value>3</Value>
    </Setting>
    <Setting Name="Season">
      <Value>5</Value>
    </Setting>
    <Setting Name="Indices_Set">
      <Value>2</Value>
    </Setting>
    <Setting Name="PEV_Set">
      <Value>1</Value>
    </Setting>
    <Setting Name="Output_File_Prefix">
      <Value>(Date)_rict_(Run_ID)_</Value>
    </Setting>
    <Setting Name="Run_Name">
      <Value>Sepa_(Run_ID)</Value>
    </Setting>
    <Setting Name="Multi-Year">
      <Value>N</Value>
    </Setting>
    <Setting Name="Ref Adjust">
      <Value>Y</Value>
    </Setting>
    <Setting Name="Predict_Taxa">
      <Value>N</Value>
    </Setting>
    <Setting Name="Predict_Taxonomic_Level">
      <Value>TL1</Value>
    </Setting>
    <Setting Name="Simulation Iterations">
      <Value>500</Value>
    </Setting>
  </Dataset>
</Datasets>


Appendix 4 – Sample Bias File

<?xml version="1.0" encoding="WINDOWS-1252" standalone='no'?>
<Datasets NS0:noNamespaceSchemaLocation="bias.xsd" xmlns:NS0="http://www.w3.org/2001/XMLSchema-instance">
  <Dataset Index_Name="NTAXA" Season_ID="1">
    <Value>1.62</Value>
  </Dataset>
  <Dataset Index_Name="NTAXA" Season_ID="2">
    <Value>1.62</Value>
  </Dataset>
  <Dataset Index_Name="NTAXA" Season_ID="3">
    <Value>1.62</Value>
  </Dataset>
  <Dataset Index_Name="NTAXA" Season_ID="4">
    <Value>1.6524</Value>
  </Dataset>
  <Dataset Index_Name="NTAXA" Season_ID="5">
    <Value>1.6524</Value>
  </Dataset>
  <Dataset Index_Name="NTAXA" Season_ID="6">
    <Value>1.6524</Value>
  </Dataset>
  <Dataset Index_Name="NTAXA" Season_ID="7">
    <Value>1.7982</Value>
  </Dataset>
  <Dataset Index_Name="ASPT" Season_ID="1">
    <Value>0</Value>
  </Dataset>
  <Dataset Index_Name="ASPT" Season_ID="2">
    <Value>0</Value>
  </Dataset>
  <Dataset Index_Name="ASPT" Season_ID="3">
    <Value>0</Value>
  </Dataset>
  <Dataset Index_Name="ASPT" Season_ID="4">
    <Value>0</Value>
  </Dataset>
  <Dataset Index_Name="ASPT" Season_ID="5">
    <Value>0</Value>
  </Dataset>
  <Dataset Index_Name="ASPT" Season_ID="6">
    <Value>0</Value>
  </Dataset>
  <Dataset Index_Name="ASPT" Season_ID="7">
    <Value>0</Value>
  </Dataset>
  <Dataset Index_Name="BMWP" Season_ID="1">
    <Value>0</Value>
  </Dataset>
  <Dataset Index_Name="BMWP" Season_ID="2">
    <Value>0</Value>
  </Dataset>
  <Dataset Index_Name="BMWP" Season_ID="3">
    <Value>0</Value>
  </Dataset>
  <Dataset Index_Name="BMWP" Season_ID="4">
    <Value>0</Value>
  </Dataset>
  <Dataset Index_Name="BMWP" Season_ID="5">
    <Value>0</Value>
  </Dataset>
  <Dataset Index_Name="BMWP" Season_ID="6">
    <Value>0</Value>
  </Dataset>
  <Dataset Index_Name="BMWP" Season_ID="7">
    <Value>0</Value>
  </Dataset>
</Datasets>


Appendix 5 – Sample Limits File

<Datasets xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="limits.xsd">
  <Dataset Type="Default" ID="Default" Description="Default Limit Set">
    <Index NAME="NTAXA">
      <Bucket Classification="H" ID="H" RANK="1">
        <Lower_Bound Operator="gte">.8879</Lower_Bound>
        <Upper_Bound Operator="gte">10</Upper_Bound>
      </Bucket>
      <Bucket Classification="G" ID="G" RANK="2">
        <Upper_Bound Operator="lt">.8879</Upper_Bound>
        <Lower_Bound Operator="gte">.7417</Lower_Bound>
      </Bucket>
      <Bucket Classification="M" ID="M" RANK="3">
        <Upper_Bound Operator="lt">.7417</Upper_Bound>
        <Lower_Bound Operator="gte">.5954</Lower_Bound>
      </Bucket>
      <Bucket Classification="P" ID="P" RANK="4">
        <Upper_Bound Operator="lt">.5954</Upper_Bound>
        <Lower_Bound Operator="gte">.491</Lower_Bound>
      </Bucket>
      <Bucket Classification="B" ID="B" RANK="5">
        <Upper_Bound Operator="lt">.491</Upper_Bound>
        <Lower_Bound Operator="gte">0</Lower_Bound>
      </Bucket>
    </Index>
    <Index NAME="ASPT">
      <Bucket Classification="H" ID="H" RANK="1">
        <Lower_Bound Operator="gte">1.0059</Lower_Bound>
        <Upper_Bound Operator="gte">5</Upper_Bound>
      </Bucket>
      <Bucket Classification="G" ID="G" RANK="2">
        <Upper_Bound Operator="lt">1.0059</Upper_Bound>
        <Lower_Bound Operator="gte">.8918</Lower_Bound>
      </Bucket>
      <Bucket Classification="M" ID="M" RANK="3">
        <Upper_Bound Operator="lt">.8918</Upper_Bound>
        <Lower_Bound Operator="gte">.7778</Lower_Bound>
      </Bucket>
      <Bucket Classification="P" ID="P" RANK="4">
        <Upper_Bound Operator="lt">.7778</Upper_Bound>
        <Lower_Bound Operator="gte">.6533</Lower_Bound>
      </Bucket>
      <Bucket Classification="B" ID="B" RANK="5">
        <Upper_Bound Operator="lt">.6533</Upper_Bound>
        <Lower_Bound Operator="gte">0</Lower_Bound>
      </Bucket>
    </Index>
  </Dataset>
</Datasets>