M. Hartcher and D. Lemon

November 2008

Data Management for the Murray-Darling Basin Sustainable Yields Project

A report to the Australian Government from the CSIRO Murray-Darling Basin Sustainable Yields Project

Murray-Darling Basin Sustainable Yields Project Acknowledgments

Prepared by CSIRO with contributions from: Sinclair Knight Merz; Resource & Environmental Management Pty Ltd; Department of Water, Land and Biodiversity Conservation (South Australia); Department of Sustainability and Environment (Victoria); Department of Water and Energy (New South Wales); Department of Natural Resources and Water (Queensland); Murray-Darling Basin Commission; Bureau of Rural Sciences; Geoscience Australia; Salient Solutions Australia Pty Ltd; eWater Cooperative Research Centre; University of Melbourne; and several individual sub-contractors.

Murray-Darling Basin Sustainable Yields Project Disclaimers

Derived from or contains data and/or software provided by the Organisations. The Organisations give no warranty in relation to the data

and/or software they provided (including accuracy, reliability, completeness, currency or suitability) and accept no liability (including

without limitation, liability in negligence) for any loss, damage or costs (including consequential damage) relating to any use or reliance

on that data or software including any material derived from that data and software. Data must not be used for direct marketing or be

used in breach of the privacy laws. Organisations include: Department of Water, Land and Biodiversity Conservation (South Australia),

Department of Sustainability and Environment (Victoria), Department of Water and Energy (New South Wales), Department of Natural

Resources and Water (Queensland) and the Murray-Darling Basin Commission.

CSIRO advises that the information contained in this publication comprises general statements based on scientific research. The reader

is advised and needs to be aware that such information may be incomplete or unable to be used in any specific situation. No reliance or

actions must therefore be made on that information without seeking prior expert professional, scientific and technical advice. To the

extent permitted by law, CSIRO (including its employees and consultants) excludes all liability to any person for any consequences,

including but not limited to all losses, damages, costs, expenses and any other compensation, arising directly or indirectly from using

this publication (in part or in whole) and any information or material contained in it. Data is assumed to be correct as received from the

Organisations.

Report Acknowledgements

The Data Management component of the MDBSY project could not have been completed without the resourcefulness and commitment

of the Data Management Team and Project Data Coordinators. Key contributions came from: Jenet Austin, Pheobe Carmody, Phil

Davies, Trevor Dowling, Alex Dyce, Peter Dyce, Peter Fitch, Douglas Kerruish, Tegan Liston, Steve Marvanek, Arthur Read, Garry

Swan, Brendan Speet and Jamie Vleeshouwer.

This report was ably reviewed by Susan Cuddy and Yun Chen.

Citation

Hartcher M and Lemon D (2008) Data Management for the Murray-Darling Basin Sustainable Yields Project. A report to the Australian

Government from the CSIRO Murray-Darling Basin Sustainable Yields Project, CSIRO, Australia. 34pp.

Publication Details

Published by CSIRO ©2008 all rights reserved. This work is copyright. Apart from any use as permitted under the Copyright Act 1968,

no part may be reproduced by any process without prior written permission from CSIRO.

ISSN 1895-095X

Preface

This is a report to the Australian Government from CSIRO. It is an output of the Murray-Darling Basin Sustainable Yields

Project which assessed current and potential future water availability in 18 regions across the Murray-Darling Basin

(MDB) considering climate change and other risks to water resources. The project was commissioned following the

Murray-Darling Basin Water Summit convened by the then Prime Minister of Australia in November 2006 to report

progressively during the latter half of 2007. The reports for each of the 18 regions and for the entire MDB are supported

by a series of technical reports detailing the modelling and assessment methods used in the project. This report is one of

the supporting technical reports of the project. Project reports can be accessed at http://www.csiro.au/mdbsy.

Project findings are expected to inform the establishment of a new sustainable diversion limit for surface and

groundwater in the MDB – one of the responsibilities of a new Murray-Darling Basin Authority in formulating a new

Murray-Darling Basin Plan, as required under the Commonwealth Water Act 2007. These reforms are a component of

the Australian Government’s new national water plan ‘Water for our Future’. Amongst other objectives, the national water

plan seeks to (i) address over-allocation in the MDB, helping to put it back on a sustainable track, significantly improving

the health of rivers and wetlands of the MDB and bringing substantial benefits to irrigators and the community; and (ii)

facilitate the modernisation of Australian irrigation, helping to put it on a more sustainable footing against the background

of declining water resources.

Summary

The management of the storage of models, inputs and outputs, maps and plots, and reports required a substantial and

sustained investment in infrastructure and process. This report describes the computing equipment utilised for the project,

and the processes put in place to manage the acquisition, storage, maintenance and audit of the reporting materials.

© CSIRO 2008 Data management for the Murray-Darling Basin Sustainable Yields Project

Table of contents

1 Introduction
2 WRON infrastructure
3 Data management
3.1 Data permissions
3.2 Project data archive and workspaces
3.3 Data management within project team
3.3.1 Catchment yield modelling team
3.3.2 Groundwater modelling team
3.3.3 River modelling team
3.3.4 Water accounting and environmental assessment team
3.3.5 Reporting team
3.4 Data licensing
3.5 Data standards
3.5.1 Data formats
3.5.2 Naming conventions
3.5.3 Coordinate systems
4 Data management tools
4.1 SharePoint web sites
4.1.1 MDB Partner portal
4.1.2 Reporting Products portal
4.1.3 Review Panel portals
4.2 External data transfer via FTP
4.3 Metadata and the metadata entry tool
4.4 Reporting database
4.5 Document management
5 Data auditing
5.1 Audit trail
5.2 Auditing methods

Figures

Figure 2-1. The high-level architecture of the WRON Computing Facility in 2007
Figure 3-1. Project archive directory structure
Figure 3-2. Catchment Yield modelling directory for the Murrumbidgee reporting region
Figure 3-3. Groundwater modelling directory structure for Rainfall Reduction Factor modelling
Figure 3-4. Groundwater modelling directory structure for the Namoi reporting region
Figure 3-5. River Modelling directory structure for the Warrego reporting region
Figure 3-6. Water accounting and environmental assessment directory structure
Figure 3-7. Reporting directory structure stored in project archive
Figure 3-8. Reporting directory structure within project teams
Figure 4-1. MDBSY Partner Portal SharePoint web page
Figure 4-2. Reporting portal home page
Figure 4-3. Additional SharePoint portals used in the MDBSY project
Figure 4-4. Technical Reference Panel portal home page
Figure 4-5. SmartFTP client software interface
Figure 4-6. Login page of the metadata entry tool
Figure 4-7. Home page of the metadata catalogue
Figure 4-8. Example of a metadata page in the metadata entry tool
Figure 4-9. Example of the editing interface of the metadata entry tool
Figure 4-10. Lineage data input screen in the metadata entry tool
Figure 4-11. User interface of TRIM

1 Introduction

This report – one in a series of scientific reports from the CSIRO Murray-Darling Basin Sustainable Yields Project

(MDBSY) – describes the data management technical environment and formal protocols developed for handling

documents, modelling software, and data in support of the project. The development of and adherence to a data

management policy, as well as the provision of appropriate data management tools such as a metadata catalogue, were

integral to ensuring the project and team members were able to achieve the required outcomes and meet higher than

normal levels of scrutiny within the tight timeframes of the project.

The MDBSY project was a huge and diverse undertaking requiring input from a large number of people from within

CSIRO and various external sub-contractors. The amount and diversity of data, models and reports used or produced

within the project, along with the tight timeframes for delivery, required a professional approach to data management.

To this end, data management was undertaken as a separate component of the project with close linkages to other

teams. The key goals were to ensure that data used or generated within the project:

• was accessible to those who needed it

• was safe from being lost or corrupted

• was managed according to the requirements of data suppliers

• was secure from those who did not need it

• had demonstrable integrity (i.e. it could be demonstrated how any individual dataset was produced and where it came from).

By separating data management as a distinct team focus, it was possible to establish common protocols across the

various project teams, as well as establish a common data repository and a robust set of procedures for managing the

data store.

The MDBSY project generated a large volume of datasets and documents which had to be managed such that they were

secure, accessible, and described with appropriate metadata. The protocols that were established for all project teams

for data storage, access, security, backup, and archiving, ensured that the integrity of both datasets and documents was

and remains demonstrable. Protocols for exchanging data with external agencies and sub-contractors, sharing project

documents with project team members and disseminating information, and issues of confidentiality and restricted access

to some documents and data were also developed and administered.

A critical responsibility for the Data Management team was to ensure a complete audit trail existed for all modelling

results, all original and interim datasets, all software versions, and all reports. These were archived in the project data

repository, with metadata statements completed and stored within a relational database.

The Data Management team took responsibility for:

• provision of secure centralised computing facilities (including data storage and processing)

• provision of project collaboration tools (including the Project SharePoint site, Project Data Catalogue, and data

exchange facilities)

• development of a project reporting database

• ensuring all data collected for the project was appropriately described and licensed

• ensuring a full audit trail of all steps of the analysis process was captured

• ensuring commitments made to third parties with respect to data and models were fulfilled.

This report outlines the protocols, the various locations used to store and exchange documents and data, the tools used

for managing data, and the measures taken to maintain a robust audit trail.

2 WRON infrastructure

Data storage and processing for the MDBSY project was, where appropriate and possible, performed using the CSIRO WRON[1] Computing Facility (Figure 2-1). This facility, located in the Christian Lab at CSIRO Black Mountain Laboratories, has been designed specifically to support both the management and high-speed processing of large amounts of data, such as that required by this project.

The core of the WRON facility is a 20-unit cluster of 9th generation servers, each with two 64-bit dual-core Xeon CPUs[2] and 4 gigabytes[3] (Gb) of RAM[4]. Each cluster unit has direct high-speed access to up to 50 terabytes[5] (Tb) of storage via Qlogic 4 Gb Fibre Channel HBA[6] cards. A Hitachi AMS[7] 1000 Tagma Storage Area Network provides the current 50 Tb of storage. The AMS provides both Network Attached Storage and Storage Area Network in a flexible system that can be reconfigured easily to allocate storage to the sub-systems. A 15 Tb tape robot manages archiving, backup and data transport. Finally, a clustered web front-end provides significant capability to deliver standards-based web services through both open and secure channels. This component was required for the delivery of certain project tools (including a SharePoint PartnerPortal[8]) to project teams.

The facility is housed in a purpose-built server room providing a stable temperature, humidity and power environment for the high-density rack systems. The facility provides 230 A at 240 V three-phase power and 50 kW of sensible cooling. Logged card access is required for entry to and exit from the facility. The facility is secured as per the Australian Commonwealth Defence Signals Directorate ACSI-33 guidelines[9] to store and process ‘In-confidence’ classified data, meeting the stringent physical access, network isolation, authentication, and authorisation requirements. Onsite 24-hour security guards provide physical security to the facility and are alerted to any after-hours access.

The WRON Computing Facility provides two data storage options: (i) a network accessible file system with 50Tb space;

and (ii) a relational database system (Microsoft SQL Server (Enterprise) 2005). Both were used to support data storage

for the MDBSY project.

A full backup of the entire WRON Data Store server is not possible on a daily or even weekly basis due to the large volumes of data being stored. Therefore, the backup mechanism employed for the project was a shadow copy combined with area backup on request: data were copied to tape areas in chunks of up to 15 Tb and backed up at the user’s request. A differential backup was performed on a weekly basis, whereby the tape archive was updated only with changes that had occurred within the data directories.
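The weekly differential step can be pictured as selecting only the files that changed after a chosen reference time (for a differential backup, typically the time of the last full backup). The following Python sketch is illustrative only: the function name is hypothetical, file modification time is assumed to be the change signal, and the backup software actually used on the WRON facility is not described in this report.

```python
import os
from pathlib import Path


def files_changed_since(root, reference_time):
    """Return files under root modified after reference_time (epoch seconds).

    Assumption: modification time is a sufficient indicator of change.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.stat().st_mtime > reference_time:
                changed.append(path)
    return sorted(changed)
```

Only the returned files would need to be written to tape, which is what keeps a weekly run feasible when a full copy of the data store is not.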

[1] WRON – Water Resources Observation Network
[2] CPU – Central Processing Unit
[3] A gigabyte is a thousand million bytes, 10^9 bytes
[4] RAM – Random Access Memory – computer data storage
[5] A terabyte is a million million bytes, 10^12 bytes
[6] HBA – Host Bus Adapter
[7] AMS – Adaptable Modular Storage
[8] SharePoint is a Microsoft web-based collaboration tool, particularly for managing document writing and production.
[9] http://www.dsd.gov.au/_lib/pdf_doc/ism/ISM_Sep08_unclass.pdf

Figure 2-1. The high-level architecture of the WRON Computing Facility in 2007

3 Data management

In order to control and maintain the integrity of the project data a set of data management protocols was implemented.

For each of the four key project teams (Catchment Yield, River Modelling, Groundwater, Water Accounting and

Environmental Assessment) a Data Coordinator was nominated. It was this person’s role to ensure that the data

management protocols were being applied within the team. The data coordinators were data custodians for their

respective teams, and had responsibility for ensuring that all project data, software, code, maps, and report elements,

were archived in an appropriate location within the project archive. The MDBSY Project data management team included

a team leader, a project data manager, the project team data coordinators, and some additional data management

support staff.

The data coordinators were the first point of contact for any data-related problem in their project team. The data

coordinators could then guide teams through the process of solving any data problems. The data archiving process

involved team members saving new data/results in an appropriate staging area within their own working directory and

then notifying their data coordinator that the dataset was ready for upload to the project archive. The data coordinator

would then move the item into the project directory, and populate the record in the metadata catalogue.
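The archiving hand-off described above (stage, notify, move, then record metadata) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name, arguments and record fields are hypothetical, and a JSON file stands in for the project's metadata catalogue, which was held in a relational database.

```python
import json
import shutil
from pathlib import Path


def archive_dataset(staged_dir, archive_parent, catalogue_path, description):
    """Move a staged dataset into the archive and add a catalogue record.

    Assumption: a JSON list stands in for the real metadata catalogue.
    """
    staged_dir = Path(staged_dir)
    archive_parent = Path(archive_parent)
    archive_parent.mkdir(parents=True, exist_ok=True)
    target = archive_parent / staged_dir.name
    if target.exists():
        # Refuse to overwrite an archived dataset, to protect the audit trail.
        raise FileExistsError(f"{target} is already archived")
    shutil.move(str(staged_dir), str(target))

    catalogue_path = Path(catalogue_path)
    records = json.loads(catalogue_path.read_text()) if catalogue_path.exists() else []
    records.append({
        "dataset": target.name,
        "location": str(target),
        "description": description,
    })
    catalogue_path.write_text(json.dumps(records, indent=2))
    return target
```

Keeping the move and the catalogue entry in a single step mirrors the coordinator's role: a dataset should not reach the archive without a matching metadata record.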

3.1 Data permissions

The large storage volumes associated with the project inhibited the use of folder-level permissions for data security. A

set of permission ‘groups’ was therefore created in order to manage access to data on the WRON server, and to enable

specific access to the MDBSY project archive. Access to the WRON server was only available to CSIRO internal team

members. The permission groups were created to control data updates, editing, directory structure, versioning, etc. Read

access to project data was provided to all internal project staff. Higher permission levels required justification before they

were approved and applied. There were four permission group levels as follows:

• MDB_Storage_Administrator (full administrative control)

• MDB_Storage_Editor (can create/delete folders and files but cannot change permissions and ownership)

• MDB_Storage_Contributor (can read/write/create and modify files, but cannot delete or create folders)

• MDB_Storage_Reader (read and execute access to data)
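The four permission groups form a simple hierarchy of capabilities, which can be modelled as below. The group names come from the list above; the `Action` names and the `is_allowed` helper are hypothetical, and the sketch assumes that contributors cannot delete files, which the list does not state explicitly.

```python
from enum import Flag, auto


class Action(Flag):
    READ = auto()
    WRITE_FILE = auto()          # create and modify files
    DELETE_FILE = auto()
    MANAGE_FOLDERS = auto()      # create and delete folders
    CHANGE_PERMISSIONS = auto()  # change permissions and ownership


# Assumption: Contributor has no DELETE_FILE right (ambiguous in the source list).
GROUP_ACTIONS = {
    "MDB_Storage_Reader": Action.READ,
    "MDB_Storage_Contributor": Action.READ | Action.WRITE_FILE,
    "MDB_Storage_Editor": (Action.READ | Action.WRITE_FILE
                           | Action.DELETE_FILE | Action.MANAGE_FOLDERS),
    "MDB_Storage_Administrator": (Action.READ | Action.WRITE_FILE
                                  | Action.DELETE_FILE | Action.MANAGE_FOLDERS
                                  | Action.CHANGE_PERMISSIONS),
}


def is_allowed(group, action):
    """Return True if every requested action is granted to the group."""
    return action in GROUP_ACTIONS[group]
```

For example, `is_allowed("MDB_Storage_Editor", Action.CHANGE_PERMISSIONS)` evaluates to `False`, reflecting that only administrators could change permissions and ownership.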

3.2 Project data archive and workspaces

It was identified early in the project that in order to maintain data integrity it was essential that a project data archive

should be kept separate from working areas. The archive provided an organised set of data directories which enabled

the development of a structured audit trail for all project outputs. The project archive was created on the WRON facility

(see Section 2) and was only accessible by CSIRO internal staff.

The MDBSY Project Archive contained directories for the 18 MDBSY reporting regions, an additional folder for the

Snowy region (99_Snowy), which was included in the Murrumbidgee and Murray reporting, and another directory for

whole-of-MDB data. Within each of the reporting region directories a directory for each project team was created. Figure

3-1 illustrates the structure of the project archive directories.

The structure below each project team varied depending on the tasks being performed, and/or the number of models

being applied in the reporting region. The modelling directories for each project are outlined in the sections describing

each team’s structure. The modelling performed for each region is described in the regional reports and a broad outline

of the methods applied can be found in the ‘Overview of Project Methods’ report.

Individual datasets were contained within their own directory. Each dataset directory was designated with an underscore

(‘_’) as the directory name prefix (e.g. ‘_datasetname’). This allowed for identification of datasets by automatic metadata

tools.

In some cases an individual dataset contained thousands of files. It was deemed neither necessary nor efficient to describe

(via a metadata statement) each data file stored. Therefore metadata statements described the dataset as a whole,

encompassing all files contributing to that dataset.
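The underscore prefix makes dataset directories easy to detect programmatically. The sketch below shows one way an automatic tool might find them and treat each as a single unit, returning a file count per dataset rather than one entry per file. The function name is hypothetical and the project's actual metadata tools are not reproduced here.

```python
import os
from pathlib import Path


def find_datasets(archive_root):
    """Map each '_'-prefixed dataset directory to the number of files it holds."""
    datasets = {}
    for dirpath, dirnames, _filenames in os.walk(archive_root):
        for name in list(dirnames):
            if name.startswith("_"):
                dataset_dir = Path(dirpath) / name
                datasets[dataset_dir] = sum(
                    len(files) for _, _, files in os.walk(dataset_dir)
                )
                # Do not descend further: the dataset is described as a whole.
                dirnames.remove(name)
    return datasets
```

Each key in the result corresponds to one metadata statement, regardless of how many thousands of files the dataset directory contains.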

Figure 3-1. Project archive directory structure

Personal work space was provided for project team staff to develop data, carry out model runs, create maps, develop

report spreadsheets, and to write documents. Data development was carried out on a separate server within the WRON

facility, away from the project archive directory. This was a smaller server with approximately 4 Tb of available storage.

Two directories located on the workspace server related to the MDBSY project. These were named 'Work' and 'dat'.

‘Work’ was used for individual project staff to develop data and to run models, prior to migration across to the Project

Archive. All final datasets, model runs, etc., were required to be moved from here into the project directory on the data

archive, with a metadata record fully populated for each dataset.

The ‘dat’ directory held some key datasets used as a basis for further data development, modelling inputs, and for the

creation of reference maps. This structure was utilised prior to the main Storage Area Network storage coming online

in the WRON server midway through 2007. The core datasets stored in ‘dat’ were later migrated across to the GIS

directory on the data archive for ongoing use as reference datasets.

Some modelling work required very large volumes of storage in order to run multiple scenarios using, and generating,

many thousands of files. To support this, a separate volume was created, which provided a further 9.5 Tb of storage.

Much of the river modelling and catchment yield rainfall runoff modelling work was performed on this server. These

datasets were structured to nest within the project archive structure so that the task of transferring the data into the

archive directory, once the modelling was finished, would be simplified, i.e. only one directory needed to be moved

across.

3.3 Data management within project team

3.3.1 Catchment yield modelling team

The Catchment Yield modelling directory structure was initially difficult to establish, due to the large volumes of data and

the ongoing evolution of modelling methods and choice of scenarios. This team had the greatest requirement in terms of

data storage space. The climate modelling work alone required more than 6 Tb of storage space while being generated,

although the final datasets were approximately half that volume.

The climate modelling was carried out on a separate processing area of the WRON facility. Once complete, the key data

components (inputs and outputs) were transferred into the MDB project directory. The datasets were stored within the

‘Catchment_Yield-Modelling’ directory under each reporting region within ‘CellRunoff’, ‘Flows’, ‘Prime Climate’, and

‘Prime Flows’ directories, with each containing the scenarios created for that reporting region, e.g. A, B, C and D. Some

reporting regions also had a directory called ‘FCFC’ which contained Forest Cover Flow Change modelling as part of

Scenario D. Figure 3-2 depicts the Catchment Yield modelling directory for the Murrumbidgee reporting

region.

Figure 3-2. Catchment Yield modelling directory for the Murrumbidgee reporting region

3.3.2 Groundwater modelling team

The Groundwater modelling data consisted of the groundwater modelling and ‘Rainfall Reduction Factor’ (RRF)

modelling. The RRF modelling was developed for the whole of the MDB. The directory structure for the RRF is shown in

Figure 3-3.

Groundwater modelling was not performed for all reporting regions, as some regions did not have groundwater models

and/or were assigned a very low priority ranking, based on factors such as level of development, size of the available

resource, and degree of connectivity between rivers and aquifers relative to other groundwater management units

across the MDB. In regions where no modelling was performed, a simple assessment was conducted, and therefore no

models were archived.

The groundwater modelling did not conform to the surface water reporting region boundaries; however, the modelling

results have been reported for each reporting region and the appropriate files (inputs and results) are stored with each

relevant reporting region directory as shown in Figure 3-4.

Figure 3-3. Groundwater modelling directory structure for Rainfall Reduction Factor modelling

Figure 3-4. Groundwater modelling directory structure for the Namoi reporting region

3.3.3 River modelling team

There were various river models applied to reporting regions, e.g. IQQM, REALM, PRIDE, MSM BIGMOD, etc. The river

modelling work was organised within scenario directories. These conformed to a common multi-tiered naming convention

regardless of which model was applied. The following naming convention for model system and output files for the river

modelling work was used:

CCCC_SSS_%%_VN_GW

where:

• CCCC is the catchment that is modelled, e.g. PARO, WARR, NEBI, BORD, GWYD, PEEL, NAMO, MACQ,

CAST, MARR, MART, BOGA, LACH, SNWO, UMUR, MURR, DARL, GSM, OVEN, AVOC, WIMM and MURR.

• SSS is the scenario A0__, AN_, AP_, B0_, BN_, BP_, COH_, COM_, CNH_, CNM_, CPH, CPM_, CPL, DOH,

DOM, DNH, DNM, DPH, DPM, DPL, and PP.

• %% is the percentile: 05, 50 or 95, or 00 where no percentile applies, e.g. AP_, CPH, etc.

• VN is the version number e.g. V01, V02, etc.

• GW indicates that GW interaction has been included.
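A naming convention of this kind lends itself to automated building and checking. The Python sketch below is an interpretation rather than the project's own tooling: it assumes the scenario field is padded to three characters with trailing underscores, that fields are joined by single underscores, and that the `_GW` suffix is optional. The function names and the regular expression are hypothetical.

```python
import re

# Assumed layout: CCCC_SSS_%%_VN[_GW], with SSS padded to 3 characters using '_'.
NAME_PATTERN = re.compile(
    r"^(?P<catchment>[A-Z]{3,4})_"
    r"(?P<scenario>[A-Z0-9_]{3})_"
    r"(?P<percentile>\d{2})_"
    r"V(?P<version>\d{2})"
    r"(?P<gw>_GW)?$"
)


def build_name(catchment, scenario, percentile=0, version=1, groundwater=False):
    """Compose a name; percentile 0 is written as 00 (no percentile)."""
    scenario_field = scenario.ljust(3, "_")
    name = f"{catchment}_{scenario_field}_{percentile:02d}_V{version:02d}"
    return name + "_GW" if groundwater else name


def parse_name(name):
    """Split a name into its fields, raising ValueError if it does not conform."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"name does not follow the assumed convention: {name!r}")
    return {
        "catchment": match["catchment"],
        "scenario": match["scenario"].rstrip("_"),
        "percentile": int(match["percentile"]),
        "version": int(match["version"]),
        "groundwater": match["gw"] is not None,
    }
```

Under these assumptions, `build_name("NAMO", "CPH", 50, 2, groundwater=True)` gives `NAMO_CPH_50_V02_GW`, and `parse_name` recovers the same fields from that string.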

Figure 3-5 is an example of the directory structure for the Warrego reporting region.

Figure 3-5. River Modelling directory structure for the Warrego reporting region

The prime flows and prime climate, and in some cases prime diversions, data inputs for each region were also stored

within the modelling results in separate directories, as shown in Figure 3-5. Some regions had numerous models while

some had only one model. The different models were stored in directories with abbreviated names, e.g. WARR for Warrego,

with the scenario folders stored within them.

3.3.4 Water accounting and environmental assessment team

The environmental assessment work comprised a small component of the MDBSY as a complete assessment was

beyond the terms of reference. The assessment work focused on environmental flows, assets, indicators, and water

accounting, with some assessment of uncertainty. Figure 3-6 gives an example of the typical directory structure for the

reporting assessment data directory. The assessment team data was mostly stored within the reporting database and

thus utilised very little storage space on the WRON server. Results and summary tables were then generated from the

reporting database.

Figure 3-6. Water accounting and environmental assessment directory structure

3.3.5 Reporting team

The reporting team directory included a structure for storing report elements and final reports, as well as a separate

structure for storing the environmental assessment and uncertainty analysis. The final reports needed to remain

confidential until they were released to the public and were therefore stored within the products portal web site (see

Section 4.1). The reporting elements were embargoed from most project staff until reports were made public, in order to

ensure confidentiality. The reporting team and other key members of project teams were the only staff having access to

the products portal.

The data components for the reporting team were the elements going into reports, which were compiled from the

modelling results by the project teams. The reporting team needed to access the report elements, which included tables,

figures, text blocks, and maps. Therefore, each project team had sub-directories where they saved the elements for the

reporting team to access. The reporting team also had a directory structure which mirrored the reporting directories in the

other team directories. Figure 3-7 illustrates the reporting directory structure.


Figure 3-7. Reporting directory structure stored in project archive

Data coordinators were responsible for ensuring that elements were stored within a ‘_Results’ directory within their respective project team reporting directory. The established protocol required that any update of an element superseded the previous version within the results directory, with a new version number included in the name. This did not affect the metadata statement, however, as the metadata entry contained a description of the entire results ‘dataset’ directory. Figure 3-8 illustrates the reporting directory structure contained within project teams.


Figure 3-8. Reporting directory structure within project teams

3.4 Data licensing

Most external data and models sourced for the project required a data licence that had been signed by the licensor and CSIRO (the licensee). However, some datasets were publicly available, e.g. Directory of Important Wetlands of Australia (DIWA) data. Such datasets still included data agreements which outlined the conditions of use for the data. It was the responsibility of project managers to ensure that licence conditions were adhered to.

3.5 Data standards

While project teams had specific requirements for data standards, relating to the models being applied, some common data standards were also established across the project teams. Where standard software products were employed it was necessary to ensure that the software could access and read the data, e.g. ArcGIS software currently cannot find data when a path name has a space in it. A key standard was therefore that directory names could not contain spaces, and an underscore was used instead, e.g. 02_Warrego. In order for the metadata tool to be able to identify a directory as being a dataset (see Section 4.3), the directory name was prefixed with an underscore, e.g. _WARR_AP_00_V01. Importantly, some teams were creating data as inputs for other teams. Adherence to common data formats and coordinate systems was necessary in these cases.
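The two directory naming rules described above (no spaces, and a leading underscore to mark a dataset) are simple enough to express as checks. The Python sketch below is a minimal illustration under that reading; the function names are hypothetical and do not correspond to any project tool.

```python
def has_no_spaces(name: str) -> bool:
    """Rule 1: directory names must not contain spaces."""
    return " " not in name


def is_dataset_directory(name: str) -> bool:
    """Rule 2: a leading underscore marks the directory as a dataset."""
    return name.startswith("_")


def to_standard_name(name: str) -> str:
    """Replace each run of whitespace with a single underscore."""
    return "_".join(name.split())


for candidate in ["02 Warrego", "02_Warrego", "_WARR_AP_00_V01"]:
    print(candidate, has_no_spaces(candidate), is_dataset_directory(candidate))
```

Here `to_standard_name("02 Warrego")` gives `02_Warrego`, matching the example in the text.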


3.5.1 Data formats

The ArcGIS software suite was used to develop GIS data layers and to create maps throughout the project. The standard

format for GIS data used in the project was the ESRI shapefile format.

Comma-delimited files (.csv) were most commonly used for both input and output formats for model runs. Some models had formats specific to the modelling software, and so some conversions were carried out as post-processing operations, such as for the IQQM surface water modelling.

It is interesting to note that many of the file formats used in the project (.csv in particular) are relatively inefficient in their use of storage space. For most projects this is not usually a concern as the number of files is relatively small and the ease with which they can be manipulated far outweighs any inefficiencies. However, in a project the size of the MDBSY, inefficient file formats do become an issue. For example, at one point during the project it was estimated that the commas within CSV files alone were consuming a terabyte of storage. At the time, there were considerable constraints on storage space and an ever-increasing demand for space. In contrast, it was found that storing the data within a relational database reduced the storage space requirement considerably. For this reason, and many others, such technologies should be seriously considered for future projects.
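An estimate of this kind can be reproduced by counting delimiter bytes across a directory tree. The Python sketch below shows one way of doing so; it is not the method used in the project, which is not documented here, and the function names and the `*.csv` file pattern are assumptions.

```python
from pathlib import Path
from typing import Tuple


def count_delimiters(data: bytes, delimiter: bytes = b",") -> int:
    """Number of delimiter bytes in a block of file content."""
    return data.count(delimiter)


def delimiter_overhead(root: Path, pattern: str = "*.csv") -> Tuple[int, int]:
    """Return (bytes used by commas, total bytes) for matching files under root."""
    delimiter_bytes = 0
    total_bytes = 0
    for path in root.rglob(pattern):
        if not path.is_file():
            continue
        with path.open("rb") as handle:
            # Read in 1 MiB chunks so that very large files are not loaded whole.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                total_bytes += len(chunk)
                delimiter_bytes += count_delimiters(chunk)
    return delimiter_bytes, total_bytes
```

Dividing the first value by the second gives the fraction of storage spent on delimiters alone, which is one indication of how much a binary or database representation might save.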

3.5.2 Naming conventions

Naming conventions were mostly dependent upon requirements for model inputs within each project team, such as the river modelling scenario directory names mentioned previously. Adherence to a naming convention was also required for the reporting elements, so that the reporting team could determine which elements were encompassed by a particular Excel spreadsheet or text document.

The reporting elements had a naming convention which encompassed the reporting region, the report chapter, the element number for the report, and the version of the element to account for any updates. For example, ‘02_SW3_37_v10.xls’ refers to the surface water results for the Warrego region for elements between 3 and 37, with this being version 10 of the results. The version reference was critical as small changes were often made to spreadsheets by reporting team members, and version updates were also requested from project teams.
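The example file name can be decomposed as in the Python sketch below, which also picks the latest version from a set of files. The pattern is inferred from the single example ‘02_SW3_37_v10.xls’ and is an assumption, as are the names `ElementFile`, `parse_element_name` and `latest_version`.

```python
import re
from typing import Iterable, NamedTuple, Optional


class ElementFile(NamedTuple):
    region: str         # reporting region number, e.g. 02 (Warrego)
    chapter: str        # chapter code, e.g. SW (surface water)
    first_element: int  # first element covered, e.g. 3
    last_element: int   # last element covered, e.g. 37
    version: int        # version of the results, e.g. 10


# Assumed layout: RR_<chapter><first>_<last>_v<version>.<extension>
_ELEMENT = re.compile(
    r"^(?P<region>\d{2})_(?P<chapter>[A-Za-z]+)(?P<first>\d+)_(?P<last>\d+)"
    r"_v(?P<version>\d+)\.\w+$"
)


def parse_element_name(filename: str) -> Optional[ElementFile]:
    """Split a reporting element file name into its parts, or return None."""
    match = _ELEMENT.match(filename)
    if match is None:
        return None
    return ElementFile(
        match["region"],
        match["chapter"],
        int(match["first"]),
        int(match["last"]),
        int(match["version"]),
    )


def latest_version(filenames: Iterable[str]) -> Optional[str]:
    """Return the file name with the highest version number, if any parse."""
    candidates = []
    for name in filenames:
        parsed = parse_element_name(name)
        if parsed is not None:
            candidates.append((parsed.version, name))
    return max(candidates)[1] if candidates else None
```

Versions are compared as integers, so v10 is correctly ranked after v9, which a plain alphabetical sort of the file names would not do.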

3.5.3 Coordinate systems

The standard projections used for the spatial data in the project were:

• Geographic, GDA94, based on the SILO data – used for data processing

• Lambert GDA94 MDBC Standard Projection – used for mapping

The Lambert projection was preferred for mapping as the shapes of the regions showed minimal distortion in the projected data compared with data in geographic coordinates.


4 Data management tools

4.1 SharePoint web sites

4.1.1 MDB Partner portal

The project SharePoint web site, called the ‘MDB Partner Portal’, was established as a location for sharing documents and small volumes of data; as a collaborative tool for centralising communications; and for the dissemination of information such as key dates, announcements, and links to other relevant web sites. A number of other ‘portals’ were also available via this site, i.e. the reporting Products portal and the Technical Reference Panel (TRP), Steering Committee (SC), and External Review Panel (ERP) portals.

The portal served a vital function by providing a central location for information, which helped minimise the volume of emails and phone calls directed at project managers. The Partner portal contained a ‘Shared Documents’ directory housing various planning documents, letters, and reports, which provided a location where the various plans and decisions made throughout the project could be referenced. This was a vital link in the project audit trail, and the documents were also to be saved within the final project data archive to provide further background to methods.

There was also a ‘Project’ directory which was used to exchange data with organisations and subcontractors external to CSIRO. This site could not handle the exchange of large datasets or large volumes of data, so an FTP (File Transfer Protocol) site was utilised for the data exchange tasks (see Section 4.2).

The Partner portal was not used to store project data beyond two days. If data were left in the ‘Project’ directory on the portal beyond this time it was at the user’s risk and may well have been deleted without notice. Users were required to communicate with the person with whom they were exchanging data to ensure that the data had been acquired within that time period.

The positive results achieved through the use of a SharePoint web site for MDBSY have demonstrated the value of this tool. Project staff members recognised the synergy created through the use of these sites, and this is having a significant influence on how new projects are developed and organised. Many major projects, such as the three Sustainable Yields extension projects, are now establishing SharePoint sites from the beginning of a project and are utilising them as the central repository for project documentation and organisational announcements. The MDB Partner Portal web page is shown in Figure 4-1.


Figure 4-1. MDBSY Partner Portal SharePoint web page

4.1.2 Reporting Products portal

As mentioned in Section 2.7, reports were embargoed prior to public release. The SharePoint site restricted access to

only those project staff involved in the report generation and reviewing process. A directory structure was created on this

site which mirrored the project archive directory structure. This was provided in order to allow project team report

elements to be embargoed in an organised structure. This site also provided tools for listing key events, such as

deadlines for the project reports. Figure 4-2 provides a typical view of the products portal home page during the project.


Figure 4-2. Reporting portal home page

4.1.3 Review Panel portals

The Technical Reference Panel (TRP), Steering Committee (SC), and External Review Panel (ERP) also had SharePoint web sites. These were vital to the success of the reviewing process as they provided timelines for review and public release of reports, and allowed the reviewers from different organisations, who usually numbered more than 20 per report, to access the reports to be reviewed. The sites also provided transparency for partners in that they could identify where a report was in the review process, as well as providing version control and enabling files to be easily exchanged with relevant project staff. The sites also provided security for the reports as access was restricted to reviewers, project management and reporting staff.

Figure 4-3 shows the site links on the ‘MDB Partner Portal’, which include links to the review panel sites. Figure 4-4 shows the TRP site with some typical report announcements.


Figure 4-3. Additional SharePoint portals used in the MDBSY project

Figure 4-4. Technical Reference Panel portal home page


4.2 External data transfer via FTP

It was necessary to exchange data with external organisations in order to develop data and models for the project.

External project partners were not able to directly access the WRON facility due to the firewall protection. Originally the

project partner portal was used to fulfil this requirement; however, the portal was not designed to handle the large

volumes of data being transferred. An FTP site was therefore established for the project to provide this functionality.

The FTP site was a mirror of the project data store and was only to be used for transferring data externally. As with the team areas on the Partner Portal, it was not used to store data beyond a couple of days. The home page of the ‘MDB Partner Portal’ provided a link to the FTP software download, as well as a ‘How to…’ video demonstration for using the software, and login details for users. The FTP directories were not physically located on the WRON server, and could only be accessed via the SmartFTP client software shown in Figure 4-5.

It was critical to have communication between the project staff exchanging data, so that the data was copied from the

FTP directories into the desired location as soon as it had been transferred to reduce the risk of it being deleted or

otherwise compromised. An automated function was investigated but never implemented due to time and resource

constraints.

The FTP tool enabled rapid exchange of large data volumes and therefore saved a significant amount of time where data would otherwise have been written to tape or DVD media and sent via post. Figure 4-5 shows the ‘SmartFTP Client’ software interface with the project FTP site directory listed.

Figure 4-5. SmartFTP client software interface

4.3 Metadata and the metadata entry tool

In the early stages of the project, no standalone tool existed for entering or storing metadata. A metadata template

derived from the ANZLIC Version 2 metadata standard was developed and saved in a Microsoft Word document for

distribution to project teams. A copy of the metadata elements captured can be found in Appendix B.


Metadata was initially entered within a Microsoft Word document, with the document title matching the dataset name for each dataset, and saved within the data directory. This was a stop-gap approach employed while an online metadata catalogue tool was developed.

Metadata statements were written to encapsulate all components within the dataset directory. In many cases this

amounted to thousands of files and so the description needed to provide enough detail to describe how the contents of

the dataset directory were utilised.

A web-based metadata catalogue was developed midway through 2007 which utilised an abbreviated form of the metadata template originally developed. The catalogue employs a web-based graphical user interface (GUI) which interfaces with a relational database, where the metadata is stored. The tool allows users to describe the dataset once it is stored within the project directory structure.

As noted previously, datasets were identified by prefixing the dataset directory name with an underscore character (i.e. ‘_’). This allowed the metadata tool to ‘recognise’ the directory as a dataset. When a new dataset directory was created on the project archive, a metadata record for the dataset was automatically created in the metadata database. At this stage the record contained null fields and required metadata details to be entered via the web-based GUI.
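The recognition step described above might be sketched as follows. This is a simplified Python illustration and not the actual tool, whose implementation is not described in this report: a dictionary stands in for the metadata database, the field names are hypothetical, and it is assumed that directories nested inside a dataset are not registered separately.

```python
import os
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def empty_record() -> Record:
    """A new metadata record with null fields (field names are hypothetical)."""
    return {"title": None, "description": None, "custodian": None}


def register_datasets(archive_root: str, catalogue: Dict[str, Record]) -> List[str]:
    """Create an empty record for every new underscore-prefixed directory."""
    created = []
    for dirpath, dirnames, _filenames in os.walk(archive_root):
        for name in list(dirnames):
            if not name.startswith("_"):
                continue
            path = os.path.join(dirpath, name)
            if path not in catalogue:
                catalogue[path] = empty_record()
                created.append(path)
            # Assumption: one metadata statement covers the whole dataset
            # directory, so the walk does not descend into it.
            dirnames.remove(name)
    return created
```

Running the function a second time over an unchanged archive creates no further records, so it could be scheduled to run repeatedly.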

Login

The metadata tool was web-based. Users entered via an authentication (login) page, shown in Figure 4-6, which gave them access to the main site. Authentication was role-based with the following roles: admin; editor; and reader.

Figure 4-6. Login page of the metadata entry tool

Locating records

Once the user has logged into the tool, the home page appears (Figure 4-7). This page consists of a search tool in the left panel and a welcome page which describes the functionality of the tool and any updates to it. The search engine can be used to find a dataset requiring metadata or to find an existing metadata record and associated dataset. The search pane contains various contextual options, based on the fields contained within the metadata form, which can help to locate a dataset record.

11 GUI – Graphical User Interface


Figure 4-7. Home page of the metadata catalogue

Dataset identification

If a dataset had no metadata then it was shown as a directory path in the search results list. If a metadata entry had been

made, the title of the dataset appeared in the list. Figure 4-8 shows a list of datasets with a completed metadata entry.

The right pane is always split between the metadata record at top and an editing interface at the bottom. If the record is

not being edited then the bottom pane will say ‘Not Currently Editing’. In order to enter metadata or update an existing

metadata entry, the ‘Edit Metadata’ option in the tasks list at the top of the form was selected.


Figure 4-8. Example of a metadata page in the metadata entry tool

The metadata editing form is shown in Figure 4-9. There is an option at the top of the editing pane called ‘copy metadata

from viewed dataset’. This tool was particularly useful for cloning metadata entries in situations where datasets were only

slightly different.

The initial information entered into the form is the descriptive detail, including: dataset title; a description of the data; custodian (i.e. the organisation which owns the data); whether or not a data licence has been obtained; custodian contact details; the MDBSY reporting area; the project team to which the data relates; and an abstract and additional metadata notes, which may be required for very detailed datasets.

Figure 4-9. Example of the editing interface of the metadata entry tool

Data lineage

The second set of details requiring entry is that associated with data lineage. This includes:

• status of the dataset

• identification of any other datasets which were inputs to creating the current dataset

• processing steps

• tools used

• list of parameters.

When entering lineage information, the status of the dataset must be selected:

• original


• intermediate

• for external review

• final

Unless the dataset is original, the user must select all datasets which were used as inputs into the creation of the current dataset. This requirement ensures that a lineage is defined for all project data.

Datasets which need to be selected as inputs can be searched in the left pane while the current record is still being

edited, then viewed and selected using the ‘Add Currently Viewed Dataset’ option shown in Figure 4-10.

Once all information has been entered in the form it is saved using the submit button at the bottom of the page. Any

metadata record can still be reopened for further editing, but if a dataset which has existing metadata is moved or

renamed on the WRON server, the metadata record will be erased by the tool.

Figure 4-10. Lineage data input screen in the metadata entry tool

The metadata tool was critical to the establishment of an audit trail for all project results. All datasets, models, maps, and

report elements that were stored within the project archive were required to have a metadata statement describing them.

The metadata tool lineage functionality required users to define linkages between datasets and therefore establish a

clear path from original input data through to results and then final reports.
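Because each metadata record lists the datasets used as its inputs, the audit trail can be followed by walking these links backwards. The Python sketch below illustrates the idea with a plain dictionary in place of the metadata database; the dataset names in the example and the function name `trace_to_originals` are hypothetical.

```python
from typing import Dict, List, Set


def trace_to_originals(dataset: str, inputs: Dict[str, List[str]]) -> Set[str]:
    """Follow lineage links back from a dataset to those with no recorded inputs."""
    originals: Set[str] = set()
    seen: Set[str] = set()
    stack = [dataset]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against repeated or circular lineage entries
        seen.add(current)
        parents = inputs.get(current, [])
        if parents:
            stack.extend(parents)
        else:
            originals.add(current)
    return originals


# Hypothetical lineage: report element <- model results <- prime inputs.
lineage = {
    "report_element": ["model_results"],
    "model_results": ["prime_climate", "prime_flows"],
}
print(sorted(trace_to_originals("report_element", lineage)))
```

A dataset with no recorded inputs is treated here as original. In practice it could equally be a record whose lineage was never entered, which is why the status field described above matters.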

4.4 Reporting database

The use of a relational database of processing outputs was investigated to support the generation of project reports, as

well as to provide confidence in the data audit trail. The content of this database was to be such that automatic

generation of a number of the result elements for final reports was possible.

Each Modelling Team was to be responsible for determining the point in their processing at which outputs would be

loaded into the database. The role of the Data Management Team was to:

• develop and implement a single data model to hold this data

• develop tools to load the database from data files provided by modelling teams

• develop query tools for the generation of final indicators for reporting

• implement report templates for generation of reports.


The intention was that as much of the final reports as possible would be produced directly from the database, including all table-based information. The hope was to limit the amount of reformatting required to create the final released documents.

An important aspect of this database was that it would form a vital link in the data audit trail. That is, through this

database, it was to be possible to directly link reported results to original input data. This would be achieved through

capture of some of the final analysis steps in code (queries) as well as storage of links to input data files.

The data audit trail could then be constructed in the following way:

• Reported results come directly from the reporting database and are generated through stored code.

• The database also contains a link to the original data input file for every piece of information stored. These files

must be stored within the project directory structure.

• As these files are stored on the project data directory, they will be required to have an associated metadata

statement.

• Metadata statements include links to datasets used to create the described dataset.

These input datasets will also be stored within the project directory structure and hence must also have metadata

statements.
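The first two links in this chain, from a reported value to the input file it was loaded from, could be held in a relational structure such as the one sketched below in Python with SQLite. The schema, table names and function names are assumptions made for illustration; the data model actually developed for the project is not reproduced here.

```python
import sqlite3
from typing import List


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE source_file (
            id INTEGER PRIMARY KEY,
            path TEXT NOT NULL UNIQUE
        );
        CREATE TABLE result (
            id INTEGER PRIMARY KEY,
            region TEXT NOT NULL,
            indicator TEXT NOT NULL,
            value REAL NOT NULL,
            source_file_id INTEGER NOT NULL REFERENCES source_file(id)
        );
        """
    )
    return conn


def load_result(conn: sqlite3.Connection, region: str, indicator: str,
                value: float, path: str) -> None:
    """Store a value together with a link to the file it was loaded from."""
    conn.execute("INSERT OR IGNORE INTO source_file (path) VALUES (?)", (path,))
    (file_id,) = conn.execute(
        "SELECT id FROM source_file WHERE path = ?", (path,)
    ).fetchone()
    conn.execute(
        "INSERT INTO result (region, indicator, value, source_file_id) "
        "VALUES (?, ?, ?, ?)",
        (region, indicator, value, file_id),
    )


def source_of(conn: sqlite3.Connection, region: str, indicator: str) -> List[str]:
    """Return the input file paths behind a reported indicator."""
    rows = conn.execute(
        "SELECT s.path FROM result r JOIN source_file s ON s.id = r.source_file_id "
        "WHERE r.region = ? AND r.indicator = ?",
        (region, indicator),
    ).fetchall()
    return [path for (path,) in rows]
```

The path returned by `source_of` would then be looked up in the metadata catalogue to continue the trail back through the lineage links.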

Development of the Reporting Database proved ambitious given the project timeframes. The River Modelling and Water Accounting and Assessment Teams were the only teams to participate in the experiment, which had some good early results. A simple data model built around model results was developed, data loaders were built and deployed, data for a number of reporting regions were loaded, and early products were delivered.

4.5 Document management

The TRIM system, a commercial document management tool used within CSIRO, is used for archiving audit records, i.e. decisions made for processes within the project, staffing documents relating to appointments, expenditure (e.g. purchase of equipment such as desks and stationery), project events, salaries, and documents relating to data licences and correspondence. A unique number is assigned to each document and details of the document are entered into the TRIM system through the GUI form shown in Figure 4-11. This system enabled project leaders to keep track of key documents relating to data agreements and decisions related to data/document workflow within the project.


Figure 4-11. User interface of TRIM


5 Data auditing

For the MDBSY project, all data components, ranging from original inputs through to final products, were fully

documented with metadata and legally covered by data licence agreements where appropriate. It is possible that elements of the project may need to be regenerated to reproduce results, for example modelling results, maps, and even reports, and that data may need to be shared with other agencies. It is therefore necessary that any

result can be regenerated exactly as it was originally produced with no variation in modelling outputs, and that it is

possible to verify the inputs to models. To meet this requirement it was necessary to ensure that all project data,

modelling software, model parameters, results, and reports were archived within the project archive directory and/or

metadata catalogue.

5.1 Audit trail

In order to meet the requirements outlined above, it was necessary to ensure that there was a complete audit trail for all

project data, software, model parameters, results, and reports. The audit trail relies on the metadata catalogue

(described in Section 4.3), which was developed within CSIRO specifically for the MDBSY project. Data coordinators

within each project team were responsible for ensuring that all datasets were archived in the project directory and that

they all had metadata entered describing their origin, and indicating what other datasets, if any, were used to create them.

The data management protocols and archiving processes were created such that the audit trail could be fully established.

Every effort was made by the data management team to ensure that all required components were archived. However, it

is possible that some components of the trail have not been accounted for, e.g. where post-processing has occurred with

corrections applied to modelling outputs in spreadsheets based on known errors in modelling parameters.

A complicating factor was the need to source vital details on datasets from numerous project staff in different teams,

some of whom worked for organisations outside of CSIRO. Often several staff were involved in the creation of a single

metadata statement due to the division of work components and/or expertise within teams.

Archiving the large volumes and wide variety of datasets used in and generated by the project created a number of challenges. Demarcating the working space and the project archive directories was necessary for maintaining data integrity; however, this posed problems as the data components developed within working directories needed to be moved into the project archive prior to metadata creation. As discussed in Section 2.2, this was relatively straightforward for model runs with well-defined directory structures. However, where datasets included, for example, ArcMap documents and various assorted GIS layers and/or images, it was difficult to identify files nested away in individual work directories. In addition, once all of these data components were moved, the pathways within ArcMap documents and model code were no longer valid and needed to be updated. It is therefore possible that some small data components have been overlooked and not archived by project teams.

5.2 Auditing methods

The Metadata Cataloguing Tool developed for the project required that the lineage of all datasets be described, and that any datasets used as inputs for the development of another dataset be listed within the metadata statement. As metadata was created for all reports and reporting elements, it should be possible to identify the path by which individual results were generated from the input data.

The approach taken was to firstly establish that the key modelling datasets for each reporting region were archived. In

many cases these were obvious, although project team members responsible for running the models had to be consulted

to ensure that all relevant files were included. This process was carried out by using the metadata tool to search the

project archive for those models known to have been run in each region. In many cases this highlighted gaps in the

archive which required further data files to be transferred from work space locations. In addition, some post-processing

datasets were uncovered which also were subsequently archived, e.g. IQQM post-processing data.


The quality of metadata records was also checked to ensure adequate detail had been supplied. Quality control of

metadata was difficult to enforce as it was often unclear if the detail on a specific dataset was enough to describe all files

associated with that dataset. Data lineage quality was less difficult to measure as it was in most cases obvious which

datasets were inputs to or derived from another dataset.

Given the large number of datasets involved in the project (over a thousand), a process of randomly selecting metadata records from key datasets for each region provided an efficient basis for checking the quality of metadata entries. Feedback to metadata custodians also allowed them to make updates to a range of records, which further improved the quality.
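A random selection of this kind can be drawn per region as in the Python sketch below. The sample size, the seed and the function name are illustrative assumptions; the number of records actually checked per region is not stated in this report.

```python
import random
from typing import Dict, List, Optional


def sample_records_for_review(
    records_by_region: Dict[str, List[str]],
    per_region: int,
    seed: Optional[int] = None,
) -> Dict[str, List[str]]:
    """Pick up to per_region metadata records at random from each region."""
    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    return {
        region: rng.sample(records, min(per_region, len(records)))
        for region, records in records_by_region.items()
    }
```

Sampling within each region, rather than across the whole catalogue, ensures that regions with few datasets are still represented in the check.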


Appendix A MDBSY Datasets

All MDB Project Team data audit

Team No. Datasets No. Files No. Folders Volume (GB)

Assessment 205 3,085 286 3.574

Catchment_Yield 264 2,609,812 78,671 2,823.618

Groundwater 93 199,999 262 1,458.540

River Modelling 11,686 323,515 17,701 543.775

Additional data sets/model code 200 10,016,053 102,937 7,300.000

TOTALS 12,448 13,152,464 199,857 12,129.506
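The totals row can be checked against the team rows with a few lines of Python, using only the figures in the table above. The counts agree exactly; the summed volume comes to 12,129.507, which differs from the reported 12,129.506 only in the last decimal place, presumably through rounding of the individual volumes.

```python
# (datasets, files, folders, volume) for each team, copied from the table.
teams = {
    "Assessment": (205, 3085, 286, 3.574),
    "Catchment_Yield": (264, 2609812, 78671, 2823.618),
    "Groundwater": (93, 199999, 262, 1458.540),
    "River Modelling": (11686, 323515, 17701, 543.775),
    "Additional data sets/model code": (200, 10016053, 102937, 7300.000),
}
reported = (12448, 13152464, 199857, 12129.506)

# Sum each column across the team rows.
totals = tuple(sum(column) for column in zip(*teams.values()))

for label, computed, stated in zip(
    ("datasets", "files", "folders", "volume"), totals, reported
):
    print(f"{label}: computed {computed}, reported {stated}")
```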

Water Accounting and Assessment team data audit

Region Data Context No. Datasets No. Files No. Folders Volume (GB)

01_Paroo Environment 1 38 1 0.09

Water Accounting 9 22 11 0.03

02_Warrego Environment 1 79 5 0.30

Water Accounting 11 91 13 0.04

03_Condamine Environment 1 139 5 0.17

Water Accounting 11 165 13 0.05

04_Moonie Environment 1 17 2 0.02

Water Accounting 10 25 12 0.01

05_Briv Environment 1 18 4 0.09

Water Accounting 10 232 12 0.07

06_Gwyd Environment 1 94 8 0.27

Water Accounting 10 163 12 0.07

07_Namoi Environment 1 29 4 0.11

Water Accounting 10 148 12 0.14

08_Macq_Castle Environment 1 6 2 0.24

Water Accounting 10 213 12 0.05

09_Bar_Darl Environment 1 62 1 0.14

Water Accounting 12 50 14 0.02

10_Lachlan Environment 1 24 2 0.54

Water Accounting 10 85 12 0.03

11_Mbidg Environment 1 23 3 0.25

Water Accounting 10 291 12 0.10

12_Murray Environment 1 40 3 0.00

Water Accounting 10 237 14 0.13

13_Ovens Environment 1 29 5 0.14

Water Accounting 11 127 13 0.04

14_Glb_Broken Environment 1 16 2 0.02

Water Accounting 11 118 14 0.09

15_Campaspe Environment 1 20 3 0.04

Water Accounting 11 87 14 0.04

16_Loddon_Avoca Environment 1 27 3 0.04

Water Accounting 11 81 14 0.03

17_Wimmera Environment 1 23 3 0.01

Water Accounting 10 112 12 0.05

18_East_Mt_Lofty Environment 1 0 0 0.00

Water Accounting 10 154 14 0.09

99_Snowy Environment 0 0 0 0.00

Water Accounting 0 0 0 0.00

TOTALS 205 3,085 286 3.57


Catchment Yield team data audit

Region Data Context No. Datasets No. Files No. Folders Volume (GB)

01_Paroo Cell Runoff 1 63,074 581 79.100

Flows 5 16 8 0.008

Prime_Climate 3 15 3 0.055

Prime_Flows 3 6 3 0.007

02_Warrego Cell Runoff 1 134,655 1,051 165.000

Flows 5 17 8 0.013

Prime_Climate 3 11 3 0.076

Prime_Flows 3 10 3 0.016

03_Condamine Cell Runoff 1 242,237 2,409 284.000

Flows 6 2,850 465 2.740

Prime_Climate 3 18 3 0.049

Prime_Flows 4 8 4 0.057

04_Moonie Cell Runoff 1 26,451 476 32.300

Flows 6 395 87 0.351

Prime_Climate 3 8 3 0.026

Prime_Flows 4 8 4 0.008

05_Briv Cell Runoff 1 100,643 5,187 104.000

Flows 6 5,773 987 6.000

Prime_Climate 3 15 3 0.082

Prime_Flows 4 10 4 0.115

06_Gwyd Cell Runoff 1 59,832 3,401 61.300

Flows 6 3,733 646 3.900

Prime_Climate 3 12 3 0.072

Prime_Flows 4 8 4 0.094

07_Namoi Cell Runoff 1 86,810 2,931 93.300

Flows 6 3,051 557 3.430

Prime_Climate 3 30 3 0.099

Prime_Flows 4 21 4 0.207

08_Macq_Castle Cell Runoff 1 156,355 4,709 171.000

Flows 6 5,777 915 5.540

Prime_Climate 3 10 3 0.034

Prime_Flows 4 8 4 0.134

09_Bar_Darl Cell Runoff 1 250,471 660 316.000

Flows 6 627 123 0.588

Prime_Climate 3 10 3 0.039

Prime_Flows 4 8 4 0.005

10_Lachlan Cell Runoff 1 162,427 1,765 194.000

Flows 6 2,032 339 1.990

Prime_Climate 3 8 3 0.036

Prime_Flows 4 8 4 0.047

11_Mbidg Cell Runoff 1 286,145 12,252 267.000

FCFC 2 38,066 594 40.300

Flows 6 4,605 778 4.330

Prime_Climate 3 21 3 0.088

Prime_Flows 4 18 4 0.168

12_Murray Cell Runoff 1 475,758 4,154 576.000

FCFC 2 82,666 235 93.800

Flows 6 2,225 429 2.530

Prime_Climate 3 28 3 0.020

Prime_Flows 4 13 4 0.026

13_Ovens Cell Runoff 1 55,680 5,650 30.600

Flows 6 3,069 537 3.190

Prime_Climate 3 13 3 0.055

Prime_Flows 4 9 4 0.100

14_Glb_Broken Cell Runoff 1 80,338 5,210 64.000

Flows 6 2,862 465 2.730

Prime_Climate 3 21 3 0.009

Prime_Flows 4 9 4 0.003

15_Campaspe Cell Runoff 1 33,267 3,674 17.700

Flows 4 2,160 357 2.000

Prime_Climate 3 27 3 0.011

Prime_Flows 4 9 4 0.002

16_Loddon_Avoca Cell Runoff 1 68,694 3,290 64.700

Flows 6 1,941 321 1.810

Prime_Climate 3 26 3 0.014

Prime_Flows 4 18 4 0.012

17_Wimmera Cell Runoff 1 88,233 4,294 82.600

Flows 6 2,713 480 2.660

Prime_Climate 3 29 3 0.012

Prime_Flows 4 9 4 0.003

18_East_Mt_Lofty Cell Runoff 1 56,237 6,304 28.100

FCFC 2 3,618 295 3.140

Flows 6 2,824 537 3.040

Prime_Climate 3 12 3 0.049

Prime_Flows 3 8 4 0.032

99_Snowy Cell Runoff 1 10,942 1,370 6.960

Flows 6 24 10 0.006

Prime_Climate 3 11 3 0.001

Prime_Flows 3 6 3 0.002

TOTALS 264 2,609,812 78,671 2,823.618

Groundwater team data audit

Region Data Context No. Datasets No. Files No. Folders Volume (Gbs)

01_Paroo no model 0 0 0 0.000

02_Warrego no model 0 0 0 0.000

03_Condamine 1 model 7 31 7 1.500

04_Moonie no model 0 0 0 0.000

05_Briv 1 model 8 16 9 9.300

06_Gwyd 1 model 7 14 8 3.140

07_Namoi 3 models 21 89 24 12.900

08_Macq_Castle 1 model 7 14 8 16.800

09_Bar_Darl no model 0 0 0 0.000

10_Lachlan 2 models 14 46 16 48.100

11_Mbidg 2 models 16 45 18 10.600

12_Murray 1 model 8 11 9 46.200

13_Ovens no model 0 0 0 0.000

14_Glb_Broken no model 0 0 0 0.000

15_Campaspe no model 0 0 0 0.000

16_Loddon_Avoca no model 0 0 0 0.000

17_Wimmera no model 0 0 0 0.000

18_East_Mt_Lofty no model 0 0 0 0.000

99_Snowy no model 0 0 0 0.000

ALL_MDB RRF 5 199,733 163 1,310.000

TOTALS 93 199,999 262 1,458.540
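The TOTALS row of the Groundwater team audit can be cross-checked by summing the per-region rows. The following is a minimal sketch of that check, using the figures transcribed from the table above; the names `ROWS`, `REPORTED_TOTALS` and `column_totals` are illustrative and are not part of the project's tooling.

```python
# Rows transcribed from the Groundwater team data audit table:
# (region, datasets, files, folders, volume_gb).
# Regions listed as "no model" are omitted because every column is zero.
ROWS = [
    ("03_Condamine", 7, 31, 7, 1.500),
    ("05_Briv", 8, 16, 9, 9.300),
    ("06_Gwyd", 7, 14, 8, 3.140),
    ("07_Namoi", 21, 89, 24, 12.900),
    ("08_Macq_Castle", 7, 14, 8, 16.800),
    ("10_Lachlan", 14, 46, 16, 48.100),
    ("11_Mbidg", 16, 45, 18, 10.600),
    ("12_Murray", 8, 11, 9, 46.200),
    ("ALL_MDB", 5, 199_733, 163, 1310.000),
]

# TOTALS row as printed in the report.
REPORTED_TOTALS = (93, 199_999, 262, 1458.540)


def column_totals(rows):
    """Sum each numeric column; volume is rounded to the table's 3 decimals."""
    datasets = sum(r[1] for r in rows)
    files = sum(r[2] for r in rows)
    folders = sum(r[3] for r in rows)
    volume = round(sum(r[4] for r in rows), 3)
    return datasets, files, folders, volume


if __name__ == "__main__":
    computed = column_totals(ROWS)
    print("computed:", computed)
    print("reported:", REPORTED_TOTALS)
```

The same approach could be applied to the other team audits, although their larger row counts make a scripted transcription more practical than a hand-typed list.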

River Modelling team data audit

Region Data Context No. Datasets No. Files No. Folders Volume (Gbs)

01_Paroo 1 IQQM model 21 2,300 78 3.600

Prime_Climate 4 42 4 0.233

Prime_Flows 4 23 4 0.029

APrimeData_csvs 1 9 1 0.011

02_Warrego 1 IQQM model 21 3,666 78 11.000

Prime_Climate 4 48 4 0.351

Prime_Flows 4 24 4 0.037

APrimeData_csvs 1 10 1 0.027

03_Condamine LBON IQQM model 21 1,903 78 7.790

MCON IQQM model 19 4,599 76 13.200

NEBI IQQM model 19 1,347 76 3.630

STGE model 15 371 45 0.457

UCON IQQM model 19 3,591 76 12.200

Prime_Climate 4 220 4 1.070

Prime_Diversions 4 19 4 0.019

Prime_Flows 4 88 4 0.507

APrimeData_csvs 1 40 1 0.056

04_Moonie MOON IQQM model 24 1,829 90 7.960

Prime_Climate 4 44 4 0.138

Prime_Flows 4 22 4 0.022

APrimeData_csvs 1 9 1 0.009

05_Briv BRIV IQQM model 24 4,566 90 14.600

MACB IQQM model 23 2,362 93 2.830

Prime_Climate 4 96 4 0.799

Prime_Flows 4 55 4 0.753

APrimeData_csvs 1 13 1 0.026

06_Gwyd IQQM model 224 76,289 1,591 217.000

Prime_Climate 4 648 5 2.420

Prime_Flows 4 327 4 5.980

APrimeData_csvs 1 9 1 0.024

07_Namoi NAMO IQQM model 24 4,302 90 9.260

PEEL IQQM model 24 2,601 91 3.120

Prime_Climate 4 120 4 0.446

Prime_Flows 4 51 4 0.539

APrimeData_csvs 1 24 1 0.056

08_Macq_Castle MACQ IQQM model 24 5,577 90 11.700

Prime_Climate 4 44 4 0.176

Prime_Flows 4 22 4 0.393

APrimeData_csvs 1 10 1 0.039

09_Bar_Darl Darl IQQM model 21 2,461 86 8.950

Darl1 IQQM model 16 1,232 64 5.100

Darl2 IQQM model 16 1,301 66 5.390

Darl3 IQQM model 16 1,301 66 5.390

Darl4 IQQM model 16 1,301 66 5.390

MENI IQQM model 21 1,506 86 2.340

MEN1 IQQM model 16 1,140 66 1.850

MEN2 IQQM model 16 1,126 66 1.780

MEN3 IQQM model 16 1,126 66 1.780

MEN4 IQQM model 16 1,126 66 1.780

Prime_Climate 4 480 4 2.280

Prime_Flows 4 240 4 0.150

APrimeData_csvs 1 9 1 0.010

10_Lachlan IQQM model 23 3,902 86 9.320

Prime_Climate 4 42 4 0.181

Prime_Flows 4 21 4 0.126

APrimeData_csvs 1 9 1 0.018

11_Mbidg ACTD REALM model 17 51 17 0.013

ACTW REALM model 21 488 79 0.087

BIDG IQQM model 22 4,604 92 10.900

BID1 IQQM model 16 1,368 64 8.820

BID2 IQQM model 16 1,487 66 9.390

BID3 IQQM model 16 1,487 66 9.390

BID4 IQQM model 16 1,469 66 9.390

SNAT model 16 96 48 0.037

SNO1 model 17 578 85 0.230

SNO2 model 16 560 80 0.217

SNO3 model 16 628 86 0.245

SNO4 model 16 628 86 0.245

SNOW model 21 718 106 0.273

UBID IQQM model 21 1,941 84 2.130

UUBI model 16 833 64 1.000

YCBN model 5 201 20 0.177

Prime_Climate 4 768 4 3.000

Prime_Flows 4 416 4 3.230

APrimeData_csvs 1 28 1 0.083

12_Murray MGWY MSMBIG model 5 367 47 1.040

MLBO MSMBIG model 5 366 47 1.080

MMAC MSMBIG model 5 366 47 1.080

MMOO MSMBIG model 5 347 44 1.050

MNAM MSMBIG model 5 342 44 0.980

MOVE MSMBIG model 5 366 47 0.988

MSNO MSMBIG model 5 374 47 0.920

MUR1 MSMBIG model 17 1,607 160 4.180

MUR2 MSMBIG model 16 1,483 152 3.770

MUR3 MSMBIG model 16 1,460 152 4.110

MUR4 MSMBIG model 16 1,470 152 4.350

MURR MSMBIG model 17 1,423 158 4.980

MWAR MSMBIG model 5 346 44 1.000

13_Ovens IQQM model 121 3,224 484 2.720

Prime_Climate 4 48 4 0.263

Prime_Flows 4 24 4 0.280

APrimeData_csvs 1 4 1 0.015

14_Glb_Broken BDLT model 301 1,502 301 0.416

BOIR model 301 7,224 301 2.340

BRNG model 301 1,502 301 0.416

BROK model 301 7,224 301 2.360

CCRL model 301 2,107 301 1.450

CPIR model 301 7,224 301 2.360

CPPR model 301 7,224 301 2.360

DEIR model 301 7,224 301 2.360

DIIR model 301 7,224 301 2.360

ELOD model 301 902 301 0.330

GBSM model 321 21,034 1,284 21.400

GOPR model 301 7,224 301 2.360

LWLW model 301 2,107 301 1.450

MARY model 301 1,203 301 0.379

RCMD model 301 1,503 301 0.381

REIR model 301 7,224 301 2.360

REVA model 301 1,203 301 0.380

RHAR model 301 1,203 301 0.380

ROIR model 301 7,224 301 2.360

RPOV model 301 1,203 301 0.380

RSGL model 301 1,203 301 0.342

RSHL model 301 1,203 301 0.380

RUR3 model 301 1,203 301 0.380

RWIR model 301 7,224 301 2.360

SHIR model 301 7,224 301 2.360

TAIR model 301 7,224 301 2.340

TLWP model 301 2,107 301 1.450

TOIR model 301 7,224 301 2.360

TUNG model 301 1,504 301 0.349

UCAS model 301 1,203 301 0.379

UCRU model 301 1,203 301 0.379

USGL model 301 1,203 301 0.379

USHT model 301 1,203 301 0.379

Demands 316 7,601 317 1.080

Prime_Climate 6 1,304 6 0.509

Prime_Flows 5 327 5 0.139

APrimeData_csvs 1 6 1 0.002

15_Campaspe IQQM model 3 32 3 0.005

Prime_Climate 4 1,623 4 0.586

Prime_Flows 4 324 4 0.099

APrimeData_csvs 1 8 1 0.002

16_Loddon_Avoca AVOC IQQM model 16 480 64 1.560

Prime_Climate 4 1,620 4 0.684

Prime_Flows 4 348 4 0.108

APrimeData_csvs 1 8 1 0.003

17_Wimmera IQQM model 16 414 64 0.447

Demands 24 240 24 0.056

Prime_Climate 4 120 4 0.057

Prime_Flows 4 6 4 0.001

APrimeData_csvs 1 10 1 0.002

18_East_Mt_Lofty Summary data 1 22 1 0.010

99_Snowy Listed in Murrumbidgee.

Whole of basin Outputs 1 60 1 0.096

Model Software Various models 12 62 12 0.022

TOTALS 11,686 323,515 17,701 543.775

Appendix B Metadata Template

Original Metadata Template

DESCRIPTION

Title: Name of the data set.

Directory Path/File Name: Please specify a file name(s) only. The path will be allocated by data coordinators when the data is moved to the new directory structure.

Custodian: The name of the organisation responsible for creating and maintaining the data set (e.g. CSIRO Land and Water).

Reporting Area: The project reporting area within which the data is applicable. Please select from the following list:

• 1_Paroo
• 2_Warrego
• 3_Condamine-Balonne
• 4_Moonie
• 5_Border Rivers
• 6_Gwydir
• 7_Namoi
• 8_Macquarie-Castlereagh
• 9_Barwon-Darling
• 10_Lachlan
• 11_Murrumbidgee
• 12_Murray
• 13_Ovens
• 14_Goulburn-Broken
• 15_Campaspe
• 16_Avoca-Loddon
• 17_Wimmera
• 18_Eastern Mt Lofty Ranges
• All_MDB
• Other (please specify)

Project Area: The project team the work has been produced within/obtained for. Please select from the following list. If many, select ‘Reporting’.

• Groundwater
• Catchment Yield
• River Modelling
• Reporting

Progress: Is the dataset a draft version or a final version? Please select from the following list.

• Original
• Draft
• Final

Coordinate Reference System: What coordinate reference system (if applicable) has been used for the data? Please select from the following list.

• Geographic_GDA94
• Lambert Conformal
• N/A (not applicable)
• Other (please specify)

Format: Indicate the format the data is stored in (e.g. ASCII text, ARC/Info coverage), the digital representation used (e.g. point, raster, vector, text) and the software version number (if applicable).

Data Licence: Has a licence for this data been obtained for the project? If so, where is it?

Contact organisation: The name of the organisation responsible for the creation and maintenance of the data set (possibly same as custodian).

Name: The name of the person responsible for creating or maintaining the data set.

Phone Number: The phone number of the person responsible for creating or maintaining the data set.

Email Address: The email address of the person responsible for creating or maintaining the data set.

Abstract: A brief and simple summary of the data set content. This field should include:

• the reason for creating/obtaining the data set
• the spatial and temporal scales of the data set (if applicable) and
• the main features of the data set.

Search words: List a number of words which can be used to search a catalogue and find this data set.

LINEAGE

This information is required to ensure an audit trail exists.

Data Input: What data sets (if any) have been used to develop the data set? Include version numbers/dates if applicable.

Processing Steps: What processing steps have been taken during the process of creating the dataset? Give enough information such that someone else can repeat the processing.

Tools: What tools were used to process the data (e.g. IQQM model)? Give version numbers if applicable.

Parameter List: List any parameters and their values that were used in the process.

Positional Accuracy: How close are the locations of spatial objects in relation to their true positions on the earth’s surface (if applicable)?

Attribute Accuracy: Do the values assigned to attributes mimic realistic values? This must include what classification method is used to assign values to data set features, how well the features correspond with the method, and factors influencing attributes.

Logical Consistency: Do all objects in the data set have logical relationships or does the data have discrepancies? (e.g. Do all boundaries meet, do polygons close, and are all points labelled?)

ATTRIBUTES

This information should be provided for all key attributes in the data sets.

Name: Name of the attribute.

Domain: What is the set of valid values for this attribute? Describe each code if applicable.

Description: Description of the attribute.

Metadata Date: The date that this metadata file was created.

Additional Metadata: Any comments that cannot be provided under other headings.
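Several fields in the template (Project Area, Progress and Coordinate Reference System) take values from fixed lists, so a completed record can be checked mechanically. The sketch below is illustrative only: the dictionary layout, the function name `validate` and the treatment of "Other (please specify)" as any value beginning with "Other" are assumptions, not part of the project's template or tools.

```python
# Allowed values copied from the lists in the metadata template.
CONTROLLED = {
    "Project Area": {"Groundwater", "Catchment Yield", "River Modelling", "Reporting"},
    "Progress": {"Original", "Draft", "Final"},
    "Coordinate Reference System": {"Geographic_GDA94", "Lambert Conformal", "N/A"},
}

# Fields whose list ends with "Other (please specify)".
# Assumption: such entries are written as free text starting with "Other".
ALLOWS_OTHER = {"Coordinate Reference System"}


def validate(record):
    """Return a list of problems found in a metadata record (a plain dict)."""
    problems = []
    for field, allowed in CONTROLLED.items():
        value = record.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif value in allowed:
            continue
        elif field in ALLOWS_OTHER and value.startswith("Other"):
            continue
        else:
            problems.append(f"{field}: unexpected value {value!r}")
    return problems


if __name__ == "__main__":
    # Hypothetical record used only to exercise the check.
    example = {
        "Title": "Example data set",
        "Project Area": "River Modelling",
        "Progress": "Finished",  # not one of Original/Draft/Final
        "Coordinate Reference System": "Other (MGA Zone 55)",
    }
    for problem in validate(example):
        print(problem)
```

A check of this kind only covers the list-based fields; free-text fields such as Abstract and Processing Steps would still need review by a data coordinator.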
