GTAG Spawning Discussion Paper - graphc.anu.edu.au · GTAG can then be used as an address proxy...


NATIONAL CENTRE FOR GEOGRAPHIC & RESOURCE ANALYSIS IN PRIMARY HEALTH CARE (GRAPHC)

AUSTRALIAN PRIMARY HEALTH CARE RESEARCH INSTITUTE (APHCRI)

GTAG SPAWNING DISCUSSION PAPER

Paul Konings

Michael Hewett


P Konings, M Hewett, GRAPHC, APHCRI, ANU 2 | P a g e

ACKNOWLEDGMENT

National Centre for Geographic & Resource Analysis in Primary Health Care
Australian Primary Health Care Research Institute
ANU College of Medicine and Health Sciences
Bldg 63, Corner of Mills & Eggleston Roads
The Australian National University
Canberra ACT 0200
T: +61 2 6125 6549
F: +61 2 6125 2254
E: [email protected]
W: graphc.aphcri.anu.edu.au

The research reported in this paper is a product of the Australian Primary Health Care Research Institute, which is supported by a grant from the Australian Government Department of Health and Ageing under the Primary Health Care Research, Evaluation and Development Strategy. The information and opinions contained in it do not necessarily reflect the views or policies of the Australian Government Department of Health.


Contents

Definitions
  GTAG
  ATAG
  GTAG-ATAG Pair
  GTAG Spawning
  Spawned GTAGs and ATAGs
  GTAG Datasets
  GTAG User
  Data Originator
  GTAG Administrator
Introduction
Background
Overview
Conditions Of Spawning
Spawning Mechanism
  Protect or Destroy the links between the datasets with spawned GTAGs and the original GTAG
  Avoid unnecessary Spawning
  Protect Spawned ATAG/GTAG pairs
  Provide spawned GTAGs only with functionality appropriate to the most sensitive data they are linked to
Business Rules
Database Implementation
  GTAG Table
  Spawning Table
Spawning Database Process
  Spawn Request
  Retire GTAG Request


Definitions:

GTAG: A unique identifier assigned to an address registration request to the GTAG system. Every request receives a unique GTAG, regardless of whether the specific address has been previously registered. The GTAG can then be used as an address proxy when requesting services from the GTAG system. GTAGs cannot be decoded into addresses and are usable only with functions authorised by a GTAG Administrator.

ATAG: A unique administrative key generated for every registered GTAG.

GTAG-ATAG Pair
A valid combination of GTAG and ATAG. Every GTAG has a single unique ATAG value. The combined pair allows Administrators to enable and disable GTAG functionality, and to perform other GTAG administrative tasks.

GTAG Spawning
GTAG Spawning is a capability that allows new GTAGs to be generated from existing GTAGs without direct access to the original addresses associated with the GTAGs. Generating a new GTAG/ATAG pair from an existing GTAG/ATAG pair is functionally equivalent to re-registering the original dataset addresses from source, and can be performed as often as needed.

Spawned GTAGs and ATAGs
GTAGs and ATAGs that are generated from existing GTAG/ATAG pairs rather than directly from addresses.

GTAG Datasets
Datasets with one or more records that have a GTAG as one of their member values, but do not contain an ATAG value. Note that datasets derived from GTAG Spawning require explicit activation of spatial functionality in the same way as datasets derived directly from addresses. Spatial activation is achieved using the paired ATAGs via the G-Tag System. Spawned GTAG datasets do not inherit any spatial functionality.

GTAG User
A user of the G-Tag system who has access to GTAGs, but not the corresponding ATAGs.

Data Originator
A data owner or custodian with access to addresses and associated clinical or attribute data. The data originator is ultimately responsible for the dissemination of data. The G-Tag System facilitates activating spatial functionality, but the responsibility for data dissemination and spatial enablement, including delegating GTAG authority to a GTAG Administrator, is vested in the data originator. Data originators can be GPs, dentists, government departments, or any other person with access to both addresses and clinical or attribute data.

GTAG Administrator
A user of the G-Tag system who has access to both GTAGs and the corresponding ATAGs. An administrator has the ability to change the spatial functionality of GTAGs, and as such should understand:


the issues associated with the protection of privacy and confidentiality

the value and relevance of spatial precision

the nature of the attributes in the dataset and

the potential for re-identification when associating multiple attributes.

GTAG Administrators can be Data Originators or trusted third-party delegates designated by a Data Originator or a GTAG Administrator.

Note that spawned GTAG-ATAG pairs should be seen as equivalent to GTAG-ATAG pairs derived directly from addresses. As such, a person with access to ATAG/GTAG pairs (regardless of the process from which the pairs were derived) is a GTAG Administrator with all the associated rights and responsibilities. This includes determining appropriate spatial functionality. In addition, if a GTAG Administrator provides GTAG/ATAG pairs to another person or organization (thereby delegating administrator responsibilities to that person or organization), the Administrator is responsible for ensuring that the recipient meets the requirements of trust and responsibility appropriate to a GTAG Administrator for the level of data with which the provided GTAGs are associated.

Introduction

The G-Tag system has been designed to incorporate and use accurate locations with de-identified unit record data, without compromising the confidentiality of the de-identified data. The de-identified data and the location are effectively kept at arm’s length, but answers to location-based questions can still be derived, for example:

Geo-linking to socio-economic status or other demographic context data.

Geo-attribution to a region or administrative area.

Distance of the records from one or more specific locations.

Identification of spatial clustering.

The principal purpose of the G-Tag System is to offer a means by which research data can be linked to geographies that are broad enough to protect privacy, but narrow enough to enable useful spatial analyses, linking to demographic information and spatial visualisation, without individual addresses being in contact with research data.

The GRAPHC G-Tag system spatially empowers data holdings for all researchers and administrators without compromising confidentiality.

Background

The GRAPHC G-Tag System provides the ways and means to disseminate sensitive unit record data incorporating relevant and appropriate location attributes. In order for this to be possible, the addresses must be registered while still associated with the unit record data. Thereafter, the unit record data can be disseminated and spatially enabled for a variety of uses.

Every registration request generates a new GTAG, so records cannot be cross-linked using GTAGs from separate registration requests for the same address. Each request generates a GTAG plus an authorisation key for that GTAG, called an ATAG. GTAGs initially have no spatial value or functionality until explicitly activated via the G-Tag System using ATAG/GTAG pairs.
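The registration behaviour described above can be illustrated with a minimal in-memory sketch. The class `GTagRegistry`, its method names and the use of random UUIDs are assumptions made for illustration only; they are not the GRAPHC service or its API.

```python
import uuid


class GTagRegistry:
    """Toy stand-in for the G-Tag System (hypothetical, illustration only)."""

    def __init__(self):
        self._address = {}   # GTAG -> address, never exposed to users
        self._atag = {}      # GTAG -> ATAG
        self._enabled = {}   # GTAG -> set of activated functions

    def register(self, address):
        """Every request gets a brand-new GTAG/ATAG pair, even for a repeated address."""
        gtag, atag = str(uuid.uuid4()), str(uuid.uuid4())
        self._address[gtag] = address
        self._atag[gtag] = atag
        self._enabled[gtag] = set()   # no spatial functionality until activated
        return gtag, atag

    def activate(self, gtag, atag, function):
        """Enable a function for a GTAG; the matching ATAG is required."""
        if gtag not in self._atag or self._atag[gtag] != atag:
            raise PermissionError("invalid GTAG/ATAG pair")
        self._enabled[gtag].add(function)

    def is_enabled(self, gtag, function):
        return function in self._enabled.get(gtag, set())


registry = GTagRegistry()
# Two registrations of the same address yield unrelated GTAGs and ATAGs.
g1, a1 = registry.register("13 Jones St, Sydney NSW 2000")
g2, a2 = registry.register("13 Jones St, Sydney NSW 2000")
```

Because `g1` and `g2` share no visible relationship, a user holding both cannot tell that they refer to the same address, and activating a function for one has no effect on the other.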


The GTAG/ATAG model provides a significant level of control over what spatial precision and functionality is or is not available to GTAG Users. However, the model means that all users of a GTAG Dataset will be able to access spatial data and functionality at every level of precision authorised for the GTAG, regardless of the sensitivity of the associated clinical or attribute dataset.

When extracted datasets have multiple users, GTAG Administrators need to be aware of every dataset that has been disseminated.

For example:

If a dataset with GTAGs holds low sensitivity data (such as BMI) and high sensitivity data (such as AIDS status), the GTAG should only be authorised to a level appropriate for the most sensitive data. Even if the GTAG dataset is split into separate datasets, one with BMI and another with AIDS status, the GTAG is common across both deployed datasets. In this case the GTAGs should not be authorised for a higher precision geography than is appropriate for the AIDS data. Authorising the GTAG in the BMI dataset will also authorise the same level of capability in the AIDS status dataset, and vice versa.

RecordID  GTAG                                  BMI   AIDS
320       3d4bd2ef-dd50-40e2-a3ea-8dfe893082cb  35.3  YES

BMI Dataset

RecordID  GTAG                                  BMI
320       3d4bd2ef-dd50-40e2-a3ea-8dfe893082cb  35.3

AIDS Dataset

RecordID  GTAG                                  AIDS
320       3d4bd2ef-dd50-40e2-a3ea-8dfe893082cb  YES

GTAG Authorisation for GTAG = 3d4bd2ef-dd50-40e2-a3ea-8dfe893082cb: because both datasets have the same GTAG, authorisation affects both datasets.
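The effect shown in the illustration can be sketched in a few lines, assuming (as the text implies) that authorisation is recorded against the GTAG itself rather than against any particular dataset. The dictionary `authorised_level` and the helper names are hypothetical.

```python
# Assumed model: the authorised geography is stored per GTAG, not per dataset.
authorised_level = {}  # GTAG -> finest geography authorised

GTAG = "3d4bd2ef-dd50-40e2-a3ea-8dfe893082cb"
bmi_dataset = [{"RecordID": 320, "GTAG": GTAG, "BMI": 35.3}]
aids_dataset = [{"RecordID": 320, "GTAG": GTAG, "AIDS": "YES"}]


def authorise(gtag, level):
    """Record the level authorised for a GTAG (hypothetical helper)."""
    authorised_level[gtag] = level


def level_for(record):
    """Look up the level available to whoever holds this record."""
    return authorised_level.get(record["GTAG"])


# Authorising the GTAG on behalf of the BMI dataset...
authorise(bmi_dataset[0]["GTAG"], "SA1")
# ...silently grants the same level to the AIDS dataset.
leaked_level = level_for(aids_dataset[0])
```

Here `leaked_level` is `"SA1"` even though no one authorised the AIDS dataset directly, which is the problem the remainder of this section addresses.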

One way to manage the diversity of appropriate spatial precision is to extract multiple subsets of data from source (the original data that contain the addresses), then register and GTAG each dataset separately and authorise each dataset independently.

That is, because each registration generates a different set of GTAGs and ATAGs, each set can be managed and authorised independently.

RecordID  Address      Suburb  State  PostCode  BMI   AIDS
320       13 Jones St  Sydney  NSW    2000      35.3  YES

BMI Dataset

RecordID  GTAG                                  BMI
320       3d4bd2ef-dd50-40e2-a3ea-8dfe893082cb  35.3

AIDS Dataset

RecordID  GTAG                                  AIDS
320       e4be70c3-06b9-4c68-a189-8a4cdff3e990  YES


This approach has some limitations:

requires access to the original addressed dataset

can be difficult to extract equivalent datasets if the original dataset changes over time

may place undesirable demands on the Data Originators.

To better address this issue, GRAPHC has introduced the concept of GTAG spawning.

Spawning Overview

GTAG Spawning uses existing GTAGs as a proxy for addresses for the purpose of generating new GTAGs (spawning). The spawned GTAGs must have new and independent ATAGs. This is functionally equivalent to performing multiple registrations of the original dataset addresses at source, with the advantage of not having to do so physically at source. It is also important to note that spawning, inasmuch as it is functionally equivalent to registering GTAGs at source, requires authority and responsibility equivalent to registering GTAGs at source.

Accordingly, only GTAG Administrators may perform GTAG spawning, and the data originators should be aware of the potential multiplicity of use that is inherent in spawning GTAGs. GTAG Administrators are by definition trusted dataset custodians. Access to ATAGs gives them the capability to authorise any level of functionality for the associated GTAGs, including spawning subsequent GTAGs.

The GTAG Spawning Process accepts valid GTAG/ATAG pairs and returns new GTAG/ATAG pairs that are associated with the same locations as the original GTAG/ATAG pairs. The GTAG Administrator can then create a new GTAG dataset by selecting all or part of the original de-identified attribute dataset and linking the new GTAGs to the data via the original GTAG. The spawned GTAGs can then be appropriately authorised, independently of the original GTAGs, allowing different levels of spatial functionality for different subsets of the GTAG Dataset.

Original GTAGged Dataset

GTAG                                  BMI   AIDS
3d4bd2ef-dd50-40e2-a3ea-8dfe893082cb  35.3  YES

Original ATAG File

GTAG                                  ATAG
3d4bd2ef-dd50-40e2-a3ea-8dfe893082cb  bfff1a10-abf5-4011-9532-cf236d0c81f2

Spawn Request 1 (Low Sensitivity) for GTAG = 3d4bd2ef-dd50-40e2-a3ea-8dfe893082cb, ATAG = bfff1a10-abf5-4011-9532-cf236d0c81f2

Spawn Request 1 GTAGged Dataset

GTAG                                  BMI
8a5ea82f-4ecf-4f42-b812-7cf2a81fe4f3  35.3

Spawn Request 1 ATAG File

GTAG                                  ATAG
8a5ea82f-4ecf-4f42-b812-7cf2a81fe4f3  c791938b-d0aa-42e7-bf85-16123285cf4a

Spawn Request 2 (High Sensitivity) for GTAG = 3d4bd2ef-dd50-40e2-a3ea-8dfe893082cb, ATAG = bfff1a10-abf5-4011-9532-cf236d0c81f2

Spawn Request 2 GTAGged Dataset

GTAG                                  AIDS
5d2b1f3c-03c9-4819-9c14-708f01f29096  YES

Spawn Request 2 ATAG File

GTAG                                  ATAG
5d2b1f3c-03c9-4819-9c14-708f01f29096  2e69edb3-be8d-43f3-a35e-f244cf70deaf
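The step in which the administrator links spawned GTAGs back to the attribute data might look like the following sketch. The shape of the spawn response (a list of records with `submitted_atag`, `new_gtag` and `new_atag` keys) and the function `build_spawned_dataset` are assumptions for illustration; the GTAG and ATAG values are those from the illustration above.

```python
# Original GTAGged dataset and the administrator's ATAG file.
original_dataset = [
    {"GTAG": "3d4bd2ef-dd50-40e2-a3ea-8dfe893082cb", "BMI": 35.3, "AIDS": "YES"},
]
atag_file = {
    "3d4bd2ef-dd50-40e2-a3ea-8dfe893082cb": "bfff1a10-abf5-4011-9532-cf236d0c81f2",
}

# Assumed response format for Spawn Request 1: the submitted ATAG plus the new pair.
spawn_response_1 = [
    {
        "submitted_atag": "bfff1a10-abf5-4011-9532-cf236d0c81f2",
        "new_gtag": "8a5ea82f-4ecf-4f42-b812-7cf2a81fe4f3",
        "new_atag": "c791938b-d0aa-42e7-bf85-16123285cf4a",
    },
]


def build_spawned_dataset(dataset, atag_file, response, keep_fields):
    """Swap each original GTAG for its spawned GTAG and keep only selected attributes."""
    new_gtag_by_atag = {r["submitted_atag"]: r["new_gtag"] for r in response}
    spawned = []
    for record in dataset:
        submitted_atag = atag_file[record["GTAG"]]   # original GTAG -> its ATAG
        row = {"GTAG": new_gtag_by_atag[submitted_atag]}
        row.update({field: record[field] for field in keep_fields})
        spawned.append(row)
    return spawned


bmi_dataset = build_spawned_dataset(
    original_dataset, atag_file, spawn_response_1, keep_fields=["BMI"]
)
```

The resulting `bmi_dataset` carries only the spawned GTAG and the BMI value, so it can be authorised independently of the original dataset.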


Conditions of Spawning

As trusted users, GTAG Administrators should only perform activities that they are authorised to perform by the Data Originators. This includes allowing or disallowing spawning of the original GTAGs. GRAPHC cannot determine whether a GTAG Administrator has been permitted a specific activity by the Data Originators, so it is incumbent on the Data Originators to provide GTAG/ATAG files only to persons they trust to act within ethical and legal limitations and to adhere to any additional limitations imposed by the Data Originators. If a GTAG Administrator is authorised to delegate subordinate GTAG Administrators, they must also take responsibility for passing those files only to appropriate personnel or organizations.

As with any GTAG Dataset, spawned datasets should only be authorised for functionality that does not expose the data to undue risk of re-identification by combining the authorised GTAG functionality with the data content of the records. For example, in the illustration above, it may not be appropriate to aggregate the data in the original GTAG Dataset to regions smaller than SA3 (for example) because the record contains AIDS status data. Even though a user may only be provided with the GTAG and BMI data, the SA3 limitation still applies for that GTAG because anyone with the GTAG and AIDS status data must not be allowed to access location at a finer granularity than SA3.

Once the GTAGs have been spawned, and the high sensitivity data has been associated with different GTAGs from the low sensitivity data, the two sets of GTAGs can be managed independently, with the GTAGs from spawn request 1 being allowed to access SA1 level data (for example) while the GTAGs from spawn request 2 are maintained at the SA3 level.
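The "most sensitive data wins" rule can be expressed as a small sketch. The per-attribute limits below simply reuse the SA1 and SA3 examples from the text; the names `FINEST_PERMITTED`, `COARSENESS` and `finest_level_for` are hypothetical, and the ordering assumes SA1 is the finest and SA4 the coarsest of the listed geographies.

```python
# Geographies ordered from finest to coarsest (assumed ordering).
COARSENESS = ["SA1", "SA2", "SA3", "SA4"]

# Illustrative policy only: the finest geography each attribute may be exposed at.
FINEST_PERMITTED = {"BMI": "SA1", "AIDS": "SA3"}


def finest_level_for(attributes):
    """Return the finest geography that is safe for every attribute linked to a GTAG.

    The most sensitive attribute sets the limit, so the coarsest per-attribute
    limit is returned.
    """
    return max((FINEST_PERMITTED[a] for a in attributes), key=COARSENESS.index)


original_gtag_limit = finest_level_for(["BMI", "AIDS"])  # GTAG shared by both attributes
spawn_1_limit = finest_level_for(["BMI"])                # spawned GTAG, BMI only
spawn_2_limit = finest_level_for(["AIDS"])               # spawned GTAG, AIDS only
```

Under this policy the original GTAG is held at SA3, while the BMI-only spawned GTAG can be opened up to SA1 without affecting the AIDS-only spawned GTAG.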

GRAPHC internally tracks the spawning hierarchy of any spawned GTAGs. This is to maintain the absolute right of a Data Originator or GTAG Administrator to delete any of their GTAGs (or their spawned children) at any time. If a GTAG is retired from the database, any child GTAGs that have been spawned from that GTAG, or spawned from its spawns, will also be deleted.

[Figure: spawning hierarchy running from the Original GTAG through Spawn Level 1, Spawn Level 2, Spawn Level 3 and Spawn Level 4. Deletion of a spawned item also deletes all of the spawned GTAGs descended from it (shown in red).]

GRAPHC will not provide information on spawning links except for the immediate links provided during the Spawning process (i.e. submitted GTAG, new GTAG/ATAG). It is the responsibility of the GTAG Administrator requesting the Spawn operation to properly protect or destroy the returned immediate links.


Spawning Mechanism

GTAGs may be spawned using the GRAPHC GTAG Spawning service. For each GTAG to be spawned, the service requires a valid GTAG/ATAG pair, and returns the submitted ATAG with a new GTAG/ATAG pair. Each spawning request generates a new unique GTAG/ATAG pair.

The response contains a link between the old and new GTAG (via the ATAG), so that the spawned GTAGs can be associated with the correct records by using the submitted ATAG as the linking value. It is strongly recommended that the file or record of the link between the two GTAGs be disposed of or placed in a secure storage location as soon as possible, as it allows cross linking of spawned records with the spawning dataset, and between multiple datasets spawned from the same source dataset. If the link is not appropriately handled, the protection provided by spawning GTAGs for separate levels of functionality is negated.

Protect or Destroy the links between the datasets with spawned GTAGs and the original GTAG.

In order to protect data with different levels of sensitivity, it is critically important that users cannot cross link records with different levels of GTAG authority. For example, if a dataset restricted to BMI data is spawned from a full patient record dataset, it is likely that the BMI dataset will be allowed to be used at a much finer spatial resolution than the full patient record set. If the BMI records can be linked to the full patient records (using GTAGs or any other ID field, such as PatientID), a user can obtain a detailed location for the BMI dataset and then link it back to the full patient record, which has been authorised only to a much coarser resolution (SA3 for example), and so circumvent the protections on that dataset.

It is the GTAG Administrator’s responsibility to ensure that de-identified datasets cannot be linked or cross linked with other datasets by users. This is achieved by removing linking fields from the de-identified dataset. This applies to all GTAG datasets, whether registered from source (and linkable via patient_id) or spawned (and linkable via GTAGs). For the purposes of maintaining privacy and minimising the likelihood of re-identification, all links from disseminated data back to the original data should be removed before the de-identified data is passed on.
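Removing linking fields before dissemination could be done along the following lines. This is only a sketch: the field names in `LINKING_FIELDS` and the function `strip_links` are hypothetical, and a real dataset would need its own review of which fields can act as links.

```python
# Hypothetical names of fields that could link a disseminated record back to its source.
LINKING_FIELDS = {"RecordID", "PatientID", "OriginalGTAG"}


def strip_links(records, linking_fields=LINKING_FIELDS):
    """Return copies of the records with every linking field removed."""
    return [
        {key: value for key, value in record.items() if key not in linking_fields}
        for record in records
    ]


working_copy = [
    {
        "PatientID": 320,
        "OriginalGTAG": "3d4bd2ef-dd50-40e2-a3ea-8dfe893082cb",
        "GTAG": "8a5ea82f-4ecf-4f42-b812-7cf2a81fe4f3",  # spawned GTAG
        "BMI": 35.3,
    },
]
disseminated = strip_links(working_copy)
```

Only the spawned GTAG and the BMI value remain in `disseminated`; the administrator's working copy, which still holds the links, is left unchanged and must itself be protected or destroyed.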

Because a GTAG Administrator has the ability to authorise any level of functionality, it is acceptable for administrators to internally track links between originating datasets and spawned datasets for administrative purposes, but these linkages MUST be protected to at least the same level as the most sensitive linked GTAG/ATAG pairs, and MUST NOT be used to circumvent data protections implemented by the separation of more sensitive data from less sensitive data.

Avoid unnecessary Spawning

Spawning generates new GTAG/ATAG pairs. Each set of GTAG/ATAG pairs, original or spawned, requires responsible management and protection. Unnecessary spawning increases the management load and increases the risk of loss or other inadvertent exposure.

Protect Spawned ATAG/GTAG pairs

Spawned ATAG/GTAG pairs are the equivalent of ATAG/GTAG pairs registered from addresses, and as such require the same level of protection as the un-spawned ATAG/GTAG pairs for equivalent datasets.


Provide spawned GTAGs only with functionality appropriate to the most sensitive data they are linked to.

Just as with any other GTAG, the spatial functionality approved for a GTAG by a GTAG Administrator applies to all datasets that have that GTAG. As such, each GTAG should only be authorised to a level that protects the most sensitive combination of attributes associated with that GTAG.

Business Rules

Every Spawned GTAG is spawned from EXACTLY One GTAG.

Every Spawned GTAG is a Unique GTAG.

Each GTAG can have many Spawned GTAGs.

When deleting a GTAG, all descendant Spawned GTAGs will also be deleted.

Database Implementation

GTAG Spawning utilises two tables within the GTAG Database.

GTAG Table

The GTAG Table is an existing database table that is used to maintain the GTAG/ATAG pairs that have been registered with the database.

Spawning Table

The SPAWNING table will be added to the database to allow tracking of the spawning hierarchy for any given GTAG.

GTAGS table

GTAG (PK)
ATAG

SPAWNING table

SPAWNED (PK)
GTAG
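One way the two tables and the business rules might be expressed is shown below, using an in-memory SQLite database. The column types, the UNIQUE constraint on ATAG and the foreign keys are assumptions; the paper specifies only the table names, the columns and the primary keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE GTAGS (
        GTAG TEXT PRIMARY KEY,          -- every GTAG is unique
        ATAG TEXT NOT NULL UNIQUE       -- assumed: one unique ATAG per GTAG
    );
    CREATE TABLE SPAWNING (
        -- SPAWNED as primary key means a spawned GTAG has exactly one parent row.
        SPAWNED TEXT PRIMARY KEY REFERENCES GTAGS(GTAG),
        -- GTAG is not unique here, so one parent can have many spawned GTAGs.
        GTAG    TEXT NOT NULL REFERENCES GTAGS(GTAG)
    );
    """
)
```

Making SPAWNED the primary key of SPAWNING is what enforces the "exactly one parent" rule: a second row naming the same spawned GTAG under a different parent is rejected by the database.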


Spawning Database Process

Spawn Request

The following is the workflow followed by GRAPHC when spawning.

[Flowchart: Spawn Request (GTAG/ATAG) -> Is GTAG/ATAG pair valid? If N: Abort. If Y: Generate newGTAG/newATAG pair -> Add newGTAG/newATAG to GTAG table -> Add GTAG, newGTAG to SPAWNING table -> Return ATAG, newGTAG, newATAG.]

If the submitted GTAG/ATAG pair is valid:

a new GTAG/ATAG pair is generated and recorded in the GTAG table

the submitted GTAG and new GTAG are stored in the SPAWNING table, with SPAWNING.GTAG = submitted GTAG and SPAWNING.Spawned = new GTAG

a report is returned containing the submitted ATAG, new GTAG and new ATAG.
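The spawn workflow can be sketched with two dictionaries standing in for the GTAG and SPAWNING tables. The function name `spawn`, the returned key names, the use of `None` to signal an abort and the random UUIDs are all assumptions made for illustration.

```python
import uuid

gtags = {"g0": "a0"}   # stand-in for the GTAG table: GTAG -> ATAG
spawning = {}          # stand-in for the SPAWNING table: spawned GTAG -> parent GTAG


def spawn(gtag, atag):
    """Spawn a new GTAG/ATAG pair from a valid submitted pair, or return None (abort)."""
    # Is the submitted GTAG/ATAG pair valid?
    if gtag not in gtags or gtags[gtag] != atag:
        return None
    # Generate the new pair and record it in the GTAG table.
    new_gtag, new_atag = str(uuid.uuid4()), str(uuid.uuid4())
    gtags[new_gtag] = new_atag
    # Record the parent/child link in the SPAWNING table.
    spawning[new_gtag] = gtag
    # Report the submitted ATAG together with the new pair.
    return {"submitted_atag": atag, "new_gtag": new_gtag, "new_atag": new_atag}


report = spawn("g0", "a0")
rejected = spawn("g0", "not-the-atag")
```

The report deliberately echoes the submitted ATAG rather than the submitted GTAG, matching the description above in which the ATAG is the linking value the administrator uses to attach the new GTAG to the right record.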


Retire GTAG Request

The following is the workflow followed when retiring a GTAG and any spawned GTAGs.

[Flowchart: Retire Request (GTAG/ATAG) -> Is GTAG/ATAG pair valid? If N: Abort. If Y: Retire GTAG(GTAG) -> Report Retired.]

[Flowchart, Retire GTAG(GTAG) sub-process: Does GTAG have spawned GTAGs? If Y: Retire GTAG(SpawnedGTAG) -> Delete SpawnedGTAG from Spawned -> check again. If N: Delete GTAG from GTAGs -> Stop.]

If the submitted GTAG/ATAG pair is valid:

Submit the GTAG to the internal RetireGTAG process

Within the internal RetireGTAG process

o Check if the GTAG has any spawned GTAGs

o For each spawned GTAG found:

    submit the spawned GTAG to the RetireGTAG process. This will recursively retire all child GTAGs originating from the spawned GTAG.

    Delete the identified spawned GTAG from the SPAWNING table.

o Delete the GTAG from the GTAGs table.
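The recursive retirement can be sketched as follows, again with dictionaries standing in for the GTAGS and SPAWNING tables. The function names and sample identifiers are hypothetical. One detail is an assumption that goes beyond the stated workflow: the sketch also removes the retired GTAG's own row from SPAWNING when the retired GTAG was itself spawned, so that no dangling link remains.

```python
gtags = {"g0": "a0", "g1": "a1", "g2": "a2", "g3": "a3", "g4": "a4"}  # GTAG -> ATAG
spawning = {"g1": "g0", "g2": "g0", "g3": "g1", "g4": "g3"}           # spawned -> parent


def _retire_gtag(gtag):
    """Internal RetireGTAG process: retire all descendants, then the GTAG itself."""
    # Snapshot the children first, because the recursion mutates `spawning`.
    children = [child for child, parent in spawning.items() if parent == gtag]
    for child in children:
        _retire_gtag(child)
    # Assumption: drop this GTAG's own link row, if it was itself spawned.
    spawning.pop(gtag, None)
    # Delete the GTAG from the GTAGs table.
    del gtags[gtag]


def retire(gtag, atag):
    """Retire a GTAG and everything spawned from it; return False if the pair is invalid."""
    if gtag not in gtags or gtags[gtag] != atag:
        return False
    _retire_gtag(gtag)
    return True


retired = retire("g1", "a1")
```

After retiring `g1`, its descendants `g3` and `g4` are gone as well, while the original `g0` and its other spawn `g2` are untouched.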