OpenTech 2008: Power of Information - Rewiring the London Gazette with RDFa

Power of Information: Rewiring the London Gazette with RDFa, Jeni Tennison (TSO) and John Sheridan (OPSI)

This presentation, from OpenTech 2008, outlines the work OPSI and TSO are doing as part of the Power of Information agenda to make the data in the London Gazette available as RDFa.

Transcript of OpenTech 2008: Power of Information - Rewiring the London Gazette with RDFa

Page 1

Power of Information: Rewiring the London Gazette with RDFa

Jeni Tennison (TSO), John Sheridan (OPSI)

Page 2

showusabetterway.co.uk

councils / agencies / companies / government

parking charges / polluting factories / new roads / conservation areas

London Gazette

Used under Creative Commons attribution license: Dan4th on flickr

Many of you will already be aware of the competition running on showusabetterway.co.uk to reuse public information.

There is one source of public information on what councils, agencies, companies and the government are doing. It can tell you where conservation areas are, which factories have licences to emit which kinds of pollution, what parking charges are, where new roads are planned and so on.

That source is the London Gazette.

Page 3

5407 | http://www.london-gazette.co.uk | Number 58663 | Wednesday 9 April 2008

Registered as a newspaper

Published by Authority

Established 1665

Contents: State | Parliament | Ecclesiastical | Public Finance | *Transport 5407 | *Planning 5411 | Health | Environment | *Water 5416 | Agriculture & Fisheries | *Energy 5416 | Post & Telecom. | *Other Notices 5417 | Competition | *Corporate Insolvency 5417 | *Personal Insolvency 5455 | *Companies & Financial Regulation 5471 | *Partnerships 5472 | Societies Regulation | *Personal Legal 5473

* Notices published today

Transport

Road Traffic Acts

London Borough of Bromley

THE BROMLEY (PRESCRIBED ROUTE) (NO. **) (ONE WAY) ORDER 2008. (RINGSHALL ROAD, ST PAUL’S CRAY)

NOTICE IS HEREBY GIVEN that the Council of the London Borough of Bromley proposes to make the above-mentioned Order under Sections 6 and 124 and Part IV of Schedule 9 of the Road Traffic Regulation Act 1984, as amended by Section 8 of and Part 1 of Schedule 5 to the Local Government Act 1985, sections 63 - 87 inclusive and Schedules 6 and 7 of the Road Traffic Act 1991 and all other enabling powers.

The effect of the Order would be to introduce One Way working in Ringshall Road, St Paul’s Cray in a clockwise direction.

Details of prohibitions and exemptions for certain vehicles and persons are contained in the original Order.

A copy of the proposed order, of the plans of the scheme and of the Council’s statement of reasons for proposing to make the Order can be inspected during normal office hours on Mondays to Fridays at the Bromley Civic Centre, Stockwell Close, Bromley, Kent.

ANY person wishing to object to the proposed Order should send a statement in writing of their objection and the grounds thereof to the Director of Environment and Leisure Services, Civic Centre, Stockwell Close, Bromley, BR1 3UH, quoting reference ADE(TP)/RP/T100/501 not later than 30th April 2008.

Persons objecting to the proposed order should be aware that under the provisions of the Local Government (Access to Information) Act 1985, any comments received in response to this Notice may be open to public inspection.

Mark Bowen, Director of Legal and Democratic Services, Civic Centre, Stockwell Close, Bromley BR1 3UH (496133)

London Borough of Enfield

AVALON CLOSE, BYCULLAH ROAD, COLONELS WALK, CROFTON WAY, CULLODEN ROAD, DUNRAVEN DRIVE, DRAPERS ROAD, FAIRVIEW ROAD, FARORNA WALK, GLEBE AVENUE, HANSART WAY, HIGH OAKS, JAYCROFT, OAK AVENUE, ROUNDHEDGE WAY, THE RIDGEWAY, UPLANDS PARK ROAD, WILLIAM COVELL CLOSE AND WOODRIDGE CLOSE - NEW ‘AT ANY TIME’ WAITING RESTRICTIONS

Further information may be obtained from Traffic and Transportation Services, telephone number 020-8379 3553.

The London Gazette is a newspaper that’s Published by Authority, which means that the government publishes it and that its contents carry some weight. It was established in 1665 and has been published pretty much every working day since then. This particular issue is from 9 April 2008.

There are laws governing what decisions councils, agencies, companies and government itself can make. Many of these laws state that before a decision is made, the body making it must publish a notice in the Gazette to inform members of the public.

The notices themselves are arranged in different categories, such as Transport, Planning and Water.

Several hundred copies of the London Gazette are printed each day. But it’s also available online...

Page 4

view: PDF / HTML

browse: recent issues / by category

search: by date / by keyword

This is the website for the London Gazette. You can view issues and notices in PDF and (for recent notices) in HTML. You can browse through notices that have been published recently. You can search for notices between particular dates or in particular issues and with particular keywords.

It’s a pretty useful website, but imagine...

Page 5

mash up / analyse / re-present / re-use

free / usable

Power of Information

Used under Creative Commons attribution license: AMagill on flickr

There are lots of ways the Gazette data could be made more useful.

Applications could present the notice information differently, allowing you to search for notices that were of interest to you because they applied to your area. They could notify you by email or feed whenever a new relevant notice was published.

Applications could reuse the data. They could mash it up with maps or timelines. They could analyse it to detect trends.

But to do any of that, the information in the Gazette has to be free. It has to be usable.

Page 6

messy HTML / hard to parse / unreliable / unusable

semantic markup / easy to extract / reliable / reusable

And at the moment it isn't. Although notices are available online, the underlying code is messy HTML. To reuse the information, you'd have to write a screen-scraper. That would be complicated because HTML is hard to parse and because you'd have to analyse the HTML really closely to work out the commonalities that allow you to pick out information. It's also unreliable because any changes in how the information is presented would mean you'd have to change the scraping code. So it ends up being unusable.

If we had semantic markup, on the other hand, the information would be easy to identify because it would be tagged as important. And it would be reliable because the semantic markup would stay the same even as the site changed. With semantic markup, the site would become reusable.
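As a rough illustration of the difference, here is a minimal Python sketch (standard library only). It assumes the pages are well-formed XHTML and treats any element with a `property` attribute as carrying a value, preferring the machine-readable `content` attribute over the visible text; that is a simplification of RDFa for illustration, not a conforming processor. The two fragments are hypothetical layouts of the same notice: a scraper keyed to the layout would need rewriting between them, but this extractor returns the same result for both.

```python
import xml.etree.ElementTree as ET

def extract_properties(xhtml: str) -> dict[str, str]:
    """Map each `property` attribute in a well-formed XHTML fragment to a value.

    The value is the `content` attribute when present (the machine-readable
    form), otherwise the element's visible text. A simplification of RDFa,
    not a conforming processor.
    """
    found = {}
    for el in ET.fromstring(xhtml).iter():
        prop = el.get("property")
        if prop is not None:
            text = "".join(el.itertext()).strip()
            found[prop] = el.get("content", text)
    return found

# Two hypothetical layouts of the same notice data.
layout_a = """<div>
  <p>Publication Date: <span property="g:hasPublicationDate"
     content="2008-04-07">Monday, 7 April 2008</span></p>
  <p>Notice Code: <span property="g:hasCategoryCode">1501</span></p>
</div>"""

layout_b = """<table><tr>
  <td><em property="g:hasCategoryCode">1501</em></td>
  <td><b property="g:hasPublicationDate" content="2008-04-07">7/4/08</b></td>
</tr></table>"""

print(extract_properties(layout_a) == extract_properties(layout_b))  # True
```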

Page 7

RDFa

microformats: simple to embed / core vocabularies

RDFa: full RDF flexibility / greater overhead

One way of adding semantic markup to pages is to use microformats. You can use microformats in your pages simply by using conventional class names and other standard patterns. They're really easy to embed, but by design they only support simple, common semantics such as events or contact details.
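For comparison, a microformat parser only has to look for agreed class names. The Python sketch below (standard library only) reads a few hCard fields from a hypothetical notice signature; the markup and the small field set are illustrative assumptions, and a real microformats parser supports far more. The limitation shows up in the field list itself: the vocabulary is fixed, so there is no class name for something like "the authority that issued this notice".

```python
import xml.etree.ElementTree as ET

# A few real hCard class names; a full parser would support many more.
HCARD_FIELDS = {"fn", "title", "org"}

def classes(el):
    """Return the whitespace-separated class names on an element."""
    return el.get("class", "").split()

def parse_hcards(xhtml):
    """Return one dict per element whose class list contains 'vcard'.

    Nested cards and most hCard properties are ignored in this sketch.
    """
    cards = []
    for el in ET.fromstring(xhtml).iter():
        if "vcard" in classes(el):
            card = {}
            for child in el.iter():
                for name in HCARD_FIELDS.intersection(classes(child)):
                    card[name] = "".join(child.itertext()).strip()
            cards.append(card)
    return cards

# Hypothetical markup for a notice signature.
signature = """<div class="vcard">
  <span class="fn">Mark Bowen</span>,
  <span class="title">Director of Legal and Democratic Services</span>,
  <span class="org">London Borough of Bromley</span>
</div>"""

print(parse_hcards(signature)[0]["fn"])  # Mark Bowen
```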

RDFa is a new standard (not yet at W3C Recommendation status) that uses the same kind of technique: adding markup to existing pages. But RDFa supports the full flexibility of the RDF data model, so you can use it to encode any information you like. However, it does come with a greater overhead.

We chose RDFa because we needed that flexibility to express uncommon semantics, such as the relationship between a piece of legislation and a notice.

Page 8

[Diagram: a Notice (506836) is part of an Issue (2008-04-10, number 58676); the Notice has an Authority, a Council, which is responsible for an OS:Area.]

This slide shows an example of the kind of information that we might want to expose about a notice. Each notice has a number and is part of an issue. Each issue has a publication date and a number. A notice has an authority, which may be a council responsible for a particular administrative area.

The concepts and relationships here are specific to Gazettes: there's no microformat to describe the relationship between a notice and the authority that created it or a council and the area it manages. RDFa allows us to express these things.
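One way to picture what RDFa lets us say here is as a set of (subject, predicate, object) triples. The Python sketch below uses plain tuples rather than an RDF library. The notice and issue paths follow the URL pattern used in this talk, and g:isInIssue, g:hasAuthority, g:Authority and dcterms:issued are names that appear in the markup examples; anything prefixed ex: is a hypothetical placeholder, not a term from the Gazettes ontology.

```python
# Each statement is a (subject, predicate, object) triple. Names prefixed
# "ex:" are hypothetical placeholders, not terms from the Gazettes ontology.
NOTICE = "/issues/2008-04-10/notices/506836"
ISSUE = "/issues/2008-04-10"

TRIPLES = {
    (NOTICE, "g:isInIssue", ISSUE),
    (ISSUE, "dcterms:issued", "2008-04-10"),
    (NOTICE, "g:hasAuthority", "ex:SomeCouncil"),
    ("ex:SomeCouncil", "rdf:type", "g:Authority"),
    ("ex:SomeCouncil", "ex:administersArea", "ex:SomeArea"),
}

def follow(start, *predicates):
    """Follow a chain of predicates from start; return every node reached."""
    nodes = {start}
    for predicate in predicates:
        nodes = {o for s, p, o in TRIPLES if s in nodes and p == predicate}
    return nodes

print(follow(NOTICE, "g:isInIssue", "dcterms:issued"))         # {'2008-04-10'}
print(follow(NOTICE, "g:hasAuthority", "ex:administersArea"))  # {'ex:SomeArea'}
```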

Page 9

ontologies: re-use / invent; less work / more appropriate; easier to reuse / easier to generate

links

URIs: locators / identifiers

http://www.london-gazette.gov.uk/issues/2008-04-10

http://www.london-gazette.gov.uk/issues/2008-04-10/notices/506836

To use RDF, you need good URLs that can act as identifiers rather than simply being locators. These are examples of the ones that we're using for issues and for notices.
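Because these identifiers follow a fixed pattern, they can be built from, and parsed back into, the data they identify. A minimal Python sketch, assuming the two patterns shown on this slide are the whole scheme (the live site may well use others); the function names are hypothetical.

```python
import re
from datetime import date
from urllib.parse import urlsplit

BASE = "http://www.london-gazette.gov.uk"
NOTICE_PATH = re.compile(r"^/issues/(\d{4}-\d{2}-\d{2})/notices/(\d+)$")

def issue_uri(published: date) -> str:
    """Identifier for the issue published on a given date."""
    return f"{BASE}/issues/{published.isoformat()}"

def notice_uri(published: date, notice_id: int) -> str:
    """Identifier for a notice, nested under the issue it appeared in."""
    return f"{issue_uri(published)}/notices/{notice_id}"

def parse_notice_uri(uri: str):
    """Recover (issue date, notice number) from a notice URI, or None."""
    match = NOTICE_PATH.match(urlsplit(uri).path)
    if match is None:
        return None
    return date.fromisoformat(match.group(1)), int(match.group(2))

uri = notice_uri(date(2008, 4, 10), 506836)
print(uri)  # http://www.london-gazette.gov.uk/issues/2008-04-10/notices/506836
```

The round trip only works because the URI carries no presentation details such as query strings or session state, which is what lets it act as an identifier rather than just a locator.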

You also really have to have an ontology of some sort that defines the kinds of resources you want to talk about and how they relate to each other.

There are two paths that you can choose: you can reuse existing ontologies, or you can create your own. Reusing existing ontologies is obviously less work, in that you don't have to create them yourself, but rolling your own gives you semantics that are more appropriate for your application. Reusing means that your data is easier to reuse: if you as a re-user already have code that understands FOAF, you can use that code on any FOAF data. But rolling your own means that the RDFa is easier to generate, because its semantics will match those of your underlying representation. In our case, the notices are held in a markup language that was designed to represent people's names, addresses and so on in a particular way.

So reuse where you can, but where you can't, don't be afraid to roll your own. If you do roll your own, link it into standard ontologies wherever the classes and properties match, by stating that they are equivalent.
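Stating equivalences pays off because a consumer can restate your home-grown terms in vocabularies its code already understands. This Python sketch does a single pass of that, in the spirit of OWL's owl:equivalentProperty; the pairing of g:isKnownAs with foaf:name is an illustrative assumption, not something the Gazettes ontology is known to declare.

```python
# Hypothetical equivalence statements, in the spirit of owl:equivalentProperty.
# The pairing below is illustrative only.
EQUIVALENT_PROPERTIES = [("g:isKnownAs", "foaf:name")]

def with_equivalents(triples, equivalences):
    """Return the triples plus copies restated with each equivalent property.

    Only direct equivalences are applied (one pass, no transitive closure).
    """
    alternatives = {}
    for a, b in equivalences:
        alternatives.setdefault(a, set()).add(b)
        alternatives.setdefault(b, set()).add(a)  # equivalence is symmetric
    result = set(triples)
    for s, p, o in triples:
        for q in alternatives.get(p, ()):
            result.add((s, q, o))
    return result

data = {("c:WalthamForestLondonBoroughCouncil", "g:isKnownAs",
         "London Borough of Waltham Forest")}

# Code that only understands foaf:name can now find the council's name.
expanded = with_equivalents(data, EQUIVALENT_PROPERTIES)
names = {o for s, p, o in expanded if p == "foaf:name"}
print(names)  # {'London Borough of Waltham Forest'}
```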

Page 10

<head profile="http://www.w3.org/1999/xhtml/vocab">

<base href="/issues/2008-04-10/notices/506836" />

<meta property="dc:creator" content="TSO (The Stationery Office)" />

<link rel="g:isInIssue" href="/issues/2008-04-10" />

<meta about="/issues/2008-04-10" property="dcterms:issued" content="2008-04-10" datatype="xsd:date" />...

Time for some angle brackets to illustrate how you actually use RDFa. This is an example from a notice page.

The profile attribute tells a processor that we're using RDFa. The profile document ought to point to a GRDDL transformation, but at the moment it doesn't, so we use our own GRDDL transformation, which isn't shown here.

The <base> element works as it normally does in HTML, but here it also gives the canonical identifier for the page, which is useful when, as we do, you expose the same notice at multiple URIs. RDFa uses the base URI to work out what resource you're talking about, so in this case we're talking about "/issues/2008-04-10/notices/506836".
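To see why the base matters, here is how references resolve against it, using the standard library's urljoin. The second page address is hypothetical; the point is that whichever address the notice is fetched from, the same <base> href yields the same subject, and an empty reference resolves to the base itself, which is the subject used when no about attribute is given.

```python
from urllib.parse import urljoin

BASE_HREF = "/issues/2008-04-10/notices/506836"  # the <base> href on the slide

# Two addresses at which the same notice might be served; the second is hypothetical.
page_urls = [
    "http://www.london-gazette.gov.uk/issues/2008-04-10/notices/506836",
    "http://www.london-gazette.gov.uk/notices/506836?view=html",
]

subjects = {urljoin(url, BASE_HREF) for url in page_urls}
(subject,) = subjects  # exactly one subject, whichever URL was used

print(subject)                                 # http://www.london-gazette.gov.uk/issues/2008-04-10/notices/506836
print(urljoin(subject, "") == subject)         # True
print(urljoin(subject, "/issues/2008-04-10"))  # http://www.london-gazette.gov.uk/issues/2008-04-10
```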

The meta and link elements work very similarly to how they do in normal HTML, but there are a few extensions. First, the meta element uses a property attribute rather than a name attribute; this just gives consistency with the rest of RDFa, but it works in exactly the same way. The property attribute is used to associate data with a resource; here, we're saying that the creator (as defined by Dublin Core) of the notice is TSO.

The link element is exactly as it is in normal HTML, but the rel attribute holds a CURIE (about which more later). Here we're saying that this notice is in the issue /issues/2008-04-10. The relationship is defined in the Gazettes ontology.

The last meta element here shows two more RDFa attributes: about indicates the subject of the statement, and datatype indicates the datatype of the property's value. So this says that the issue was issued (as defined by Dublin Core) on 10th April 2008, and that this value is a date.

These all show using RDFa in the <head> of a document, but of course you can also use it in content.
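Putting those pieces together, a deliberately tiny Python sketch can turn the <head> markup from this slide into triples. It assumes well-formed XHTML and a hypothetical page address, leaves CURIEs unexpanded, and handles only the meta and link patterns shown; a conforming RDFa processor does a great deal more.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

HEAD = """<head profile="http://www.w3.org/1999/xhtml/vocab">
  <base href="/issues/2008-04-10/notices/506836" />
  <meta property="dc:creator" content="TSO (The Stationery Office)" />
  <link rel="g:isInIssue" href="/issues/2008-04-10" />
  <meta about="/issues/2008-04-10" property="dcterms:issued"
        content="2008-04-10" datatype="xsd:date" />
</head>"""

def head_triples(head_xml: str, page_url: str) -> list[tuple[str, str, str]]:
    """Extract (subject, predicate, object) triples from meta/link elements."""
    head = ET.fromstring(head_xml)
    base_el = head.find("base")
    base = urljoin(page_url, base_el.get("href")) if base_el is not None else page_url
    triples = []
    for el in head:
        # The subject is the about attribute if given, otherwise the base URI.
        subject = urljoin(base, el.get("about", ""))
        if el.tag == "meta" and el.get("property"):
            # meta/@property gives a literal value, optionally typed.
            literal = f'"{el.get("content")}"'
            if el.get("datatype"):
                literal += "^^" + el.get("datatype")
            triples.append((subject, el.get("property"), literal))
        elif el.tag == "link" and el.get("rel"):
            # link/@rel relates the subject to another resource.
            triples.append((subject, el.get("rel"), urljoin(base, el.get("href"))))
    return triples

for triple in head_triples(HEAD, "http://www.london-gazette.gov.uk/"):
    print(triple)
```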

Page 11

<body typeof="g:RoadTrafficActsNotice">

<p>Publication Date: <span property="g:hasPublicationDate" content="2008-04-07" datatype="xsd:date">Monday, 7 April 2008</span></p>

<p>Notice Code: <span property="g:hasCategoryCode" datatype="xsd:string">1501</span></p>

<p>Road Traffic Acts</p>

<p> <span rel="g:hasAuthority"> <span typeof="g:Authority org:LondonBoroughCouncil" about="[c:WalthamForestLondonBoroughCouncil]" property="g:isKnownAs">London Borough of Waltham Forest</span> </span></p>

Here's the document body. The typeof attribute indicates what category or class the thing we're talking about belongs to. So this says that the notice is a RoadTrafficActsNotice (as defined by the Gazettes ontology).

In the rest of the content shown here, the <span> elements are used to indicate the semantics of particular pieces of text. They don't have to be <span> elements; any element would do.

The first span here indicates the publication date of the notice. You can see that the human-readable date appears in the content of the element, and is therefore seen by the user, while the machine-readable date appears in the content attribute.

The second span just gives the category code which is used by the Gazettes to indicate that this is a Road Traffic Acts notice. The human-readable content here is the same as the computer-readable one.

The last set of semantics here is the most complicated. It says that a notice has a particular authority, and then that the authority is a London Borough Council, specifically WalthamForestLondonBoroughCouncil, which is represented in an ontology of councils. Further, it says that WalthamForestLondonBoroughCouncil is known as the London Borough of Waltham Forest.

You'll note that the about attribute here has square brackets in it. This indicates that it's a CURIE rather than a URI. A CURIE is an abbreviated URI: the bit before the colon is a prefix that maps to another URI, which is simply prepended to the bit after the colon. RDFa reuses XML's namespacing mechanism to provide the mapping from a prefix to a URI, and we'll see what that means on the next slide.
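CURIE expansion itself is simple string concatenation. This Python sketch assumes a hand-written prefix table: the dc and xsd namespaces are the standard ones, but the c: mapping is a hypothetical placeholder, because the talk doesn't give the URI of the councils ontology. In an about attribute only a bracketed ("safe") CURIE is expanded; anything else is treated as a URI.

```python
# Prefix table: dc and xsd are the standard namespace URIs; the "c" mapping
# is a hypothetical placeholder for the councils ontology.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "c": "http://example.org/councils/",
}

def expand(value: str, prefixes: dict[str, str], safe_only: bool = False) -> str:
    """Expand a CURIE ('dc:creator') or safe CURIE ('[c:Name]') to a full URI.

    With safe_only=True, as for an about attribute, an unbracketed value is
    taken to be a URI and returned unchanged. Values whose prefix is unknown
    are returned without expansion in this sketch.
    """
    if value.startswith("[") and value.endswith("]"):
        value = value[1:-1]
    elif safe_only:
        return value
    prefix, sep, reference = value.partition(":")
    if not sep or prefix not in prefixes:
        return value
    return prefixes[prefix] + reference

print(expand("dc:creator", PREFIXES))
# http://purl.org/dc/elements/1.1/creator
print(expand("[c:WalthamForestLondonBoroughCouncil]", PREFIXES, safe_only=True))
# http://example.org/councils/WalthamForestLondonBoroughCouncil
print(expand("/issues/2008-04-10", PREFIXES, safe_only=True))
# /issues/2008-04-10
```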

Page 12

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "/dtd/gazette.dtd">

DOCTYPE

<!ENTITY % xhtml-rdfa.dtd PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">%xhtml-rdfa.dtd;<!ATTLIST %html.qname; xmlns:rdf CDATA #FIXED "http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc CDATA #FIXED "http://purl.org/dc/elements/1.1/" ...>

DTD

<html xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" ...>

Namespace declarations

When you use CURIEs you have to declare the prefixes using namespace declarations, as shown here for rdf and dc. This means that, for example, dc:creator actually means http://purl.org/dc/elements/1.1/creator.

Since DTDs aren't namespace aware, declaring those namespaces actually makes the XHTML invalid, even against the XHTML+RDFa DTD, so you have to have your own DTD, and reference it from the DOCTYPE.

The DTD just references the standard XHTML+RDFa DTD, and adds a bunch of namespace declaration attributes on to the <html> element.

You can then validate your pages with the W3C validator at validator.w3.org, or with any other validating XML parser, for example.

This overhead is probably the biggest disadvantage of RDFa, but it means that you don't have problems with clashes between naming schemes. I can use dc:title and foaf:title in the same document with no problems.
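Because the prefixes come from ordinary namespace declarations, a processor can read the mapping straight out of the document. This Python sketch collects the declarations using ElementTree's start-ns parse events and expands two property values; the document is a made-up minimal example, not a Gazette page. Note that dc:title and foaf:title end up as two distinct URIs.

```python
import io
import xml.etree.ElementTree as ET

# A made-up minimal document, not a real Gazette page.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <body>
    <p property="dc:title">Notice 506836</p>
    <p property="foaf:title">Dr</p>
  </body>
</html>"""

def namespace_declarations(xml_text: str) -> dict[str, str]:
    """Collect prefix -> URI mappings from every xmlns declaration."""
    mappings = {}
    stream = io.BytesIO(xml_text.encode("utf-8"))
    for _event, (prefix, uri) in ET.iterparse(stream, events=("start-ns",)):
        mappings.setdefault(prefix, uri)  # first declaration wins in this sketch
    return mappings

def expanded_properties(xml_text: str) -> list[str]:
    """Expand each property CURIE using the document's own declarations."""
    prefixes = namespace_declarations(xml_text)
    result = []
    for el in ET.fromstring(xml_text).iter():
        prop = el.get("property")
        if prop:
            prefix, _, local = prop.partition(":")
            result.append(prefixes[prefix] + local)
    return result

print(expanded_properties(DOC))
# ['http://purl.org/dc/elements/1.1/title', 'http://xmlns.com/foaf/0.1/title']
```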

Page 13

headaches

URLs / XHTML

markup / licensing

In actually implementing this in the London Gazette, we encountered four areas that gave us headaches: the URLs we used; the XHTML we generated; the way the data was marked up; and issues with licensing. I talked about these in detail at XTech. Basically, mapping good URLs onto existing bad ones turns out to be hard work. And putting the processes in place to ensure documents are valid XHTML is hard when the content comes from ASP code, XSLT code and from directly authored text.

More interestingly, exposing the semantics of notices depends on the notices containing semantics in the first place, and for these notice types the semantic markup within the notices themselves is rather limited.

Finally, there's no point putting this information on the web if people can't use it. The government is working at making the Gazettes data free to use by anyone, even commercially.

Page 14

final words

RDFa: more than microformats

London Gazette: www.london-gazette.gov.uk

www.showusabetterway.co.uk

To summarise, I've talked about how RDFa allows us to address the other side of the microformats 80/20 split: the 20% of the semantics that takes 80% of the time. It's more work than microformats, but it's also more flexible and more powerful.

I've also talked a bit about the London Gazette and the wealth of information that it holds. Because the RDFa work is still in its early stages, notices from the past year are available in their source XML form; you can find a link to them on the showusabetterway.co.uk site.

Page 15

Acknowledgements

Shaun Bigg, Paul Appleby, Robin Brattel

Questions?
