Mirek Sopek, Robert Trypuz, Dominik Kuziński
The future role of schema.org for business reporting. From existing financial extension toward reporting extension.
WHAT IS SCHEMA.ORG?
• Schema.org (2011), sponsored by the most important search engines: Google, Microsoft, Yahoo and Yandex, is a large-scale collaborative activity with a mission to create, maintain, and promote schemas for structured data on web pages and beyond.
• It contains more than 2000 terms: 753 types, 1207 properties and 220 enumerations.
• Schema.org covers entities, relationships between entities and actions.
• Today, about 15 million sites use schema.org. Random yet representative crawls (Web Data Commons) show that about 30% of URLs on the web return some form of triples from schema.org.
• Many applications from Google (Knowledge Graph), Microsoft (like Cortana), Pinterest, Yandex and others already use schema.org to power rich experiences.
• Think of schema.org as a global Vocabulary for the web transcending domain and language barriers.
• The principal authors of the schema.org conceptual framework are R. Guha, D. Brickley and P. Mika
• SCHEMA.ORG is THE MOST IMPORTANT and THE MOST POPULAR global DATA VOCABULARY
WHAT IS SCHEMA.ORG?
http://bl.ocks.org/danbri/raw/1c121ea8bd2189cf411c/
http://schema.org
PRINCIPLE OF LEAST POWER - 1998
• Principle: Powerful languages inhibit information reuse.
• Good Practice: Use the least powerful language suitable for expressing information, constraints or programs on the World Wide Web.
• Tradeoff: Choosing between languages that can solve a broad range of problems and languages in which programs and data are easily analyzed
SCHEMA.ORG is born out of this principle!
SIR TIM BERNERS-LEE
SCHEMA.ORG USE SIMPLICITY – AN ILLUSTRATION
http://finances.makolab.com/HTML/LoanStudents/LoanStudents.html
UNDER THE HOOD OF SCHEMA.ORG
• "The driving factor in the design of Schema.org was to make it easy for webmasters to publish their data. In general, the design decisions place more of the burden on consumers of the markup." R.V. GUHA, D. BRICKLEY, S. MACBETH, "Schema.org: Evolution of Structured Data on the Web"
DESIGN DECISIONS
• Derived from RDFS (RDF Schema)
• Multiple inheritance hierarchy
• POLYMORPHIC PROPERTIES - Each property may have one or more types as its domain and as its range, declared with "domainIncludes" and "rangeIncludes"
DATA MODEL
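A minimal sketch of what polymorphic properties mean in practice, assuming a simplified in-memory model. The `Property` class and its `fits` method are hypothetical helpers, and the type lists are abbreviated for illustration rather than copied from the full schema.org definition:

```python
from dataclasses import dataclass, field


@dataclass
class Property:
    """A schema.org-style property with several admissible domains and ranges."""
    name: str
    domain_includes: set = field(default_factory=set)
    range_includes: set = field(default_factory=set)

    def fits(self, subject_type: str, value_type: str) -> bool:
        # A statement is acceptable if the subject type is one of the
        # declared domains and the value type is one of the declared ranges.
        return (subject_type in self.domain_includes
                and value_type in self.range_includes)


# Illustrative declaration only: the lists are abbreviated assumptions.
amount = Property(
    name="amount",
    domain_includes={"MoneyTransfer", "LoanOrCredit", "InvestmentOrDeposit"},
    range_includes={"MonetaryAmount", "Number"},
)

print(amount.fits("MoneyTransfer", "MonetaryAmount"))  # True
print(amount.fits("LoanOrCredit", "Number"))           # True
print(amount.fits("Car", "Number"))                    # False
```

With a single domain and range per relation, the same vocabulary would need a separate property for each subject and value type; listing several of each keeps the term count small.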
UNDER THE HOOD OF SCHEMA.ORG
USAGE MODELS
• Under full control of site/messages/data publishers
• Data EMBEDDED into page, data representation or into message markup (HTML, XML)
• Harvested during standard crawling, message or data processing
SERIALIZATIONS
• RDFa - CANONICAL
• Microdata (native to HTML5)
• JSON-LD (now preferred)
UNDER THE HOOD OF SCHEMA.ORG
EXTENSION MECHANISM: RULES FOR URI IDENTIFIERS
Rules:
• CORE: documentation URI http://schema.org/<term>, canonical URI http://schema.org/<term>
• HOSTED EXT.: documentation URI http://<ext>.schema.org/<term>, canonical URI http://schema.org/<term>
• EXTERNAL EXT.: documentation URI http://<ext.domain>/<term>, canonical URI http://<ext.domain>/<term>
Examples:
• CORE: documentation URI http://schema.org/Car, canonical URI http://schema.org/Car
• HOSTED EXT.: documentation URI http://auto.schema.org/Motorcycle, canonical URI http://schema.org/Motorcycle
• EXTERNAL EXT.: documentation URI http://fibo.org/voc/BusinessEntity, canonical URI http://fibo.org/voc/BusinessEntity
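The URI rules can be sketched as a small function that maps a documentation URI to its canonical URI. The name `canonical_uri` is a hypothetical helper, and the sketch assumes that any host ending in `.schema.org` is a hosted extension:

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_uri(documentation_uri: str) -> str:
    """Map a documentation URI to its canonical URI.

    Hosted extensions (<ext>.schema.org) collapse to schema.org;
    core and external URIs are returned unchanged.
    """
    parts = urlsplit(documentation_uri)
    host = parts.netloc.lower()
    if host.endswith(".schema.org"):
        host = "schema.org"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


print(canonical_uri("http://schema.org/Car"))               # http://schema.org/Car
print(canonical_uri("http://auto.schema.org/Motorcycle"))   # http://schema.org/Motorcycle
print(canonical_uri("http://fibo.org/voc/BusinessEntity"))  # http://fibo.org/voc/BusinessEntity
```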
UNDER THE HOOD OF SCHEMA.ORG
<div itemscope itemtype="http://schema.org/MoneyTransfer">
  <h1>If you want to donate</h1>
  Send
  <span itemprop="amount" itemscope itemtype="http://schema.org/MonetaryAmount">
    <span itemprop="value">30</span>
    <span itemprop="currency" content="USD">$</span>
  </span>
  via bank transfer to the
  <span itemprop="beneficiaryBank">European ExampleBank, London</span>
  Put "<i itemprop="name">Donate wikimedia.org</i>" in the transfer title.
</div>
EXAMPLES - MICRODATA
UNDER THE HOOD OF SCHEMA.ORG
<div vocab="http://schema.org/" typeof="MoneyTransfer">
  <h1>If you want to donate</h1>
  Send
  <span property="amount" typeof="MonetaryAmount">
    <span property="value">30</span>
    <span property="currency" content="USD">$</span>
  </span>
  via bank transfer to the
  <span property="beneficiaryBank">European ExampleBank, London</span>
  Put "<i property="name">Donate wikimedia.org</i>" in the transfer title.
</div>
EXAMPLES - RDFa
UNDER THE HOOD OF SCHEMA.ORG
<script type="application/ld+json">
{
  "@context": "http://schema.org/",
  "@type": "MoneyTransfer",
  "name": "Donate wikimedia.org",
  "amount": {
    "@type": "MonetaryAmount",
    "value": "30",
    "currency": "USD"
  },
  "beneficiaryBank": "European ExampleBank, London"
}
</script>
EXAMPLES – JSON-LD
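On the consumer side, harvesting such markup can be sketched with the Python standard library alone. The `JsonLdExtractor` class and the sample `PAGE` are hypothetical. The page is modeled on the donation example, with the transfer typed as MoneyTransfer and the nested figure stored in the MonetaryAmount `value` property:

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed content of every JSON-LD <script> block in a page."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.items.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


# Hypothetical page used only for this illustration.
PAGE = """
<html><body>
<h1>If you want to donate</h1>
<script type="application/ld+json">
{"@context": "http://schema.org/",
 "@type": "MoneyTransfer",
 "name": "Donate wikimedia.org",
 "amount": {"@type": "MonetaryAmount", "value": "30", "currency": "USD"},
 "beneficiaryBank": "European ExampleBank, London"}
</script>
</body></html>
"""

extractor = JsonLdExtractor()
extractor.feed(PAGE)
transfer = extractor.items[0]
print(transfer["@type"], transfer["amount"]["value"], transfer["amount"]["currency"])
# MoneyTransfer 30 USD
```

Because JSON-LD sits in its own script element, a consumer can read it without touching the visible HTML, which is one reason it is easier to process than Microdata or RDFa.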
SCHEMA.ORG VS. ONTOLOGIES AND LINKED DATA
SIMILARITIES
• Common elements: a graph data model of typed entities with named properties
• Schema.org uses the RDFS schema language and the JSON-LD and RDFa syntaxes
• Schema.org shares (with Linked Data and Ontologies) many of the same goals
• Linked data and ontologies have brought to the Web a much smaller number of data sources than Schema.org, but their quality is (often) very high. This opens up many opportunities for combining the two approaches: for example, professionally published ontologies can often authoritatively describe the entities mentioned in Schema.org descriptions from the wider mainstream Web.
DIFFERENCES
• Schema.org's approach can be seen as less noisy and less decentralized than Linked Data
• Schema.org promotes syntaxes (JSON-LD, RDFa) that are a tradeoff between machine-friendly and human-friendly formats
• Linked RDF data publication practices have not been adopted in the Web at large
• Schema.org shares the Linked-Data community's skepticism toward the premature ontologies (rule systems, description logics, etc.) found in much of the academic work that is carried out under the Semantic Web banner.
• Schema.org avoids assuming that rule-based processing will be commonplace
• Schema.org's approach, in contrast to the methodologies of building Linked Data and ontologies, assumes that various kinds of cleanup, reconciliation, and post-processing will usually be needed before structured data from the Web can be exploited in applications.
• Many frame-based knowledge representation systems, including RDF Schema and OWL, have a single domain and range for each relation. Schema.org assumes polymorphism.
• Schema.org allows for multiple inheritance.
EXTENSIONS OF SCHEMA.ORG
UNDER THE HOOD OF SCHEMA.ORG
EXTENSION MECHANISM: SEQUENCE OF SPECIFICITY
CORE → HOSTED EXTENSIONS → EXTERNAL EXTENSIONS
• CORE – "Core, basic vocabulary for describing the kind of entities the most common web applications need"*
• HOSTED/REVIEWED EXTENSIONS – Domain-specific basic vocabularies.
• EXTERNAL EXTENSIONS – More specialized, fully independent domain-specific vocabularies. Built by a third party.
• Today: autos, finance, bibliography, health & life-sciences, iot
* http://schema.org/docs/extension.html
CREATING EXTENSIONS TO SCHEMA.ORG
AUTOMOTIVE EXTENSION
• Extension URI: auto.schema.org
• Designed as the first phase of the GAO project (Generic Automotive Ontology - http://automotive-ontology.org)
• First step: extending the core vocabulary by a minimal set of new terms (May 2015)
• Second step: creating the auto.schema.org hosted extension (May 2016)
• Third step: creating a POC of the external extension (March 2017)
• Fourth step: production-grade implementation by Toyota (http://ml.ms/schema4toyota)
FINANCIAL EXTENSION
• Extension URI: fibo.schema.org
• Inspiration from the FIBO project (Financial Industry Business Ontology – http://fibo.org)
• Going through a BOC (Bag-Of-Concept) phase and using an "Occam's razor" approach.
• First step: extending the core vocabulary by a minimal set of new terms (May 2016)
• Second step: creating the fibo.schema.org hosted extension (published in pending.schema.org, March 2017)
• Third step: creating a POC of the external extension (March 2017)
Creation process management: MakoLab
AUTO.SCHEMA.ORG
• May 13, 2015 – official introduction of the Automotive extension to schema.org
• Collaborative project of Hepp Research GmbH, MakoLab, and many other individuals
• Managed by MakoLab SA
FINANCE.SCHEMA.ORG
• Extension of the core vocabulary by a minimal set of new terms (May 2016)
• The hosted extension, published in pending.schema.org (March 2017)
• Collaborative project of an international group of individuals led by MakoLab SA
• Described in: http://schema.org/docs/financial.html
• Managed by MakoLab SA
The financial extension of schema.org refers to the most important real-world objects related to banks and financial institutions:
• A bank and its identification mechanism
• A financial product
• An offer to the client
Described in: http://schema.org/docs/financial.html
CLASSES
Thing
  Action
    TransferAction
      MoneyTransfer
  Intangible
    Service
      FinancialProduct
        BankAccount
          DepositAccount
        CurrencyConversionService
        InvestmentOrDeposit
          BrokerageAccount
          DepositAccount
          InvestmentFund
        LoanOrCredit
          CreditCard
          MortgageLoan
        PaymentCard +
        PaymentService
    StructuredValue
      ExchangeRateSpecification
      MonetaryAmount
      RepaymentSpecification
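The class listing can be read as a hierarchy with multiple inheritance: DepositAccount appears under both BankAccount and InvestmentOrDeposit. A minimal sketch of that reading follows. The `PARENTS` table is an interpretation of the flattened slide and covers only part of the listing; `ancestors` is a hypothetical helper:

```python
# Child -> list of direct parents, transcribed from part of the class listing.
PARENTS = {
    "Action": ["Thing"],
    "TransferAction": ["Action"],
    "MoneyTransfer": ["TransferAction"],
    "Intangible": ["Thing"],
    "Service": ["Intangible"],
    "FinancialProduct": ["Service"],
    "BankAccount": ["FinancialProduct"],
    "InvestmentOrDeposit": ["FinancialProduct"],
    "DepositAccount": ["BankAccount", "InvestmentOrDeposit"],  # two parents
    "LoanOrCredit": ["FinancialProduct"],
    "MortgageLoan": ["LoanOrCredit"],
    "StructuredValue": ["Intangible"],
    "MonetaryAmount": ["StructuredValue"],
}


def ancestors(term):
    """Return all supertypes of a term, nearest first, without duplicates."""
    seen = []
    queue = list(PARENTS.get(term, []))
    while queue:
        parent = queue.pop(0)
        if parent not in seen:
            seen.append(parent)
            queue.extend(PARENTS.get(parent, []))
    return seen


print(ancestors("DepositAccount"))
# ['BankAccount', 'InvestmentOrDeposit', 'FinancialProduct', 'Service', 'Intangible', 'Thing']
```

A consumer that knows these supertypes can treat a DepositAccount as a BankAccount, as an InvestmentOrDeposit, or as any FinancialProduct, depending on the question asked.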
THE BASIC MODELS OF THE FINANCIAL OBJECTS
[Diagrams: a bank, a deposit account, a payment card]
The Applications of schema.org
Traditional:
RANK ANALYSE SEARCH
New:
REPORT
FUNDAMENTAL TRENDS IN WEB SEARCH
1. BIGGER SHARE ON THE TRANSACTION
2. RICHER INTERACTION
3. STRONGER INDIVIDUALIZATION
4. DYNAMICS AND VOLATILITY
This slide is based on the work of M. Hepp & M. Sopek "Web Search and Beyond: Digital Marketing for Automotive"
RANK
SUMMARY OF “RANK” BENEFITS OF SCHEMA.ORG
• CTR increase (Rich Snippets effect)
• Better Brand visibility (Knowledge Panels and Factual Answers)
• Better Product positioning (Rich snippets & Tabular results)
• Faster way to reach searched content (more sitelinks)
• Better mobile device experience of search
11.09.2015 – Google:
„Over time, I think it [structured markup] is something that might go into the rankings as well. If we can recognize someone is looking for a car, we can say oh well, we have these pages that are marked up with structured data for a car, so probably they are pretty useful in that regard. We don’t have to guess if this page is about a car.” John Mueller / Webmaster Trends Analyst @Google
RANK
SCHEMA.ORG DATA IN GOOGLE ANALYTICS
The markup in the website’s code
• schema.org
Google Tag Manager
• Additional setup
Google Analytics
• Additional Dimensions and Metrics
ANALYSE
FINANCIAL EXTENSION SCHEMA.ORG POC
Proof-of-Concept
• http://finances.makolab.com
• Full use of fibo.schema.org
• Definitions of financial dimensions
• Analytics with Google "GA"
ANALYSE
POC's page | JSON property | Dimension | Dimension name
BankAccount.html | price | Bank Account Fee | Price
BankAccount.html | name | Financial Product Name | Financial Product Name
BrokerageAccount.html | minValue | Brokerage Account Minimum Investment | Minimum
BrokerageAccount.html | name | Financial Product Name | Financial Product Name
CreditCard.html | annualPercentageRate | Credit Card APR | Percentage Rate
CreditCard.html | minValue | Credit Card Required Collateral | Minimum
CreditCard.html | price | Credit Card Annual Fee | Price
CreditCard.html | name | Financial Product Name | Financial Product Name
CreditCard8.html | name | Financial Product Name | Financial Product Name
CreditCard8.html | minValue | Credit Card Limit | Minimum
PaymentService.html | name | Financial Product Name | Financial Product Name
FinancialProducts.html | name | Financial Product Name | Financial Product Name
FinancialProducts.html | minValue | Minimum Insurance Coverage | Minimum
FinancialProducts.html | maxValue | Maximum Insurance Coverage | Maximum
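The mapping from JSON-LD properties to analytics dimensions can be sketched as a lookup keyed by page and property. This is an illustration only: `DIMENSIONS` copies a few rows of the mapping, `walk` and `dimensions_for` are hypothetical helpers, the sample markup is invented, and the hand-over of the values to Google Tag Manager and Google Analytics is not shown.

```python
# (page, JSON-LD property) -> dimension, copied from a few rows of the mapping.
DIMENSIONS = {
    ("BankAccount.html", "price"): "Bank Account Fee",
    ("BankAccount.html", "name"): "Financial Product Name",
    ("CreditCard.html", "annualPercentageRate"): "Credit Card APR",
    ("CreditCard.html", "price"): "Credit Card Annual Fee",
    ("CreditCard.html", "name"): "Financial Product Name",
}


def walk(node, found):
    """Collect every scalar property in a JSON-LD tree (first occurrence wins)."""
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(value, (dict, list)):
                walk(value, found)
            elif not key.startswith("@"):
                found.setdefault(key, value)
    elif isinstance(node, list):
        for item in node:
            walk(item, found)
    return found


def dimensions_for(page, markup):
    """Return {dimension: value} for the properties present in the markup."""
    values = walk(markup, {})
    return {
        dimension: values[prop]
        for (mapped_page, prop), dimension in DIMENSIONS.items()
        if mapped_page == page and prop in values
    }


# Hypothetical markup with placeholder values.
sample = {
    "@context": "http://schema.org/",
    "@type": "CreditCard",
    "name": "Example Card",
    "annualPercentageRate": 19.9,
    "offers": {"@type": "Offer", "price": "25", "priceCurrency": "USD"},
}

print(dimensions_for("CreditCard.html", sample))
# {'Credit Card APR': 19.9, 'Credit Card Annual Fee': '25', 'Financial Product Name': 'Example Card'}
```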
TRUE DATA ANALYTICS ANALYSE
SCHEMA.ORG DATA IN GOOGLE ANALYTICS
PROS:
• Analyse additional information available in Schema markup right in Web Analytics.
• Better insights into what people look at on the website. Deeper understanding of users' needs.
• Better conclusions for website's UX optimization.
• Better conclusions for campaigns optimization.
CONS:
• None.
ANALYSE
INTELLIGENT/SMART SEARCH BASED ON SCHEMA.ORG MARKUP
Mark your product data with schema.org markup
Run the smart Search Crawler for an Enterprise Website
Check for schema.org markup (Microdata or JSON-LD)
When markup is found, create property map and assign values
Display enhanced search results
SEARCH
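The steps above can be sketched in a few lines: build a flat property map per crawled page, then search against the concept (type) and against several property criteria at once. The crawl results in `CRAWLED` are hypothetical placeholder data, and `index_page` and `search` are illustrative helpers rather than the actual implementation:

```python
def index_page(url, markup):
    """Create a flat property map for one page's schema.org markup."""
    props = {
        key: value
        for key, value in markup.items()
        if not key.startswith("@") and not isinstance(value, (dict, list))
    }
    return {"url": url, "type": markup.get("@type"), "props": props}


def search(index, concept=None, **criteria):
    """Return URLs whose type matches `concept` and whose property values
    satisfy every criterion (each criterion is a predicate on the value)."""
    hits = []
    for entry in index:
        if concept and entry["type"] != concept:
            continue
        props = entry["props"]
        if all(prop in props and test(props[prop]) for prop, test in criteria.items()):
            hits.append(entry["url"])
    return hits


# Hypothetical crawl results: URL -> JSON-LD found on the page.
CRAWLED = {
    "/cards/gold": {"@type": "CreditCard", "name": "Gold Card", "annualPercentageRate": 17.5},
    "/cards/basic": {"@type": "CreditCard", "name": "Basic Card", "annualPercentageRate": 22.0},
    "/accounts/saver": {"@type": "DepositAccount", "name": "Saver Account"},
}

index = [index_page(url, markup) for url, markup in CRAWLED.items()]

print(search(index, concept="CreditCard", annualPercentageRate=lambda v: v < 20))
# ['/cards/gold']
print(search(index, name=lambda v: "Account" in v))
# ['/accounts/saver']
```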
SEARCH AGAINST BOTH CONCEPTS AND THEIR PROPERTIES’ VALUES
The real values taken from existing data found by crawler within the marked website pages
SEARCH AGAINST MULTIPLE CRITERIA
The Applications of schema.org
From existing financial extension toward reporting extension.
REPORT
THE REPORTING HORIZON
BUSINESS REPORTING AND BUSINESS INFORMATION EXCHANGE ARE DOMINATED BY THE XBRL STANDARD
• However, the cost of filing financial reports is still quite high, particularly for small companies*
• This is why, in the US, the “Small Company Disclosure Simplification Act” “(…) exempts emerging growth companies and issuers with total annual gross revenues of less than $250 million from the requirement to use Extensible Business Reporting Language (XBRL) for financial statements and other mandatory periodic reporting filed with the Securities and Exchange Commission (SEC). Such companies, however, may elect to use XBRL for such reporting.”
* $2,000 to $25,000 per year according to XBRL US.
REPORT
THE RELEVANT DEVELOPMENT
THE USE OF SEMANTIC WEB STANDARDS
• „Publishing XBRL as Linked Open Data” (Roberto García & Rosa Gil, Universitat de Lleida)
• „Triplificating and linking XBRL financial data” (Roberto García & Rosa Gil, Universitat de Lleida)
• „Adopting Semantic Technologies for Effective Corporate Transparency” (Maria Mora-Rodriguez, Ghislain Auguste Atemezing, Chris Preist)
• „Financial Report Ontology” (Charles Hoffman)
• FRO – „Financial Regulation Ontology” (Jurgen Ziemer - http://finregont.com/ )
REPORT
THE RELEVANT DEVELOPMENT
THE EVOLUTION TOWARD SIMPLICITY WITHIN XBRL WORLD
• „Open Information Model” - https://specifications.xbrl.org/work-product-index-open-information-model-open-information-model.html
The Open Information Model provides a syntax-independent model for XBRL data, allowing reliable transformation of XBRL data into other representations. The work product includes: xBRL-XML, xBRL-JSON, xBRL-CSV, OIM Common.
• XBRLS - XBRL Simple Application Profile (how a simpler XBRL can make a better XBRL)
• Inline XBRL - https://specifications.xbrl.org/spec-group-index-inline-xbrl.html
REPORT
HOW COULD IT WORK?
INITIAL EXERCISE "I" - XBRL "Hello World"* expressed as schema.org-compliant markup
• Converting taxonomy (XSD) to OWL ontology (with help of: http://rhizomik.net/html/redefer/)
• Writing schema.org-compliant JSON-LD markup
• USING ONLY existing schema.org constructs!
* Charles Hoffman: http://xbrl.squarespace.com/journal/2008/12/18/hello-world-xbrl-example.html
REPORT
HOW COULD IT WORK?
INITIAL EXERCISE "II" - iXBRL example
• Based on https://www.xbrl.org/ixbrl-samples/valeo-income-statement.html
• Expression of the data semantics in JSON-LD – schema.org compliant markup
• Instance of schema.org:Report class
• A few more constructs extending schema.org
REPORT
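A rough sketch of what such markup might look like, built as a Python dictionary and serialized to JSON-LD. Everything under the `rep:` prefix (the namespace http://example.org/reporting# and the terms `rep:fact`, `rep:Fact`, `rep:concept`, `rep:period`, `rep:value`) is a hypothetical stand-in for the "few more constructs" and is not taken from the actual exercise. The organization name and figures are placeholders, not data from the sample report:

```python
import json

report = {
    "@context": {
        "@vocab": "http://schema.org/",          # existing schema.org terms
        "rep": "http://example.org/reporting#",  # hypothetical extension namespace
    },
    "@type": "Report",
    "name": "Income statement (illustrative)",
    "publisher": {"@type": "Organization", "name": "Example Corp"},
    # Hypothetical extension terms: one reported fact with concept, period, value.
    "rep:fact": [
        {
            "@type": "rep:Fact",
            "rep:concept": "Revenue",
            "rep:period": "2016-01-01/2016-12-31",
            "rep:value": {"@type": "MonetaryAmount", "value": 1000, "currency": "EUR"},
        }
    ],
}

print(json.dumps(report, indent=2))
```

The report itself and the monetary value reuse existing schema.org types. Only the link between a report and its individual facts needs new terms in this sketch.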
HOW COULD IT WORK?
INITIAL EXERCISE "III" - GAAP TAXONOMY IN SCHEMA.ORG FORMAT
• Source: PROPOSED 2018 US GAAP FINANCIAL REPORTING TAXONOMY
• How: Extracting parent-child taxonomy with the definitions of terms + schema.org-like RDFa formatting of the obtained model
• Result: http://sdo-gaap-ee.appspot.com/GrossProfit
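The "How" step can be sketched as a function that turns one parent-child taxonomy entry into an RDFa fragment resembling the layout of schema.org's own term definitions. The namespace in `BASE`, the parent term `IncomeStatementItem`, and the definition text are hypothetical placeholders, not values from the GAAP taxonomy:

```python
from html import escape

BASE = "http://example.org/gaap/"  # hypothetical namespace for the extension

# Hypothetical extracted entries: term, its parent, and its definition.
TERMS = [
    {
        "name": "GrossProfit",
        "parent": "IncomeStatementItem",
        "definition": "Placeholder definition text.",
    },
]


def term_to_rdfa(term):
    """Render one taxonomy term as a schema.org-like RDFa class definition."""
    uri = BASE + term["name"]
    parent_uri = BASE + term["parent"]
    return (
        f'<div typeof="rdfs:Class" resource="{escape(uri)}">\n'
        f'  <span class="h" property="rdfs:label">{escape(term["name"])}</span>\n'
        f'  <span property="rdfs:comment">{escape(term["definition"])}</span>\n'
        f'  <span>Subclass of: <a property="rdfs:subClassOf" '
        f'href="{escape(parent_uri)}">{escape(term["parent"])}</a></span>\n'
        f'</div>'
    )


for term in TERMS:
    print(term_to_rdfa(term))
```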
HOW COULD IT WORK?
INITIAL EXERCISE "IV" - PART OF SBR (AU) TAXONOMY IN SCHEMA.ORG FORMAT
• Source: The Australian Government Standard Business Reporting Taxonomy
• How: Extracting parent-child taxonomy with the definitions of terms + schema.org-like RDFa formatting of the obtained model*
• Result:
  http://sdo-sbr-ee.appspot.com/RelativePeriodDurationDimension
  http://sdo-sbr-ee.appspot.com/SalesAndMarketing
  http://sdo-sbr-ee.appspot.com/PrimaryProduction
* With help of Xwand software provided by FQS Poland
Identified problems within SBR:
• mixing types and properties in one taxonomy
• mixing "is-a" and "instance-of"
• individuals play a role of types
• Bad taxonomies
• Unclear rationale for types
Current & near future development
CREATION of SCHEMA.ORG extensions and their applications
• Step I – the external extension based on selected XBRL taxonomies (like GAAP or IFRS)
• Step II – the external extension based on selected SBR taxonomy (??? requires serious work on the SBR side)
• Working with OIM specification on the schema.org-like instance data representation
• Creation of implementation guidelines and live POC
• Working with interested parties on the real-life tests
• Critical evaluation of the project
• If successful - proposing the HOSTED EXTENSION to schema.org
In general - Adopting the philosophy of bottom-up, empirical approach to the creation of reporting data models
REPORT
DON’T HESITATE TO CONTACT US!
Dr MIREK SOPEK MakoLab’s CTO [email protected]
Poland: MakoLab SA, Demokratyczna 46, 93-430 Lodz, Poland Phone: +48 600 814 537, www.makolab.com
USA: Makolab USA Inc, 20 West University Ave, Gainesville, FL 32601 Phone: +1 551 226 5488 , www.makolab.com
Dr ROBERT TRYPUZ MakoLab SA Rzgowska 30 93-172 Łódź Poland [email protected]
INDUSTRY MakoLab SA Rzgowska 30 93-172 Łódź Poland [email protected]
ACADEMIA JPII University Lublin, Poland [email protected]
DOMINIK KUZIŃSKI MakoLab SA Demokratyczna 46 93-430 Łódź Poland [email protected]