Post on 10-May-2015
description
Unlocking Proprietary Data with PostgreSQL Foreign Data Wrappers
Pat PattersonPrincipal Developer Evangelist
ppatterson@salesforce.com@metadaddy
Agenda
Foreign Data Wrappers
Writing FDW’s in C
Multicorn
Database.com FDW for PostgreSQL
FDW in action
Why Foreign Data Wrappers?
External data sources look like local tables!– Other SQL database
• MySQL, Oracle, SQL Server, etc
– NoSQL database• CouchDB, Redis, etc
– File
– LDAP
– Web services• Twitter!
Why Foreign Data Wrappers?
Make the database do the work– SELECT syntax
• DISTINCT, ORDER BY etc
– Functions• COUNT(), MIN(), MAX() etc
– JOIN external data to internal tables
– Use standard apps, libraries for data analysis,
reporting
Foreign Data Wrappers
2003 - SQL Management of External Data (SQL/MED)
2011 – PostgreSQL 9.1 implementation– Read-only
– SELECT-clause optimization
– WHERE-clause push-down• Minimize data requested from external source
Future Improvements– JOIN push-down
• Where two foreign tables are in the same server
– Support cursors
FDW’s in PostgreSQL
‘Compiled language’ (C) interface
Implement a set of callbackstypedef struct FdwRoutine{ NodeTag type; /* These functions are required. */ GetForeignRelSize_function GetForeignRelSize; GetForeignPaths_function GetForeignPaths; GetForeignPlan_function GetForeignPlan; ExplainForeignScan_function ExplainForeignScan; BeginForeignScan_function BeginForeignScan; IterateForeignScan_function IterateForeignScan; ReScanForeignScan_function ReScanForeignScan; EndForeignScan_function EndForeignScan; /* These functions are optional. */ AnalyzeForeignTable_function AnalyzeForeignTable;} FdwRoutine;
FDW’s in PostgreSQL
Much work!• CouchDB FDW
• https://github.com/ZhengYang/couchdb_fdw/
• couchdb_fdw.c > 1700 LoC
Multicorn
http://multicorn.org/
PostgreSQL 9.1+ extension
Python framework for FDW’s
Implement two methods…
Multicorn
from multicorn import ForeignDataWrapper
class ConstantForeignDataWrapper(ForeignDataWrapper):
def __init__(self, options, columns): super(ConstantForeignDataWrapper, self).__init__(options, columns) self.columns = columns
def execute(self, quals, columns): for index in range(20): line = {} for column_name in self.columns: line[column_name] = '%s %s' % (column_name, index) yield line
Database.com FDW for PostgreSQL
OAuth login to Database.com / Force.com– Refresh on token expiry
Force.com REST API– SOQL query
• SELECT firstname, lastname FROM Contact
Request thread puts records in Queue, execute()
method gets them from Queue
JSON parsing – skip embedded metadat
< 250 lines code
Demo
Conclusion
Foreign Data Wrappers make the whole world look like
tables!
Writing FDW’s in C is hard!– Or, at least, time consuming!
Writing FDW’s in Python via Multicorn is easy!– Or, at least, quick!
Try it for yourself!
Resources
http://wiki.postgresql.org/wiki/SQL/MED
http://wiki.postgresql.org/wiki/Foreign_data_wrappers
http://multicorn.org/
https://github.com/metadaddy-sfdc/Database.com-FDW-for-
PostgreSQL