PDF Reporting with ReportLab Andy Robinson CEO / Chief Architect, ReportLab Inc. Open Source Conference 2000, 19 July 2000

PDF Reporting with ReportLab

Andy Robinson, CEO / Chief Architect, ReportLab Inc.

Open Source Conference 2000, 19 July 2000

Smile and look them in the eye!

Part 1 - Background

What's wrong with existing reporting tools?

• Almost invariably run on Windows

• Data must come from a relational database

• Data must come from ONE relational database

• Slow

• Straight to printer

• No possibility for reuse

• Limitations on Formatting

• Limitations with non-European scripts

• IT Departments don't seem to care!

What's better our way?

Write Python programs using our library to create PDF documents directly

• Run on any platform - web servers, embedded application print engines or client PCs

• Acquire data from anywhere - RDBMS, XML, network protocols, COM/CORBA/Java object models

• PDF can be viewed everywhere, emailed, downloaded and archived

• No limits to page layout or graphics

• Reusability and scripting

• Simple, direct and FAST

ReportLab, the company

Founded in January 2000. Progress so far:

• 4 people, New Jersey and London

• 100+ users in user group

• 3 consulting projects

• Systems on trial at major corporations

• Release 1.0 this week

Getting ready to ramp up...

Now show them some output... script demo

Part 2 - Tutorial

Installation and Setup

• Standard package reportlab - goes on the Python path

• zlib very useful

• Python Imaging Library needed for image work

• "accelerators" planned for stream operations

• User Guide and numerous demos included

PDFgen subpackage

Design goal is very simple - "wrap up" the PDF format in a friendly API

Hello, World

    from reportlab.pdfgen.canvas import Canvas
    from reportlab.lib.units import inch
    from reportlab.lib.pagesizes import letter, A4

    def run():
        c = Canvas("example1_hello.pdf", pagesize=A4)
        c.setFont("Helvetica-Bold", 72)
        c.drawString(inch, 9*inch, "Hello World")

        # for next page do...
        # c.showPage()
        # more drawing stuff...

        c.save()  # writes to disk...

    if __name__ == '__main__':
        run()

Lots to note here.

(1) The filename, or a stream, is the only required argument.

(2) The constructor takes a bunch of arguments; pagesize is the most important. It can be specified as a 2-tuple of points, or you can use the constants in the pagesizes module. Also mention the 'encoding' argument - WinAnsi, MacRoman; custom fonts to come.

(3) The basic font mechanism is Postscript fonts. There are 14 standard ones (enumerate). Bold and Italic are not separate attributes in PS; they are separate fonts. Non-standard fonts will have to give an encoding.

You set various things in the graphics state (font, line style etc.) and they persist.

showPage is not needed for the final page, but it does quite a bit of work assembling a block of text with the raw data for each page.

Save writes to disk. All our stuff is in memory, as it is faster and makes forward referencing possible. 10,000 page docs are no problem. For bigger runs, we assume you'd break into several documents anyway.

Note that showPage forgets state - a PDF cornerstone is that pages are independent of each other, so you can render page 275 without the first 274.

Basic Graphics Commands

Line Methods

    canvas.line(x1,y1,x2,y2)
    canvas.lines(linelist)

Shape Methods

    canvas.grid(xlist, ylist)
    canvas.bezier(x1, y1, x2, y2, x3, y3, x4, y4)
    canvas.arc(x1,y1,x2,y2)
    canvas.rect(x, y, width, height, stroke=1, fill=0)
    canvas.ellipse(x, y, width, height, stroke=1, fill=0)
    canvas.wedge(x1,y1, x2,y2, startAng, extent, stroke=1, fill=0)
    canvas.circle(x_cen, y_cen, r, stroke=1, fill=0)
    canvas.roundRect(x, y, width, height, radius, stroke=1, fill=0)

String Methods

    canvas.drawString(x, y, text)
    canvas.drawRightString(x, y, text)
    canvas.drawCentredString(x, y, text)

Image Methods

    canvas.drawInlineImage(image, x, y, width=None, height=None)

Shapes are mostly built out of bezier curves and use PATHS (see below). Note the stroke/fill options - four possibilities. Images are inline within the file, as opposed to in the file header.

Graphics State 1

Changing Color State

    canvas.setFillColorCMYK(c, m, y, k)
    canvas.setStrokeColorCMYK(c, m, y, k)
    canvas.setFillColorRGB(r, g, b)
    canvas.setStrokeColorRGB(r, g, b)
    canvas.setFillColor(acolor)
    canvas.setStrokeColor(acolor)
    canvas.setFillGray(gray)
    canvas.setStrokeGray(gray)

Changing Font State

    canvas.setFont(psfontname, size, leading=None)

Changing Line Styles

    canvas.setLineWidth(width)
    canvas.setLineCap(mode)
    canvas.setLineJoin(mode)
    canvas.setMiterLimit(limit)
    canvas.setDash(array=[], phase=0)

Graphics State 2

Changing Geometry

    canvas.setPageSize(pair)
    canvas.transform(a,b,c,d,e,f)
    canvas.translate(dx, dy)
    canvas.scale(x, y)
    canvas.rotate(theta)
    canvas.skew(alpha, beta)

Saving and restoring state

    canvas.saveState()
    canvas.restoreState()

The last two are particularly important as the graphics state is write-only.

Say why state tracking is not done: nothing is happening at run time, we are just writing ops to be played back by AcroRead later. There is no API to query. Tracking would be (a) hard and (b) a performance cost - it is hard to track coords exactly through all operations, esp. saveState and restoreState.

Path Objects

Let you build up complex shapes with tight, efficient PDF.

    path = canvas.beginPath()

    path.moveTo(x, y)
    path.lineTo(x, y)
    path.curveTo(x1,y1,x2,y2,x3,y3)
    path.arc(x1,y1, x2,y2, startAng=0, extent=90)
    path.arcTo(x1,y1, x2,y2, startAng=0, extent=90)
    path.rect(x, y, width, height)
    path.ellipse(x, y, width, height)
    path.circle(x_cen, y_cen, r)
    path.close()

    canvas.drawPath(path, stroke=1, fill=0)
    canvas.clipPath(path, stroke=1, fill=0)

Stress the order: get an object, call methods to build up the region it defines, then either draw it or clip to it.

Text Objects

    textObj = canvas.beginText(x=0, y=0)

    textObj.setTextOrigin(x, y)
    textObj.setTextTransform(a, b, c, d, e, f)
    textObj.moveCursor(dx, dy)
    textObj.getCursor(), textObj.getX(), textObj.getY()
    textObj.setFont(psfontname, size, leading=None)
    textObj.setCharSpace(charSpace)
    textObj.setWordSpace(wordSpace)
    textObj.setHorizScale(horizScale)
    textObj.setLeading(leading)
    textObj.setTextRenderMode(mode)  # fill, stroke, path..
    textObj.setRise(rise)
    textObj.set[Stroke|Fill]Color[RGB, CMYK](...)

    textObj.textOut(text)   # tracks x so slower
    textObj.textLine(text)  # just affects y
    textObj.textLines(stuff, trim=1)

    canvas.drawText(textobject)

PDF requires text and graphics separated between 'BT' and 'ET'. A separate object enforces modality with no loss of performance, and encourages you to program the way PDF does.

Image Support

    canvas.drawInlineImage("python.gif", 2*inch, 2*inch, 6*inch, 2*inch)

• Uses PIL, so most image formats supported.

• Encoding in pure Python takes time (byte level loop), so we preprocess and cache.

• Simple C extension speeds up by factor of 300.

PDF special features

PDF offers special features for electronic documents:

• Page Transition Effects

• Forms

• Bookmarks, Links and Destinations

• Outline Trees

...plus a whole bunch more we haven't bothered wrapping up yet.

Page Transition Effects

AFAIK PythonPoint is the only app in the world to use these!

Usage

    canvas.setPageTransition(effectname=None, duration=1, direction=0, dimension='H', motion='I')

Several effects with parameters to customize directions and timing

    PageTransitionEffects = {
        'Split': [direction_arg, motion_arg],
        'Blinds': [dimension_arg],
        'Box': [motion_arg],
        'Wipe': [direction_arg],
        'Dissolve': [],
        'Glitter': [direction_arg]
    }

Let's play..

[Demo slides, one set per effect: Split, Blinds, Box, Wipe, Dissolve, Glitter]

Internal Hyperlinks

PDF documents can contain a wide variety of hyperlinks. They point to a target rectangle, not a piece of text.

Step one - create your Bookmark:

    canvas.bookmarkPage(name)
    canvas.bookmarkHorizontalAbsolute(name, yhorizontal)

Step two - create the link - the 'hot zone' which triggers the jump

    canvas.linkAbsolute(contents, destinationname, Rect=None, addtopage=1, name=None, **kw)

Links and Destinations have unique keys and must match up when the document is saved. Allows forward referencing. Many innovative uses...

Savera demo; applications to py2pdf

Outline Trees

These refer to bookmarks, as shown on the last page.

    canvas.addOutlineEntry(title, key, level=0, closed=None)

    # abridged example from our document template
    if paragraph.style == 'Heading1':
        key = 'ch%d' % self.chapterNo
        self.canv.bookmarkPage(key)
        self.canv.addOutlineEntry(paragraph.getPlainText(), key, 0, 0)

Eyes left..

Forms

Text and graphics can be stored once and reused over and over. Lets you create efficient documents; can speed printing.

    # first create a form...
    canvas.beginForm("SpumoniForm")
    spumoni(canvas)  # re-use some drawing functions
    canvas.endForm()

    # then draw it
    canvas.doForm("SpumoniForm")

Not just for business forms; images, logos or this sidebar would be good candidates.

Massive speed boost and size reduction for form-based applications such as payslips and contract notes.

Mention ADI example; Fidelity PReS did 50Mb of postscript; ours would be about 2 and 20x faster.

PDFgen Summary

• Comprehensive wrapper around PDF file format

• You can do just about anything...

• ...but you have to do it yourself

• Good basis for higher-level libraries tailored to specific applications.

PLATYPUS subpackage

"Page Layout and Typography Using Scripts"

High-level API dealing with Paragraphs, Frames, PageTemplates and Styles. Goals:

• Make paragraphs possible :-)

• Make flowing documents easy

• Separate formatting from content, thus reducing maintenance costs

• Create open standard for reusable "report widgets"

Much harder than PDFgen as there's more than one way to do it...

Styles and global restyling are key to saving money. Open standard plus open source will lead to explosion of nice report content ideas - charts, diagrams, multilingual paragraphs...

PLATYPUS key concepts

Flowable Objects:
Consume space. Can adapt to the enclosing frame, and sometimes split themselves across boundaries. e.g. Paragraph, Table, Figure.

Frames:
Rectangular region of a page which holds flowables.

PageTemplates:
Specifies frames and static content for a page style. Receives events it can use to draw in context.

DocTemplates:
"Story Processor" class consumes a list of flowables. Contains 1+ PageTemplates. Numerous events to hook into to allow customisation.

Use as much or as little as you like

Flowables

Implement three methods: wrap, draw and split.

    from reportlab.pdfgen.canvas import Canvas
    from reportlab.platypus import Paragraph
    from reportlab.lib.styles import getSampleStyleSheet
    from reportlab.lib.units import inch

    def run():
        c = Canvas("example2_flowable.pdf", verbosity=1)

        styles = getSampleStyleSheet()
        normalStyle = styles['Normal']
        para = Paragraph("Spam spam spam eggs spam. " * 20, normalStyle)

        w, h = para.wrap(400, 400)
        # print('size is (%d, %d)' % (w, h))
        para.drawOn(c, inch, 10*inch)

        # parts = para.split(width, height)
        c.save()

They occupy a rectangle. Internally, they call 'draw' and transform for you. Happy it reduced to something so simple; most advanced things become a matter of writing a new paragraph or drawing class, e.g. JParagraph, equations, and thus do not delay the whole framework.

Frames

A frame is a rectangular region on the page which holds flowables.

    # imports and style setup skipped
    story = []
    story.append(Paragraph("This is a Heading", styleH))
    story.append(Paragraph("This is a paragraph in <i>Normal</i> style.", styleN))
    c = Canvas('mydoc.pdf')
    f = Frame(inch, inch, 6*inch, 9*inch, showBoundary=1)
    f.addFromList(story, c)
    c.save()

We're in a frame now. I told it to draw its boundary.

Only things to specify are padding and position. AddFromList and drawBoundary are key methods.

You might draw one directly for a form with some legal or marketing text on it.

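PageTemplates

[Diagram: a DocTemplate holding three PageTemplates (two column, chapter page, title page). The two-column template has a left Frame and a right Frame; a story of flowables (First Flowable ... flowable 155, flowable 156, flowable 157) flows into the frames. Sample page text: "Chapter 6: Lubricants", "College Life".]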

DocTemplates

Story Processor class. Contains a number of page templates (1+). Create a document with

    doc.build(story)

Default behaviour: keep using first given page template until story is consumed.

Numerous events are raised as it processes the story; you write subclasses which hook these as needed.

Paragraphs

XML tags can define intra-paragraph markup:

    <font size="16"><font color="red">You are hereby charged</font> that on the 28th day of May, 1970, you did willfully, unlawfully, and <font size="20">with malice of forethought</font>, publish an alleged English-Hungarian phrase book with intent to cause a breach of the peace. How do you plead?</font>

...produces...

You are hereby charged that on the 28th day of May, 1970, you did willfully, unlawfully, and with malice of forethought, publish an alleged English-Hungarian phrase book with intent to cause a breach of the peace. How do you plead?

Tags are available for font, bold, italic, greek letters, and a sophisticated numbering widget.

Tables

    Division   Jan   Feb   Mar   Q1 Total
    North      100   115   120        335
    South      215   145   180        540
    East        75    90   135        300
    West       100   120   115        335

PINGO drawings

Part 3 - Cool Stuff

Part 4 - The Business Plan

Open Source Business is a major part of this conference, so I'd like to take just five minutes and discuss where we are heading.

Where we are as a company

tiny, poor, sob, sob, sob...but

• On trial in big corporations like Cable and Wireless, Fidelity Investments

• 100+ user group members

• real systems doing 5000 page runs (ServiceMagic)

• real development contracts underway

Come a long way quite quickly.

Loss Leader Strategy

• Free library for Python programmers

• Expensive server-side products and services for corporates

Proprietary offerings should involve a thin layer of code which targets the library to specific business needs or vertical markets.

People don't blink at $10,000 for what we have now. So why shouldn't we take it? The price of not programming in Python!

Planned Products

General Principle: If you are prepared to write Python code, you can have it for free. If you would prefer to spend money, we can help with that too...

• RML2PDF for corporates who don't want Python

• Enterprise Report Server

• Data Adaptors - e.g. import from accounts systems

• Industry-specific report widgets - e.g. CAD formats, oil data

Planned Services

Aim for services which scale better than consulting.

• Application Service Provider - the big one

• Form creation - "packaged" report development with a big cost advantage

• General consulting and joint development

Next Steps

• Raise Funds

• Test market proprietary products and services

• Build a well-rounded team

• Find the right strategic partners