Elements of Programming II
Part II
Elements of Programming
Chapter 4
Coding & Computation
4.0.14
Coding & Computation1
or What's all this about 1s and 0s?
4.0.15
Topics
Aim: Communicate certain fundamental concepts about computers and computing. Relax.
• Digital encoding
• (Digital) computation
• Fundamental programming
• Applied programming
1File: coding-computation-slides.tex.
4.0.16
Digital encoding
• Digital ≈ discrete
(Analog ≈ continuous)
• Encode = to put into or represent with a code
• Code ≈ "a system of symbols"
. . . used for communication (Morse code, braille, "one if by land, . . ."), for instructing computers (machine code), &c.
Codes are discrete: elements are in or out. Period.
4.0.17
Digital encoding: Examples of codes
• Written words: "Bob," "Carol," "Ted," "Alice"
Words "stand for" things, events, relations, etc. Make up sentences, etc.
• Letters of the alphabet
What an idea! With a few letters, all the words; with a few thousand words, an infinite number of sentences.
4.0.18
Digital encoding: Examples of codes
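• Morse code: Dots: .  Dashes: -
A .-      J .---    S ...
B -...    K -.-     T -
C -.-.    L .-..    U ..-
D -..     M --      V ...-
E .       N -.      W .--
F ..-.    O ---     X -..-
G --.     P .--.    Y -.--
H ....    Q --.-    Z --..
I ..      R .-.
(There's more for other symbols, e.g. Ä is .-.-)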
4.0.19
Digital encoding: Comments on Morse code
• How many elementary / primitive symbols? Answer: 2. Why?
• How many symbols per letter do we need (at most) to cover the 26 letters of the alphabet?
Answer: $2^1 + 2^2 + \cdots + 2^n$ until > 26, i.e., n = 4.
• Note: Why not just $2^5$?
• Why do the different letters have the symbol patterns they have?
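The counting argument above can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration and are not part of the original notes.

```python
def codes_up_to(length, symbols=2):
    """Count the distinct non-empty sequences of at most `length` symbols."""
    return sum(symbols ** k for k in range(1, length + 1))


def shortest_max_length(needed, symbols=2):
    """Smallest n such that sequences of length 1..n supply `needed` codes."""
    n = 1
    while codes_up_to(n, symbols) < needed:
        n += 1
    return n


print(codes_up_to(3))           # 2 + 4 + 8 = 14, not enough for 26 letters
print(codes_up_to(4))           # 2 + 4 + 8 + 16 = 30, enough
print(shortest_max_length(26))  # 4
print(2 ** 5)                   # 32 fixed-length codes of exactly 5 symbols
```

Variable-length codes get by with at most 4 symbols per letter, while a fixed-length code would need 5 symbols for every letter.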
4.0.20
Digital encoding: Examples of codes
• Braille: array of 3×2 = 6 dots, raised or not.
• Character codes in computing
1. ASCII (American Standard Code for Information Interchange)
Standard in computing. See the IDT book.
2. EBCDIC: IBM mainframes
3. Unicode: For all the world.
4.0.21
ASCII
• Binary code: 1s and 0s.
7 or 8 bits (= binary digits)
• P is decimal 80 = $8\times 10^1 + 0\times 10^0$
• or in binary = $0\times 2^7 + 1\times 2^6 + 0\times 2^5 + 1\times 2^4 + 0\times 2^3 + 0\times 2^2 + 0\times 2^1 + 0\times 2^0$
• or 01010000 in 8-bit binary. 01010000 in 7-bit binary, with parity even. 11010000 in 7-bit binary, with parity odd.
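The ASCII figures above can be reproduced in Python. The helper `with_parity` is a hypothetical name introduced here for illustration; it assumes a 7-bit ASCII character and puts the parity bit in front of the 7 data bits.

```python
def with_parity(ch, even=True):
    """Return the 7-bit ASCII code of `ch` with a leading parity bit."""
    bits7 = format(ord(ch), "07b")      # 'P' -> '1010000'
    ones = bits7.count("1")
    # Even parity: the total number of 1s, parity bit included, is even.
    parity = ones % 2 if even else (ones + 1) % 2
    return str(parity) + bits7


print(ord("P"))                      # 80
print(format(ord("P"), "08b"))       # 01010000
print(with_parity("P", even=True))   # 01010000
print(with_parity("P", even=False))  # 11010000
```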
4.0.22
Number coding systems
• Decimal (10-based) has 10 symbols possible per 'slot': 0, 1, 2, . . . , 9
• Binary (2-based) has 2 symbols possible per 'slot': 0 and 1
• Octal (8-based) has 8 symbols: 0, 1, 2, 3, 4, 5, 6, and 7
• Hexadecimal (16-based) has 16 symbols: 0, 1, . . . , 9, A, B, C, D, E, and F
Number coding can be in any base: 2, 3, . . . , n. Why base 2 for computers? Why base 8, base 16?
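As a sketch of how one number is written in different bases, here is a small converter in Python. `to_base` and `DIGITS` are illustrative names, not something from the notes; Python's built-in `int(text, base)` does the reverse conversion.

```python
DIGITS = "0123456789ABCDEF"


def to_base(n, base):
    """Write the non-negative integer `n` in the given base (2 to 16)."""
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, remainder = divmod(n, base)  # peel off the lowest-order digit
        out.append(DIGITS[remainder])
    return "".join(reversed(out))


print(to_base(80, 2))   # 1010000
print(to_base(80, 8))   # 120
print(to_base(80, 16))  # 50
print(int("120", 8))    # 80
```

One possible answer to the last question: each octal digit stands for exactly three binary digits (1 010 000 gives 120) and each hexadecimal digit for exactly four (101 0000 gives 50), so both are compact ways of writing binary.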
4.0.23
Digital encoding.
• We encode from a list of atomic symbols (e.g., the alphabet) and compose more complex things by combining these symbols (e.g., words are composed of letters of the alphabet, sentences are composed of words).
• At the most abstract, general level, we can use numbers to be our atomic symbols (numerals, actually).
• So, e.g., P = 80 decimal = 01010000 binary in ASCII.
• Let dash (-) = 0, dot (.) = 1. Then in Morse code, P (.--.) = 1001 binary
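The dash = 0, dot = 1 convention can be sketched in Python as follows. Writing a dot as "." and a dash as "-" is a plain-text notation assumed here, and `morse_to_bits` is an illustrative name.

```python
def morse_to_bits(pattern, dot="1", dash="0"):
    """Translate a Morse pattern written with '.' and '-' into binary digits."""
    bits = []
    for symbol in pattern:
        if symbol == ".":
            bits.append(dot)
        elif symbol == "-":
            bits.append(dash)
        else:
            raise ValueError(f"not a Morse symbol: {symbol!r}")
    return "".join(bits)


print(morse_to_bits(".--."))  # P in Morse code -> 1001
```

The same letter P thus gets a different binary string in Morse code than in ASCII, which is the point about conventionality.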
4.0.24
Digital encoding: Comments
• Conventionality (arbitrariness)
Why not P = 1010111? etc.
• Generality
Can one type of encoding encode everything that another type of encoding can encode? Does it matter whether we do decimal or binary?
Why? (Prove it!)
• Wait: 01010000 is a (binary) number, yet it's an encoding of P. Which is it? How can you tell? Why isn't it ÄH in Morse code? Or a sound or a picture or a movie?
4.0.25
Digital computation
• . . . or a program instruction?
• Roughly, a computation is a manipulation and/or recognition of a digital encoding
• A computer is a machine that does computations, that manipulates, recognizes, and acts on digital encodings
• Our computers work on binary (1s and 0s) encodings. Why?
4.0.26
Digital computation
• Is that all? Can't some computers do more than others?
• Yes, that's possible. Size, speed, of course.
• Actually, just a few instructions are sufficient to compute all possible manipulations on binary digital encodings, and these are in turn fully general.
• An amazing fact.
4.0.27
Digital computation
• So, all real computers are fundamentally equivalent, just some are bigger and faster than others.
• What about interacting with the world? 'I/O' as we say.
Same thing, just hooked to I/O devices.
• Clarification: manipulations aren't just arithmetic; they're anything (on binary encodings).
4.0.28
Fundamental programming
• Computers run by executing program instructions, one after another.
• The program instructions instruct the computer to manipulate, recognize, and act upon digital (binary) encodings.
• How are program instructions represented to the computer? As binary encodings. What about the data used by the program? Same thing. How does it know?
• Basic cycle: (1) fetch the next instruction (from memory into the CPU), (2) execute the instruction, (3) figure out where the next instruction is and go to (1).
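The basic cycle can be sketched as a toy machine in Python. Everything here is made up for illustration: the four opcodes, the single accumulator register, and the use of readable (opcode, argument) pairs where a real machine would hold binary encodings.

```python
def run(memory):
    """Run a toy program; `memory` is a list of (opcode, argument) pairs."""
    pc = 0    # program counter: address of the next instruction
    acc = 0   # accumulator: the machine's single working register
    while True:
        opcode, arg = memory[pc]      # (1) fetch
        pc += 1                       # (3) by default, go on to the next address
        if opcode == "LOAD":          # (2) execute
            acc = arg
        elif opcode == "ADD":
            acc += arg
        elif opcode == "JUMP_IF_LESS":
            target, limit = arg
            if acc < limit:
                pc = target           # (3) or jump somewhere else
        elif opcode == "HALT":
            return acc
        else:
            raise ValueError(f"unknown opcode: {opcode}")


PROGRAM = [
    ("LOAD", 0),
    ("ADD", 5),
    ("JUMP_IF_LESS", (1, 20)),  # go back to address 1 while acc < 20
    ("HALT", None),
]

print(run(PROGRAM))  # 20
```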
4.0.29
Applied programming
• Don't like them 1s and 0s (machine language)
• So, 'higher-level' languages: metaphor
• Compiler: takes your 'higher-level' jottings and translates them into machine language, so your program can be executed.
• Note: machine language programs are specific to the machine type you are running: Intel X, Intel Y, Macintosh, Sun, Alpha, IBM, etc.
4.0.30
Applied programming
• Interpreter: Accepts a compact 'semi-compiled' (byte-code) version of your jottings and executes it by translating it on the fly to machine code and sending it off for execution. Visual Basic, Excel.
• Possibility: Interpreters for each type of machine (hardware), but all can then execute the same byte-code. "Write once, run everywhere."
• Think of the Internet.
• And: Java.
4.0.31
Applied programming: On the Internet, etc.
Where does/can code execute?
• On your PC (your client, whether Wintel, Mac, Linux, Unix, . . . ): Your browser (Internet Explorer, Netscape, Hot Metal, . . . )
• On your PC: Spreadsheet programs, word processing programs, . . .
4.0.32
Where does code execute? (con't.)
• On the server: The Web server program that responds to requests from your browser and serves up files to you.
• On the server: Business programs that run in response to your inputs from your browser: shopping carts, billing, etc. (Think of buying something on the Web, or participating in an auction.)
• On the server: Business programs that create HTML pages and send them to you on the fly. PHP, ASP, JSP.
4.0.33
Where does code execute? (con't.)
• On your PC, via your browser: JavaScript, VBScript for graphics and animation (but primitively)
• On your PC, via your browser: ActiveX (Microsoft) and Java applets
Downloaded in real time from the server! Why? Pluses? Minuses? Worries?
4.1 Bibliographic Note
A delightful introduction to many of the topics in this chapter is Code: The Hidden Language of Computer Hardware and Software, by Charles Petzold, Microsoft Press: Redmond, Washington, 2000.
Chapter 5
Why Program?
People program computers for all sorts of reasons and purposes. Without undue abuse of reality, we can classify programming activities by degrees of difficulty, or required know-how. In increasing levels of technical challenge we have:
1. End-user programming
2. Utility-and-analysis (U&A) programming
3. Applications programming
4. Systems programming
An end-user is anyone who interacts with a computer program (such as a spreadsheet, presentation software, or a word processor) in order to accomplish a task. Putting together a nontrivial spreadsheet and using it to solve problems counts as, and is perhaps the prototypical case of, end-user programming. Typically, end-user programming is accomplished through visual or graphical interfaces. The end-user manipulates these interfaces in order to send instructions to the program. Think (again, prototypically) of selecting a range in Excel and clicking on a menu in order to set the color displayed in the range. Similarly, formatting in word processors (e.g., Microsoft Word, among others) and presentation software (e.g., Microsoft PowerPoint, among others) is a form of end-user programming that is usually done by manipulating a graphical user interface. End-users program because no one else will do it for them. The hope is that an end-user can use the software as a tool to solve the problem at hand, and can do this faster, cheaper, and more effectively than a specialist programmer. Given the requisite domain knowledge and an appropriate software tool, the end-user can proceed expeditiously to solve the problem at hand, without having to take the time to explain the substance of the problem to a technician. Such hopes are in fact often reasonable and indeed fulfilled. End-user programming is a main topic of this book.
At the other end of the technical challenge scale, systems programming includes writing operating systems, pieces of operating systems (such as device drivers), compilers, database systems, and communications software. It also includes high-end applications programming tasks such as real-time systems and parallel computing software. Programming at this level requires professional-level skill and dedication. C, C++, and assembly language are representative programming languages for systems programmers. An education in computer science is all but necessary as a background to this profession, although there are many who have learned on the job, by apprenticeship.
Applications programming encompasses the great bulk of computer programming in businesses and organizations generally. Transaction processing systems, systems that handle the conduct of commerce, such as sales and accounting systems, are prototypical examples. These systems are commonly written in Cobol, C/C++, and Java. Because of their "mission critical" nature they are usually built with a great deal of care. Perhaps the most important aspect of this is attention to system requirements, which often are largely peculiar to the host organization. Thus, systems analysis and design is critical for success of these systems. Ideally, when analysis and design are done well, programming (or coding, as it is called) becomes a fairly straightforward task. Many business school and engineering school graduates enter the job market in an applications programming context. Often, these graduates will have studied Management Information Systems. Initially trained to do coding (e.g., by the consulting firm that hires them), most of these people will quickly move on to systems analysis and to ascertaining business requirements, and then into general management, where they will continue to confront the problems of obtaining and maintaining mission critical information systems for their organizations.
We are centrally concerned in this book with end-user and U&A programming, and only peripherally concerned with systems programming and applications programming. Since end-user programming is a familiar concept, we will dwell at greater length on U&A programming.
Utility-and-analysis (U&A) programming sits between end-user programming and applications programming. Like applications programming, U&A programming usually involves programming with a language, rather than by manipulating a graphical user interface. Scripting languages such as Visual Basic, Perl, Python, and HyperTalk are the prototypical U&A programming languages. The motivations for U&A programming are much the same as those for end-user programming. A task is at hand, an analyst or other professional who is not primarily a programmer is charged with completing the task, and the end-user tools available are not sufficient. Often it makes excellent sense to invest some time and effort to write "one-off" or "glue" programs that solve the problem at hand and that aspire to not much else. Here are some examples; there are others.
• Programming end-user tools.
Programming visually, as end-users do, has the advantage of being very easy and the disadvantage of being rather limited in what it can do. For this reason, scripting languages such as Visual Basic for Applications (from Microsoft) have been created for manipulating end-user tools under program control. In Excel, Word, PowerPoint and other end-user tools such programs are called macros, and they can be very valuable indeed. Writing a macro is a prototypical case of U&A programming. Prototypical tasks for macros include automated loading and formatting of data from an external file, and automation of large or complex tasks involving repetitions of certain subtasks.
• Ad hoc modeling and analysis.
Formulas in spreadsheets can easily be used to build useful models for many business purposes, especially for financial analysis. That is in fact what spreadsheets were originally designed for. Once these models reach a certain, surprisingly low, level of complexity, they become difficult to validate and maintain. This invites fallacious decision making. Experience has shown the invitation is frequently accepted. Building the more complex models in separate scripting languages facilitates validation, maintenance, and reuse outside any particular spreadsheet. As a model grows in sophistication and importance it may be handed off to an applications programming context. Having the model written in a scripting language facilitates this, too.
• Internet-related tasks.
Extracting information from emails, from ftp sites, or from Web pages is often valuable, if not necessary. Modern scripting languages facilitate doing this under program control and on a large scale. This is a better alternative than doing it manually.
• Data cleaning and formatting.
A very common problem for business analysts is to clean up and properly format a given set of data, as a prelude to analyzing it and presenting the results to customers. For example, data may come from several different databases and need to be reformatted and mapped into Excel. The data may also involve thousands of records, precluding doing this manually. Scripting languages are first-rate tools for this sort of thing.
• Text formatting and information extraction.
Text, which is the source of so much information, is an even greater challenge than data when it comes to cleaning and formatting for subsequent analysis. Again, modern scripting languages are excellent tools for this purpose.
• Rapid assembly of applications.
It is often possible and desirable to build special-purpose business applications by assembling them from existing software. Microsoft Office is often used this way. A decision support system (DSS) for a special purpose or analysis project is assembled ("glued together") from Excel, Access, Word, and even PowerPoint. The user-analyst sees an Excel interface from which models are run, data are extracted and saved to an Access database, and reports are generated in Word and PowerPoint, all largely under program control. The user enters information, makes choices, and clicks on buttons; the software system does the rest.
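To make the "data cleaning and formatting" item concrete, here is a minimal Python sketch of a one-off cleaning script. The sample data, the column names, and the function `clean` are all invented for illustration; a real job would read from files or databases instead of a string.

```python
import csv
import io

# Hypothetical messy export: stray spaces, inconsistent capitalization,
# and a thousands separator inside a quoted number.
RAW = (
    "name, amount ,date\n"
    '  Alice ,"1,200.50",2001-03-04\n'
    "BOB,  75 ,2001-03-05\n"
)


def clean(raw):
    """Return tidy records: trimmed text, title-case names, numeric amounts."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw)):
        record = {key.strip(): value.strip() for key, value in record.items()}
        rows.append({
            "name": record["name"].title(),
            "amount": float(record["amount"].replace(",", "")),
            "date": record["date"],
        })
    return rows


for row in clean(RAW):
    print(row)
```

The same few lines work unchanged on thousands of records, which is what makes doing the job under program control attractive.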
As should by now be apparent, good U&A programmers find themselves significantly empowered in many business contexts and seldom lack for opportunities to employ their skills. Our goal here is to get you started on the elements of U&A programming. We will see that with only a little effort valuable skills and the resulting empowerment are quite achievable.
This brings us to choice of a scripting language. There are many languages appropriate for our purposes. The top of any shortlist would include Visual Basic (for Applications), Perl, and Python. Since we can't study them all we have to pick one for our main focus, and we have chosen Python. Visual Basic would probably be our second choice; we shall have occasion to discuss it in what follows. Here are the main considerations behind this choice.
The strengths of Visual Basic for Applications (VBA) are substantial. Microsoft is committed to it, continues to develop it, and has built it into its Office products. Thus if you have Excel you have VBA too. Visual Basic, a stand-alone superset of VBA, is one of the most popular programming languages in the world, perhaps the most widely known. Employers of business analysts very often expect, in the sense of anticipate, that U&A programming will be done in VB or VBA. There are many (smallish) applications programming projects that are done in VB. VB/VBA is very good at building graphical user interfaces and for programming other Microsoft applications. VBA comes with a good development and debugging environment.
VBA also has limitations. Here are some of them. It only works on Windows machines. It now lacks a rich library of general programming (and U&A!) tools, e.g., for text handling, for mathematical computations, for Internet programming. VB, unlike VBA, costs money after you've already bought Office, and regular upgrades (costing more money) are more or less mandatory.
Python's strengths include these. Python is open source software backed by a large and committed community with a strong track record of maintaining and improving it. Python has achieved excellent acceptance as a scripting language and is in fact widely used (although not nearly so widely as VB or even Perl). Python is free (at http://www.python.org, among other places). This is especially important if you have multiple machines on which you want to run Python and you balk at paying for multiple licenses for VB. Python is available for Windows machines, Macintoshes, and Linux machines (among others). The code will run (pretty much) identically on any installation.
Python is a nice, well-designed language. It is simple and easy to learn, and has been taught successfully to first-time programming high-school students. Python can be run in interactive mode, thereby facilitating exploration and debugging. It can also be run in script mode, as is needed for real applications.
Python works well as a stand-alone language, usable in essentially all modern computing environments. It comes with a number of well-considered high-level features that make certain U&A programming tasks very easy. Although Python is fundamentally a scripting language, Python programs can have GUIs (graphical user interfaces), although we will not discuss them. Python is designed to be extensible and is continually the beneficiary of extensions built by a large community of U&A programmers. Python's extensions for Internet programming, for mathematical and numerical computation, for string (or text) manipulation, and for manipulating Microsoft programs (such as Excel, Access, Word, and PowerPoint) are especially useful for U&A programming problems.
Finally, the skills one acquires in learning any good programming language transfer readily when learning a new language. Learning Python and a little VBA, as we shall be doing, positions you well for learning VB and VBA should that become useful. On the other hand, there are important things to learn that do not fit well (or at all) with VBA and that are easily done in Python.
5.1 Information Sources
What follows in this book about Python is meant to fill in the gaps and slightly extend introductory material on Python. That material is readily available for free on the Web. See the Python home page for everything to do with Python, including documentation and tutorials:
http://www.python.org/
At
http://www.python.org/doc/Newbies.html
you will find a number of Python tutorials. I recommend reading first "A Non-Programmer's Tutorial for Python," by Josh Cogliati, at
http://www.honors.montana.edu/~jjc/easytut/easytut/
After that, you might try "How to Think Like a Computer Scientist," the Python version of Allen Downey's open source book, with Jeff Elkner. It's at
http://www.ibiblio.org/obp/thinkCSpy/
Then there's the "Python Tutorial," by Guido van Rossum, with Fred L. Drake, Jr., editor. Guido created Python. This tutorial is excellent, but it's aimed at people who are experienced C programmers. If you aren't one, many of the remarks meant to clarify things will be utterly mysterious. Still, it's a good source. Guido's tutorial is at
http://www.python.org/doc/current/tut/tut.html
In addition, Python's online documentation is excellent. The "Global Module Index (for quick access to all documentation)" and the "Library Reference (keep this under your pillow)" come with the Python installation and are available at http://www.python.org/doc/.
Several good books on Python are in print, but I think the free tutorials listed above, along with these notes, and the online documentation should be plenty. If you are after all a book person, here's a short list.
1. Learning Python by Lutz and Ascher [14] is a good introduction to Python programming, although not to programming in general.
2. Python Essential Reference by Beazley [2] is a terrific (even essential) reference for anyone starting out new to Python with a nontrivial programming task at hand.
File: why-program.tex
Chapter 6
Visual Basic for Applications: A Brief Tutorial
6.0 Notes
6.0.1
Visual Basic for Applications (VBA)1
Goals:
• Familiarity & experience with solving problems algorithmically.
• Familiarity with Visual Basic for Applications (VBA). VBA/Excel dialect.
• Empower you to build useful applications and to learn more on your own.
• Get you comfortable in a programming environment.
Note: But you have to do a lot of work on your own! Lectures are to provide you a map. You must make the trip yourself. (cf. VBATutor.xls)
1File: misnotes-vba-slides.tex.
6.0.2
Goals for lecture 1.
1. Introduce the basic concepts of macros in Excel (which are written in VBA).
2. Show how to record and run a macro, and examine and edit its code in the Visual Basic Editor.
3. Use the Visual Basic Editor to create a simple VBA program (Sub) and call it for execution from a button on a worksheet.
4. Use the Visual Basic Editor to create a simple VBA program (Function) and call it for execution from a cell in a worksheet.
5. Introduce the core structure of VBA programs.
(cf., Worksheets("Lecture1") code module Lecture1)
6.0.3
Macros
• Programs in VBA. Role of VB and VBA for Microsoft.
• What is VBA\Excel good for?
- Assembling, "gluing together," applications in MS Office. (Larger issue: "component-based applications.")
- Utility (small job) programming, e.g., for data preparation and manipulation, for programming the interface, . . .
- For learning how to program.
- For prototype programming.
- For learning about modern software concepts (e.g., OOP) and development environments (now a good one in Excel for VBA).
6.0.4
More on macros
• Recording macros
- Tools ⇒ Macro ⇒ Record New Macro. . .
- Stop, relative addressing
- Running the macro: Tools ⇒ Macro ⇒ Macros. . .
- Viewing the macro: Tools ⇒ Macro ⇒ Visual Basic Editor (Alt+F11)
6.0.5
Recorded macro, called Bob
Sub Bob()
'
' Bob Macro
' Macro recorded 2/13/98 by Steven O. Kimbrough
'
'
    Range("B3:C4").Select
    Selection.Copy
    Range("A1").Select
    ActiveSheet.Paste
    Application.CutCopyMode = False
    Range("A1").Select
End Sub
6.0.6
Basics of Visual Basic for Applications
• VB, VBA, VBA\Excel, VBA\Word, . . .
• Macro modules
• Subs
• Functions
• Variables & declaring them
• Structure of a VBA\Excel application
6.0.7
A simple Sub
1. Tools ⇒ Macro ⇒ Visual Basic Editor
2. Insert ⇒ Module (Not: Class Module)
3. View ⇒ Properties Window
4. Then write some code:
Sub HelloWorld()
    MsgBox "Hello world!"
End Sub
• Use the VB Help menu item to search on MsgBox. (And use it generally and often!)
• Subs do things, but do not return values. Functions do things, and do return values.
6.0.8
Now, make it run from Excel . . .
1. View ⇒ Toolbars ⇒ Control Toolbox
2. Select and draw a button.
3. Right-click with the button selected ⇒ Properties. Set the properties as desired, then close the Properties Window.
4. Right-click with the button selected ⇒ View Code
5. Add a call to HelloWorld:
Private Sub cmdHelloWorld_Click()
    HelloWorld
End Sub
6. Return to Excel, exit design mode, and click the button.
6.0.9
A simple Function
1. Tools ⇒ Macro ⇒ Visual Basic Editor
2. Select a code module, e.g., the one with Sub HelloWorld.
3. Add code and save work:
Function dblUtility(X, Hi, Lo, Risk) As Double
    dblUtility = ((X - Lo) / (Hi - Lo)) ^ Risk
End Function
4. Return to Excel and use this function in a cell, just as any built-in Excel function.
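For comparison, here is a sketch of the same utility function in Python, the book's main language. The name `utility` and the check that Hi and Lo differ are additions made here for illustration; the formula itself is the one in dblUtility.

```python
def utility(x, hi, lo, risk):
    """Rescale x from the range [lo, hi] to [0, 1], then raise it to `risk`."""
    if hi == lo:
        raise ValueError("hi and lo must differ")
    return ((x - lo) / (hi - lo)) ** risk


print(utility(100, 100, 0, 2))  # 1.0: the best outcome has utility 1
print(utility(0, 100, 0, 2))    # 0.0: the worst outcome has utility 0
print(utility(50, 100, 0, 1))   # 0.5: with risk = 1 the scale is linear
```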
6.0.10
Try these procedures in a code module:
Function dblCubed(X As Double) As Double
    dblCubed = X ^ 3
End Function
Sub CubeMe()
    Dim dblDaNumber As Double
    dblDaNumber = _
        InputBox("What number " & _
        "would you like to cube?")
    dblDaNumber = dblCubed(dblDaNumber)
    MsgBox "And the answer is " & dblDaNumber
End Sub
• Dim dblDaNumber As Double declares (Dimensions) the program variable dblDaNumber as type Double (precision floating point).
• _ for line continuations. & for string concatenation.
6.0.11
VBA\Excel Programs.
Are collections of Subs and Functions (and Declarations). Typically they:
1. Are called (started) from Excel.
2. Read in information. From, e.g., worksheets, dialog boxes, files, databases.
3. Store this information in variables.
4. Computationally manipulate the variables.
5. Write out the information. To, e.g., worksheets, dialog boxes, files, databases.
Basic concepts at hand, details now follow.
6.0.12
Goals for lecture 2.
1. Introduce program variables and how to declare them.
2. Discuss and show how (in VBA\Excel) to read and write information from and to worksheets.
3. Introduce the Object Browser.
(cf., Worksheets("Lecture2") code module Lecture2)
6.0.13
Recall: VBA\Excel Programs.
Are collections of Subs and Functions (and Declarations). Typically they:
1. Are called (started) from Excel.
2. Read in information. From, e.g., worksheets, dialog boxes, files, databases.
3. Store this information in variables.
4. Computationally manipulate the variables.
5. Write out the information. To, e.g., worksheets, dialog boxes, files, databases.
6.0.14
Variables
Expressions (think of names) in programs that can hold different values at different times.
• X, Hi, Lo, Risk in the dblUtility function.
Option Explicit
Sub FirstVariableExample()
    Dim I As Integer
    Dim MyFirstVariable As Integer
    'Note: Dim I, MyFirstVariable As Integer
    ' leaves MyFirstVariable an Integer and I
    ' as a Variant. Thanks, Bill!
    MyFirstVariable = 3
    For I = 1 To MyFirstVariable
        MsgBox "Showing and counting: " & I
    Next I
End Sub
6.0.15
Variables have data types
• Types:
- Integer, Long
- Single, Double
- Currency
- Date
- String
- Variant
• Why?
6.0.16
Programs (in general, I emphasize):
• Declare variables
• Put values into variables
• Make calculations with variables
• Store results of calculations in variables
6.0.17
Declaring variables (in VBA)
• Variables should always be declared.
- Why?
- Variant if not declared.
- Use Option Explicit in the declarations section to force declaration of variables.
• Variables have scope. Why?
• Declaring variables
- Dim in a procedure. Scope: that procedure.
- Private (or Dim) in the declarations section of a module. Scope: that module.
- Public in the declarations section of a module. Scope: entire application.
6.0.18
Value(s) of MySecondVariable?
Option Explicit
Private MySecondVariable As Integer
Sub PublicExample1()
    MsgBox "We're in PublicExample1 " & _
        "and MySecondVariable = " & _
        MySecondVariable
End Sub
Sub PublicExample2()
    MySecondVariable = 23
    MsgBox "We're in PublicExample2 " & _
        "and MySecondVariable = " & _
        MySecondVariable
End Sub
6.0.19
Reading from, and writing to, a worksheet
Sub CosineHardWired()
    Dim MyNumber As Double
    MyNumber = _
        Worksheets("Lecture2").Cells(9, 2).Value
    'Note: Cells(9,2) = row 9, column 2 of the
    'worksheet.
    Worksheets("Lecture2").Cells(11, 2).Value = _
        Cos(MyNumber)
    'This also works:
    'Worksheets("Lecture2").Range("B11").Value = _
    'Cos(MyNumber)
    'And suppose MyTestRange is defined B2:D13.
    'Then this works, too:
    'With Worksheets("Lecture2").Range("MyTestRange")
    '    .Cells(10, 1).Value = Cos(MyNumber)
    'End With
End Sub
Try a nonnumber in B9. Debug. Reset button. Later: the debugger.
6.0.20
Generalizing on reading & writing
• Reading & writing the Excel worksheet are special cases of getting and setting object properties.
• So far, the Value property of a particular object, a particular cell.
• Why not, say, the color of a cell?
Sub SimpleShowColor()
    Dim MyTempHolder As Variant
    MyTempHolder = _
        Worksheets(2).Cells(14, 2).Interior.ColorIndex
    MsgBox "The interior color of B14 is " _
        & MyTempHolder & "."
End Sub
LOTS of objects and properties in Excel.
6.0.21
Generalizing on reading & writing (con't.)
Sub GetTheWorksheetName()
    Dim Temp As String
    Temp = Range("CellBob").Worksheet.Name
    MsgBox "The name of the worksheet in which " & _
        "the range CellBob resides is " & Temp
End Sub
Sub RenameMeTheWorksheet()
    Dim Temp As String
    Temp = _
        InputBox("New name for this worksheet?")
    ActiveSheet.Name = Temp
End Sub
6.0.22
The object browser, F2 in the VBA Editor
• Displays, and lets you explore, all available objects, methods, and properties. Nifty!
member = (method or property)
• Right-click on a member. Your code. Excel's code.
• Example: Look in Excel, Range class, Cells member (a property). Call for help.
6.0.23
The Object Browser (con't.)
A word to the wise:
Be ever vigilant. No program documentation is ever complete (or completely accurate) and the VBA on-line Help is no exception. Some of the descriptions are just plain wrong. Some of the code samples don't work. And many, many "gotchas" are left unexplored. Still, if you take the documentation with a small grain of salt, you'll find an enormous amount of important information there. And the easiest way to get to the information is via the Object Browser.
(From Excel 97 Annoyances, p. 205.)
6.0.24
Goals for lecture 3.
1. Introduce control structures.
2. Introduce and discuss arrays (and their uses).
3. Discuss calling procedures from within procedures.
(cf., Worksheets("Lecture3") code module Lecture3)
6.0.25
Control structures: For...Next
• For doing something a known number of times.
• You've seen this already.
(e.g., Sub FirstVariableExample())
For <counter> = <start> To <end> [Step <incr>]
    [statements]
Next [<counter>]
<counter> indicates a required counter (a numeric variable). [Step <incr>] indicates that optionally you may have the symbol Step followed by (and now mandatory) an increment expression (number or variable).
And so on, generally.
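For readers who will also be using Python, here is a sketch of how For...Next counting maps onto Python's range. The helper `for_next` is an illustrative name, and the sketch covers whole-number counters only. The point to watch is that VBA's To includes the end value while range stops just before its stop value.

```python
def for_next(start, end, step=1):
    """Counter values visited by: For i = start To end Step step."""
    if step == 0:
        raise ValueError("step must not be zero")
    # range() excludes its stop value, so move the stop one past `end`.
    stop = end + 1 if step > 0 else end - 1
    return list(range(start, stop, step))


print(for_next(1, 3))       # [1, 2, 3]
print(for_next(1, 10, 3))   # [1, 4, 7, 10]
print(for_next(5, 1, -2))   # [5, 3, 1]

for i in for_next(1, 3):
    print("Showing and counting:", i)
```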
6.0.26
Control Structures: If...Then
• For doing something (THEN) on condition that something else (IF) is true (else, skip and continue).
If <condition> Then
    [statements]
End If
6.0.27
Control Structures: If...Then...Else
• For doing something (THEN) on condition that something else (IF) is true; otherwise doing the ELSE clause.
If <condition> Then
    [if-statements]
Else
    [else-statements]
End If
Always: Either the if-statements are executed (when <condition> is true), or the else-statements are executed (when <condition> is false).
6.0.28
If...Then...Else example
Code for a simple comparison of two (cell) values.
Sub SimpleCompare(Left, Right)
    If (Left <> Right) Then
        MsgBox "The two values are different."
        If (Left < Right) Then
            MsgBox "The Right value exceeds the Left."
        Else
            MsgBox "The Left value exceeds the Right."
        End If
    Else
        MsgBox "The two values are the same."
    End If
End Sub
6.0.29
If...Then...Else example (con't.)
The button code to call this Sub.
Private Sub cmdSimpleCompare_Click()
    Dim Left
    Dim Right
    'Both Left and Right will be Variants
    Left = Cells(4, 2).Value
    Right = Cells(4, 3).Value
    Call SimpleCompare(Left, Right)
    'This will also work:
    'SimpleCompare Left, Right
End Sub
6.0.30
An alternative, using If...Then...ElseIf ...
Sub SimpleCompareElseIf(Left, Right)
    If (Left = Right) Then
        MsgBox "The two values are the same."
    ElseIf (Left < Right) Then
        MsgBox "The two values are different."
        MsgBox "The Right value exceeds the Left."
    ElseIf (Left > Right) Then
        'The following also works:
        'ElseIf True Then
        MsgBox "The two values are different."
        MsgBox "The Left value exceeds the Right."
    End If
End Sub
Which is better code? Why? What about ElseIf (Left > Right) Then versus ElseIf True Then?
6.0.31
Control structures: Select Case
A way of generalizing If...Then...Else...
Select Case <test expression>
    Case <1st expression list>
        [1st statements]
    Case <2nd expression list>
        [2nd statements]
    ...
    Case Else
        [else statements]
End Select
6.0.32
Example of Select Case
Function Bonus(performance, salary)
    'This function is from the
    'Microsoft VB Help files, on Select Case.
    Select Case performance
        Case 1
            Bonus = salary * 0.1
        Case 2, 3
            Bonus = salary * 0.09
        Case 4 To 6
            Bonus = salary * 0.07
        Case Is > 8
            Bonus = 100
        Case Else
            Bonus = 0
    End Select
End Function
6.0.33
Somewhat better
Function dblBonus(performance As Integer, _
        salary As Double) As Double
    If (performance < 1 Or performance > 10) Then
        dblBonus = -9999
        Exit Function
    End If
    Select Case performance
        Case 1
            dblBonus = salary * 0.1
        Case 2, 3
            dblBonus = salary * 0.09
        Case 4 To 6
            dblBonus = salary * 0.07
        Case Is > 8
            dblBonus = 100
        Case Else
            dblBonus = 0
    End Select
End Function
6.0.34
Control structures: Do...Loop
Alternative to For...Next
• Differences?
• Two versions, two cases each:
Do {While | Until} <condition>
    [statements]
Loop

Do
    [statements]
Loop {While | Until} <condition>
6.0.35
Arrays
• Fundamental, but basic, data structures. Very widely used in programming.
• Similar to vectors and matrices in mathematics.
  - But can have more than 2 dimensions
• Can have 1, 2, 3, ... dimensions
• Named much as are variables
• Great for capturing a range on a worksheet.
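As a rough sketch of the last point, a whole worksheet range can be copied into an array with one assignment. The names SumFirstColumn and RangeToArrayDemo, the sheet name "Sheet1", and the address A1:A5 are assumptions made for this illustration.

```vba
' Illustration only: SumFirstColumn and RangeToArrayDemo are hypothetical
' names, and the sheet name and address below are assumptions.

' Sums the numeric entries in the first column of a 2-D array.
Function SumFirstColumn(cellValues As Variant) As Double
    Dim i As Long
    Dim total As Double
    total = 0
    For i = LBound(cellValues, 1) To UBound(cellValues, 1)
        If IsNumeric(cellValues(i, 1)) Then
            total = total + cellValues(i, 1)
        End If
    Next i
    SumFirstColumn = total
End Function

Sub RangeToArrayDemo()
    Dim cellValues As Variant
    ' For a multi-cell range, .Value returns a 2-D Variant array
    ' indexed (row, column) starting at 1, even for a single column.
    cellValues = Worksheets("Sheet1").Range("A1:A5").Value
    MsgBox "Sum of the numeric cells: " & SumFirstColumn(cellValues)
End Sub
```

Once the values are in the array, the computation no longer touches the worksheet, which is the pattern the examples in this lecture follow cell by cell.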
6.0.36
Reversing the contents of a column range
See VBATutor.xls, "Lecture3 Arrays" worksheet, Lecture3Arrays code module. Line numbers below added.
[1] Sub ReverseRange(FromRange As Range, _
ToRange As Range)
[2] Dim intFromLength As Integer
[3] Dim intToLength As Integer
[4] Dim DaReverseArray() As Variant
[5] Dim I As Integer
[6] intFromLength = FromRange.Rows.Count
[7] intToLength = ToRange.Rows.Count
[8] If intFromLength <> intToLength Then
[9] MsgBox "From length: " & intFromLength & _
[10] " To length: " & intToLength
[11] MsgBox "Sorry, the two ranges are " & _
[12] "not conformable. Exiting."
[13] Exit Sub
[14] End If
6.0.37
Reversing the contents of a column range (cont'd.)
[15] ReDim DaReverseArray(1 To intFromLength)
[16] For I = 1 To intFromLength
[17] DaReverseArray(I) = _
[18] FromRange.Cells(I).Value
[19] Next I
[20] For I = 1 To intFromLength
[21] ToRange.Cells(I).Value = _
[22] DaReverseArray(intFromLength + 1 - I)
[23] Next I
[24] End Sub
6.0.38
The button code for calling ReverseRange and for clearing out the range reversed.
Private Sub cmdCallReverse_Click()
    Dim FromRange As Range
    Dim ToRange As Range
    Set FromRange = Range("reverse")
    Set ToRange = Range("reversed")
    Call ReverseRange(FromRange, ToRange)
End Sub

Private Sub cmdClearReversed_Click()
    Range("reversed").ClearContents
End Sub
6.0.39
Command button code for squaring the reverse
Private Sub cmdSquareReversed_Click()
    Dim FromRange As Range
    Dim ToRange As Range
    Dim VectorLength As Integer
    Dim FromVector() As Double
    Dim ToVector() As Double
    Set FromRange = Range("reversed")
    Set ToRange = Range("reversedsquared")
    VectorLength = FromRange.Rows.Count
    ReDim FromVector(1 To VectorLength)
    ReDim ToVector(1 To VectorLength)
    Dim I As Integer
    'Check for numeric input
    For I = 1 To VectorLength
        If Not IsNumeric(FromRange.Cells(I).Value) Then
            MsgBox "Inputs must be numbers. Exiting."
            Exit Sub
        End If
    Next I
6.0.40
(cont'd.)
    For I = 1 To VectorLength
        FromVector(I) = FromRange.Cells(I).Value
    Next I
    Dim MyErrorCode As String
    MyErrorCode = "Calling"
    Call SquareTheVector(FromVector, _
        ToVector, MyErrorCode)
    If MyErrorCode = "OK" Then
        For I = 1 To VectorLength
            ToRange.Cells(I).Value = ToVector(I)
        Next I
    Else
        'Note the space before "Error:" so the two parts do not run together
        MsgBox "Failed in cmdSquareReversed. " & _
            "Error: " & MyErrorCode
    End If
End Sub
6.0.41
Code for SquareTheVector
Sub SquareTheVector(StartArray, _
        ReturnArray, ErrorCode As String)
    Dim intStartLower As Integer
    Dim intStartUpper As Integer
    Dim intReturnLower As Integer
    Dim intReturnUpper As Integer
    Dim IsOK As Boolean
    Dim I As Integer
    IsOK = False
    intStartLower = LBound(StartArray, 1)
    intStartUpper = UBound(StartArray, 1)
    intReturnLower = LBound(ReturnArray, 1)
    intReturnUpper = UBound(ReturnArray, 1)
    If ((intStartUpper - intStartLower) = _
            (intReturnUpper - intReturnLower)) Then
        IsOK = True
    Else
        ErrorCode = "Not OK"
        Exit Sub
    End If
6.0.42
(cont'd.)
    For I = 1 To (intStartUpper - intStartLower + 1)
        If (Not IsNumeric(StartArray(intStartLower - 1 + I))) Then
            MsgBox "Error code 303. Bibi."
            Exit Sub
        End If
        ReturnArray(intReturnLower - 1 + I) = _
            StartArray(intStartLower - 1 + I) ^ 2
    Next I
    ErrorCode = "OK"
End Sub
6.0.43
Comments
• Code is perhaps more complex than is strictly required.
• But error-checking is awfully important, and there could be more of it here. (How? Why?)
• Also, illustrates generality (e.g., arrays must be conformable, but need not have the same indexing arrangement).
• Useful generally for programming in VBA and for programming generally: drop things into arrays, pass the arrays around, and use them to record computations and for looping through when you do computations.
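The point about conformable arrays with different indexing can be sketched in isolation. CopyConformable and CopyConformableDemo are hypothetical names used only here; the idea (compare lengths via LBound and UBound, then shift indexes) is the same one SquareTheVector relies on.

```vba
' Illustration only: CopyConformable and CopyConformableDemo are
' hypothetical names, not procedures from VBATutor.xls.

' Copies Source into Target element by element. The two arrays must
' have the same number of elements but may be indexed differently.
' Returns False (and copies nothing) if the lengths differ.
Function CopyConformable(Source As Variant, Target As Variant) As Boolean
    Dim shift As Long
    Dim i As Long

    CopyConformable = False
    If (UBound(Source) - LBound(Source)) <> _
            (UBound(Target) - LBound(Target)) Then
        Exit Function
    End If

    ' shift maps an index of Source onto the matching index of Target.
    shift = LBound(Target) - LBound(Source)
    For i = LBound(Source) To UBound(Source)
        Target(i + shift) = Source(i)
    Next i
    CopyConformable = True
End Function

Sub CopyConformableDemo()
    Dim a(0 To 2) As Double      ' indexed 0..2
    Dim b(5 To 7) As Double      ' indexed 5..7, same length
    a(0) = 1.5
    a(1) = 2.5
    a(2) = 3.5
    If CopyConformable(a, b) Then
        MsgBox "b(5) = " & b(5) & ", b(7) = " & b(7)
    End If
End Sub
```

Returning a Boolean is one way to report the error to the caller; the lecture code uses a string error code passed as a parameter instead.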
6.0.44
Goals for lecture 5.
1. Introduce the VBA/Excel debugger.
2. Introduce forms programming.
3. Discuss VBATutorSortem.xls, a spreadsheet for sorting addresses.
6.0.45
Code in Module1 of Excel Workbook, VBATutorSortem.xls
Option Explicit
Sub MakeAddressesHorizontal(StartRow As Integer, _
        StopRow As Integer, DaColumn As Integer, _
        DaSheet As String)
    Dim intDaColumn As Integer
    Dim I As Integer
    intDaColumn = DaColumn
    Dim Temp
    Dim strDaSheet As String
    strDaSheet = DaSheet
    Dim intDaStartRow As Integer
    intDaStartRow = StartRow
6.0.46
Code in Module1
    Dim intAddressRow As Integer
    Dim intAddressStartRow As Integer
    Dim boolDone As Boolean
    boolDone = False
    'intDaStartRow is the first row in which
    'an address lies.
    'intAddressStartRow = the absolute
    'row number in which a single address
    'begins
    'intAddressRow = the current (relative)
    'row of the address (reset to 1 for the
    'top of each address).
    intAddressStartRow = intDaStartRow
6.0.47
Code in Module1
    With Worksheets(strDaSheet)
        intAddressRow = 1
        Do While Not boolDone
            'Do the next address
            'if we have two empty rows, we're done.
            If (.Cells(intAddressRow + intAddressStartRow, _
                    intDaColumn).Value = "" And _
                    .Cells(intAddressRow + intAddressStartRow + 1, _
                    intDaColumn).Value = "") Then
                boolDone = True
                StopRow = intAddressRow + _
                    intAddressStartRow + 1
            End If
6.0.48
Code in Module1
            'if we have only one empty row,
            'we start a new address
            If .Cells(intAddressRow + intAddressStartRow, _
                    intDaColumn).Value = "" Then
                intAddressStartRow = intAddressRow + _
                    intAddressStartRow + 1
                intAddressRow = 1
            End If
6.0.49
Code in Module1
            .Cells(intAddressRow + intAddressStartRow, _
                intDaColumn).Select
            Selection.Copy
            .Cells(intAddressStartRow, intDaColumn + _
                intAddressRow).Select
            ActiveSheet.Paste
            .Cells(intAddressRow + intAddressStartRow, _
                intDaColumn).Select
            Application.CutCopyMode = False
            Selection.ClearContents
            intAddressRow = intAddressRow + 1
        Loop
    End With
End Sub
6.0.50
Code in Module1
Sub SortHorizontalAddresses(StartRow As Integer, _
        StopRow As Integer, _
        DaColumn As Integer, DaSheet As String)
    Dim Top As Range
    Dim Bottom As Range
    Set Top = _
        Worksheets(DaSheet).Cells(StartRow, DaColumn)
    Set Bottom = _
        Worksheets(DaSheet).Cells(StopRow, DaColumn + 4)
    Range(Top, Bottom).Select
    Selection.Sort _
        Key1:=Cells(StartRow, DaColumn), _
        Order1:=xlAscending, Header:=xlGuess, _
        OrderCustom:=1, MatchCase:=False, _
        Orientation:=xlTopToBottom
    Cells.Select
    Selection.Columns.AutoFit
    Range("A1").Select
End Sub
6.0.51
Code in Module1
Sub StartToSort()
    frmSortem.Show
End Sub
6.0.52
The button code for Sheet2
Private Sub CommandButton1_Click()
    Dim DaStartRow As Integer
    Dim DaStopRow As Integer
    Dim DaMainColumn As Integer
    Dim OurDataSheet As String
    OurDataSheet = "sheet2"
    DaStartRow = 3
    DaMainColumn = 2
    DaStopRow = 0
    Call MakeAddressesHorizontal(DaStartRow, _
        DaStopRow, DaMainColumn, OurDataSheet)
    Call SortHorizontalAddresses(DaStartRow, _
        DaStopRow, DaMainColumn, OurDataSheet)
End Sub
6.0.53
The button code for Sheet1
Private Sub CommandButton1_Click()
    StartToSort
End Sub
6.0.54
Code for the Form ("frmSortem") buttons
Private Sub cmdCancel_Click()
    Unload frmSortem
End Sub
6.0.55
Code for the Form ("frmSortem") buttons (cont'd.)
Private Sub cmdOK_Click()
    Dim DaSheetName As String
    Dim DaStartRow As Integer
    Dim DaMainColumn As Integer
    Dim DaStopRow As Integer
    DaSheetName = txtSheetName.Value
    DaStartRow = txtStartRow.Value
    DaMainColumn = txtStartColumn.Value
    Application.ScreenUpdating = False
    Sheets(DaSheetName).Select
    Call MakeAddressesHorizontal(DaStartRow, _
        DaStopRow, DaMainColumn, DaSheetName)
    Call SortHorizontalAddresses(DaStartRow, _
        DaStopRow, DaMainColumn, DaSheetName)
End Sub
6.0.56
VB List Boxes
• To read values into a list box you must point to the values in the "Initialize" part of the code. You would do this to provide a list of options a user may then select from.
• Select or activate the sheet containing those values before showing the dialog box or user form.
6.0.57
VB List Boxes (cont'd.)
• The code to assign the selection(s) to variables should be placed in the "Click" part of the code.
In the design mode, right click on the "OK" or some such button and select "View Code." This is where variables are assigned values after a button is clicked by the user.
• You can call other subs from anywhere in the file to run from a form. However, they must be called from the "Click" part of the code to be recognized.
6.0.58
VB List Boxes Tips
• Scroll bars appear/disappear automatically depending on the size of the box and the values read into the box (e.g., if you have a list of 100 items and design the box to be 2 inches long, a vertical scroll bar will appear).
• Set "Integral Height" to false on the properties of a text box if you want the size of the box to remain fixed.
6.0.59
Stepping through Code
• Helpful for debugging
• From the VB editor, select
Tools | Macro | Step Into
• Yellow bar highlights code about to be executed
• Move cursor over code already executed to see values assigned to variables
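Alongside stepping, a procedure can report on itself while it runs. The sketch below uses Debug.Print and Debug.Assert; SumOfSquares is a hypothetical procedure chosen only because it is easy to step through.

```vba
' Illustration only: SumOfSquares is a hypothetical procedure to step through.
Function SumOfSquares(n As Long) As Long
    Dim i As Long
    Dim total As Long
    total = 0
    For i = 1 To n
        total = total + i * i
        ' Debug.Print writes to the Immediate window without pausing.
        Debug.Print "i = " & i & ", total = " & total
    Next i
    ' Debug.Assert pauses execution here only if the condition is False.
    Debug.Assert total >= 0
    SumOfSquares = total
End Function
```

A Stop statement placed in the code suspends execution at that line, much like a breakpoint, after which you can continue by stepping.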
6.0.60
General tips
• Use the macro recorder and then add control statements and variables.
• Document all code thoroughly so that you can follow what has been done, if it is not already obvious.
• Use meaningful variable (and procedure) names (e.g., "Year1" is a good name for a starting year, but "A" is not).
• Type all code in lower case so VB can automatically capitalize letters to better ensure you do not have typographical errors.
Figure 6.1: Contents Tab in the Help Menu for VBA
6.1 First Steps
Visual Basic for Applications (VBA) is the macro language for Excel. It closely resembles Visual Basic, an independent language from Microsoft, and is also used as the macro language for Microsoft Access and Word. In what follows, we will be talking for the most part about Visual Basic for Applications as it applies to Excel. We will feel free to call it VBA, EVB, Visual Basic, VB, etc., so long as the context makes confusion unlikely.
Macros consist of one or more VBA code chunks. These code chunks, called procedures, are either functions or subroutines. Here are some simple examples.
' Here is a simple function. Use as any other Excel function.
Function bob(x)
    bob = x ^ 2 + 3.34
End Function

' Here is a simple VBA sub.
Sub ted()
    MsgBox "Hello, world!"
End Sub
Note: comments begin with a single quote:
' Everything afterwards in the line is ignored.
In Excel, VBA macros reside on special workbook sheets, called modules. To make a macro, one may simply create a new macro module and type in the functions and procedures. More on this shortly.
Information about VBA is published in many readily-available sources. Both Microsoft and third parties publish extensive reference manuals and how-to books for VBA. In addition, VBA closely resembles Visual Basic and there is a large literature on that. For good online help on VBA in Excel, explore "Programming with Visual Basic" in the "Contents" window of the MS Excel help facility (see Figures 6.1 (page 74) and 6.2 (page 76)). We are assuming in these notes that the reader will do this.
Figure 6.2: Index Tab in the Help Menu for VBA
6.2 Second Steps
6.2.1 Recording Macros
Macros (VBA procedures) can be recorded. Use Record Macro under the Tools menu. After selecting Record New Macro, you will be prompted for the name of this new macro. Either give it a new name, or accept the default. A small window will then appear with a stop button in it. You click the stop button when you are done recording your macro. First, however, perform as usual some action in the workbook, e.g., copy one range of cells to another place. When you are done, stop the macro recorder by clicking the stop button. In sum, there is a four-step process to record a macro:
1. Start the macro recorder.
Do this by selecting the menu: Tools / Record Macro / Record New Macro.
2. Name the macro.
You will be prompted for a name and may accept the default presented by Excel, e.g., Macro1. Once you have done this, a window appears with a button for stopping the recording of the macro.
3. Record the macro by performing normal activities in the workbook.
It is wise to plan these out before starting to record.
4. Stop recording the macro.
Do this by clicking the stop macro button.
This creates VBA code in a (usually new) module sheet, which Excel will call Module1 or some such thing. Module sheets reside with the other sheets of the workbook. As with the other sheets, you click on the tab to view the module sheet. When it appears, you will see VBA code against a blank background. While worksheets present spreadsheets (arrays of cells), macro sheets present text editors. Thus, you can examine and edit the VBA code.
Notice, in particular, a couple of things with regard to your new macro module sheet. First, macro sheets come with a context-sensitive text editor. For example, comments (lines beginning with an apostrophe) come out green (by default) and reserved words come out blue and get capitalized automatically. Second, the new macro that you just recorded is a Sub, rather than a Function.
Recording macros and examining the results is a good way of learning about VB, but it takes you only so far. We need to go further.
6.2.2 Assigning a Macro to a Button or Graphic Object
In order to run (or execute) a Sub macro, including macros created with the Record New Macro facility in Excel, one can choose to assign the macro to a graphic object that can call the macro. Assigning a macro to a button or graphic object is easy. For a previously-existing object, select it (e.g., hold down the Ctrl key and click on the button or graphic object), then choose Assign Macro... from the Tools menu. You will be prompted with a list of existing Subs and you make your choice from the list. That done, you may now simply click on the graphic object and Excel will call the macro and cause it to be executed.
Note: Typically, you will want to create a new button and assign the macro to it. Use Create Button from the Drawing icon and draw a new button. Excel will automatically prompt you to assign a macro.
6.2.3 Functions versus Subs
VBA functions return values (one value each), but cannot take actions otherwise. VBA subs (subroutines) do not return values, but can take actions. (However, VBA subs can set the values of variables and these variables may be accessed by other procedures.) VBA functions, once defined in a macro sheet, may be used in worksheet cells just as any of the functions Excel has built into it.
Functions and subs may call one another, thus you may create very complex programs in VBA. We will discuss that later. First things first. Now, let's look at variables in VBA.
6.3 Variables
6.3.1 The Very Basics of Variables
Here is a simple example involving VBA variables:
' Here's an example of two variables in use, along with
' a For...Next... control loop
Sub variableExample1()
    ' Assign the number 3 to the variable, MyVariable
    ' Note: You make up your own, mnemonic,
    ' names for your variables
    MyVariable = 3
    For i = 1 To MyVariable
        MsgBox "Showing and counting: " & i
    Next i
End Sub
The two variables are: MyVariable and i. Such program variables are used extensively in this sort of programming.
Variables hold values and their values may change during program execution. Basically, you make computations and assign the results to variables. Then you make new computations, based on the assigned values of these variables, and you assign the results to other variables. And on and on.
6.3.2 Variables Have Data Types
Some variables are for holding numbers, some for text, some for dates, and so on. VBA has a special type of variable, called the variant type. It can hold about anything, but in general you should avoid being so loose.
The main data types in VB are
1. Boolean. Values: True or False
2. Integer. Values: -32,768 to 32,767
3. Long (integer). Values: -2,147,483,648 to 2,147,483,647
4. Single (single precision floating point). Values: [lots]
5. Double. Values: [lots more than singles]
6. Currency. Values: [lots]
7. Date. Values: January 1, 0100 through December 31, 9999
8. String. Values: 0 through 65,535 characters
9. Variant. Values: Any numeric value thru Double or any character text
You set the data type of a VB variable by declaring it. But, if you don't declare the data type for a variable (as in the variableExample1 procedure, above), then the default is that the variable is of type variant.
Within a procedure, you may declare variables with the Dim (dimension) statement.
' Now here's variableExample1 again, but
' with the variables properly declared
Sub variableExample2()
    ' Assign the number 3 to the variable, MyVariable
    ' Note: You make up your own, mnemonic,
    ' names for your variables
    Dim MyVariable As Integer
    Dim I As Integer
    MyVariable = 3
    For I = 1 To MyVariable
        MsgBox "Showing and counting: " & I
    Next I
End Sub
6.3.3 Local and Global Variables
Variables declared this way (explicitly in a procedure with Dim or as variant by default) are local to the procedure. That is, you can't refer to them (use their names and get their values) in other procedures. In fact, as illustrated in variableExample1 and variableExample2, above, you can actually reuse the same variable names in different procedures. When you do this, you are really working with different variables, which happen to have the same names. (Advice: except for counters, like I, and explicitly temporary variables, e.g., mytemp, don't do this.)
Point of style: It is normally considered good programming practice to declare all your variables explicitly. Why? In Visual Basic, you can enforce this by declaring
Option Explicit
in the declarations section of each code module. (The declarations section of a module is the space before the first procedure, i.e., at the top.) You should do this. Then, when VB encounters a variable that hasn't been declared, VB generates an error message. This may initially be irritating, but it's a very good idea in the long run, since it prevents otherwise undetected errors.
The scope of a variable need not be limited to being local, however. In VBA in Excel, the scope of a variable may be the procedure in which it is declared (in which case we say it is local), the module in which it is declared, or the entire workbook.
When the scope is to be local (within procedure only), declare variables at the beginning of the procedure with the Dim statement. (See also in the Help facility: the Static statement.) See examples above, procedures variableExample1 and variableExample2.
When the scope of a variable is to be the module in which it is declared, declare the variable at the top of the module (in the declarations section), using Dim. (See also in the Help facility: the Static statement.)
When the scope of a variable is to be the entire workbook, pick a module, and declare the variable in the declarations section using Public (cf., Global).
Here's an example:
' Each module begins with a declarations section, the
' portion at the top, before the procedure declarations
' begin.
' Require that every variable be declared explicitly
Option Explicit
Public MyVar As Integer

Sub publicExample1()
    MyVar = 17
    MsgBox "We're in publicExample1 and MyVar = " & MyVar
    'publicExample2
End Sub

Sub publicExample2()
    MsgBox "We're in publicExample2 and MyVar = " & MyVar
End Sub
Note: "Module-level variables remain in existence while Visual Basic is running until the module in which they are defined is edited" (Visual Basic User's Guide, Microsoft Excel 5.0, p. 121). So play around with this example and see how this stuff works.
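The Static statement, mentioned above as something to look up in the Help facility, gives a third possibility: a variable that is local in scope but keeps its value between calls. A minimal sketch follows; NextTicket and StaticDemo are hypothetical names.

```vba
' Illustration only: NextTicket and StaticDemo are hypothetical names.
Function NextTicket() As Long
    ' A Static variable is local to the procedure but keeps its
    ' value from one call to the next.
    Static counter As Long
    counter = counter + 1
    NextTicket = counter
End Function

Sub StaticDemo()
    MsgBox "First call: " & NextTicket()
    MsgBox "Second call: " & NextTicket()
End Sub
```

Had counter been declared with Dim, it would start from 0 on every call and NextTicket would always return 1.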
6.3.4 Reading from an Excel worksheet into an Excel Visual Basic Variable
Study these examples:
Sub readfromworksheet1()
    Dim fromworksheet
    ' Note that with Cells(1,2) we are referencing
    ' the first row and second column of the worksheet.
    fromworksheet = Worksheets("Sheet1").Cells(1, 2).Value
    ' The following line works just as well.
    'fromworksheet = Worksheets("Sheet1").Range("b1").Value
    MsgBox "We're in readfromworksheet1 and fromworksheet = " & _
        fromworksheet
    ' Note above, use of "_" as a continuation sign.
End Sub
Sub readfromworksheet2()
    ' Now assume we have defined a range, called testrange1,
    ' whose scope is B2:D4
    Dim fromworksheet
    ' Note that with Cells(1,1) we are referencing
    ' the first row and first column of the named range.
    fromworksheet = Range("testrange1").Cells(1, 1).Value
    ' The following line works just as well
    ' (B2 is the top-left cell of testrange1).
    'fromworksheet = Worksheets("Sheet1").Range("b2").Value
    MsgBox "We're in readfromworksheet2. fromworksheet = " & _
        fromworksheet
    ' Note above, use of "_" as a continuation sign.
End Sub
6.3.5 Writing from an Excel Visual Basic Variable to a Worksheet
Just switch from left to right, e.g.,
Worksheets("Sheet1").Cells(1, 2).Value = fromworksheet
The equal sign, =, in this context is the assignment operator. It puts the stuff on the right into the stuff on the left.
6.4 Boolean Operators
Often we have to test for the truth or falsity of an expression, for example
MyVar > 7.3
will be true if MyVar has a value that is greater than 7.3. If its value is less than or equal to 7.3 the expression will be false. Note: If MyVar is Null, then the expression evaluates to Null. See comparison operators. This greatly complicates things and in these notes, I'll ignore the question of nulls.
So, expressions may be either true or false, in which case we say they have truth values. Expressions having truth values may be combined using Boolean operators to yield larger expressions, which also have truth values. The basic Boolean operators in VB are: And, Or, and Not. (VB also provides Xor, Eqv, and Imp.)
Each of these operators has a characteristic truth table, as follows.

expression1  expression2  (expression1 And expression2)
T            T            T
T            F            F
F            T            F
F            F            F
Table 6.1: Truth Table for And

expression1  expression2  (expression1 Or expression2)
T            T            T
T            F            T
F            T            T
F            F            F
Table 6.2: Truth Table for Or

expression  (Not expression)
T           F
F           T
Table 6.3: Truth Table for Not

Interestingly, many other Boolean (truth-functional) operators are possible. That is, many other truth tables are possible. But these three suffice, in that with them any other possible Boolean (truth-functional) operator may be defined. (How would you prove this?) In fact, Not and And are sufficient in this way, as are Not and Or. Here's something of a proof.

exp1  exp2  (exp1 And exp2)  Not(Not exp1 Or Not exp2)
T     T     T                T
T     F     F                F
F     T     F                F
F     F     F                F
Table 6.4: Truth Table Showing Definition of And in terms of Not and Or

Can you think of a single Boolean operator that is by itself sufficient?
So, we often need Boolean combinations of statements (or expressions) in programming. The bottom line is that And, Or, and Not are sufficient for expressing anything we can possibly express in this way.
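The equivalence shown in Table 6.4 can also be checked mechanically by running through all four combinations of truth values. This is only a sketch; CheckDefinitions is a hypothetical name.

```vba
' Illustration only: CheckDefinitions is a hypothetical name.
' Returns True if, for all four combinations of truth values,
' And agrees with its definition from Not and Or (Table 6.4), and
' Or agrees with the corresponding definition from Not and And.
Function CheckDefinitions() As Boolean
    Dim i As Integer, j As Integer
    Dim p As Boolean, q As Boolean

    CheckDefinitions = True
    For i = 0 To 1
        For j = 0 To 1
            p = (i = 1)
            q = (j = 1)
            If (p And q) <> (Not (Not p Or Not q)) Then
                CheckDefinitions = False
            End If
            If (p Or q) <> (Not (Not p And Not q)) Then
                CheckDefinitions = False
            End If
        Next j
    Next i
End Function
```

The extra parentheses around the Not expressions are deliberate: in VB, comparison operators bind more tightly than the logical operators.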
6.5 Control Structures
There are several of these in Visual Basic, and we'll look at a few of them. (And you should search the online help under "control structures.") We have already seen one, the For...Next statement.
6.5.1 For...Next
We've already seen this in action (above). The general structure for a For...Next statement is:
For <counter> = <start> To <end> [Step <increment>]
    [statements]
Next [<counter>]
Note: Items in square brackets, [...], are optional. Items capitalized are required parts of the statement. Items between left and right angle brackets, <...>, are required to be filled in by the programmer. Thus, valid examples for the For...Next statement include the following.
For I = 1 To 3
    MsgBox "Hello, world!"
Next
Better style is to do this:
For I = 1 To 3
    MsgBox "Hello, world!"
Next I
Or you can count down, if, e.g., MyIncrement is negative.
For MyCounter = MyStart To MyFinish Step MyIncrement
    MsgBox "MyCounter = " & MyCounter
Next MyCounter
Note: Be sure all these variables have reasonable values set for them before executing this statement.
6.5.2 If...Then...
This is a very useful statement in programming languages. The basic structure in VB is:
If <condition> Then
    [statements]
End If
When an If...Then... statement is executed, the <condition> is tested as a Boolean expression. If it evaluates to True, then the [statements] are executed; otherwise they are skipped and processing continues with the next statement, if any.
Note: The <condition> can also be an expression that returns a numeric value. If when evaluated it returns 0, that is treated as False. Anything else is treated as True.
Example:
If Age >= 65 Then
    NumberOfDeductions = NumberOfDeductions + 1
End If
Note: The <condition> expression may be complex. It may be an arbitrarily complex Boolean combination of statements.
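The earlier note about numeric conditions can be seen in a small sketch. IsNonZero is a hypothetical name used only for this illustration.

```vba
' Illustration only: IsNonZero is a hypothetical name.
Function IsNonZero(n As Long) As Boolean
    IsNonZero = False
    ' A numeric condition: 0 is treated as False, anything else as True.
    If n Then
        IsNonZero = True
    End If
End Function
```

Writing the test explicitly, If n <> 0 Then, says the same thing and is usually easier to read. One caution: applied to a number, Not works bit by bit, so Not 2 is -3, which is still treated as True.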
6.5.3 If...Then...Else
Probably used even more often than If...Then...
If <condition> Then
    [statements to execute if <condition> is true]
Else
    [statements to execute if <condition> is false]
End If
You use If...Then...Else when you want to do one thing if a condition obtains, and another if it does not obtain. The =If(...) function in Excel is an If...Then...Else type of construct. Example: If the value in a certain cell (or variable) is valid, then display an OK message; otherwise display a not OK message.
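The validity example just described might look like the following sketch. The names ValidityMessage and CheckCellDemo, the sheet name, the cell position, and the choice of what counts as "valid" are all assumptions made for illustration.

```vba
' Illustration only: ValidityMessage and CheckCellDemo are hypothetical
' names; the sheet name and cell position are assumptions.
Function ValidityMessage(v As Variant) As String
    ' Here "valid" is taken to mean: not empty and numeric.
    If (Not IsEmpty(v)) And IsNumeric(v) Then
        ValidityMessage = "OK"
    Else
        ValidityMessage = "Not OK"
    End If
End Function

Sub CheckCellDemo()
    MsgBox ValidityMessage(Worksheets("Sheet1").Cells(1, 1).Value)
End Sub
```

Exactly one of the two branches runs on each call, so the function always returns one of the two messages.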
6.5.4 Select Case
More general than If...Then...Else is Select Case.
Select Case <test expression>
    Case <first expression list>
        [first statements]
    Case <second expression list>
        [second statements]
    ...
    Case Else
        [else statements]
End Select
Here's an example from a popular Excel/VBA book:
Select Case TotalPoints
    Case Is < 50
        FinalGrade = "F"
    Case Is < 60
        FinalGrade = "D"
    Case Is < 70
        FinalGrade = "C"
    Case Is < 80
        FinalGrade = "B"
    Case Else
        FinalGrade = "A"
End Select
This runs, but there's a lot that's wrong with it. The following is much better. Why?
Sub testcase2()
    TotalPoints = 173
    Select Case TotalPoints
        Case 0 To 49
            FinalGrade = "F"
        Case 50 To 59
            FinalGrade = "D"
        Case 60 To 69
            FinalGrade = "C"
        Case 70 To 79
            FinalGrade = "B"
        Case 80 To 100
            FinalGrade = "A"
        Case Else
            FinalGrade = "Error in TotalPoints: " & TotalPoints
    End Select
    MsgBox "Final grade is: " & FinalGrade
End Sub
6.5.5 Do...Loop
There are really two forms of Do...Loop: condition-at-the-top and condition-at-the-bottom. Here they are:
Do {While | Until} <condition>
    [statements]
Loop
and
Do
    [statements]
Loop {While | Until} <condition>
where
{While | Until}
gets unpacked as either While or Until. While <condition> means so long as the condition is true, and Until <condition> means until the condition is true. The difference between the condition-at-the-top and the condition-at-the-bottom versions lies mainly in that the condition-at-the-bottom version is guaranteed to execute its [statements] at least once.
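The difference between the two forms shows up when the condition already fails on entry. In this sketch (the function names are hypothetical) both functions count how many times a value is halved before it drops below 1.

```vba
' Illustration only: both function names are hypothetical.

' Condition at the top: the body may run zero times.
Function CountHalvings(ByVal x As Double) As Long
    Dim n As Long
    Do While x >= 1
        x = x / 2
        n = n + 1
    Loop
    CountHalvings = n
End Function

' Condition at the bottom: the body always runs at least once.
Function CountHalvingsAtLeastOnce(ByVal x As Double) As Long
    Dim n As Long
    Do
        x = x / 2
        n = n + 1
    Loop Until x < 1
    CountHalvingsAtLeastOnce = n
End Function
```

For a starting value of 8 both versions halve four times. For a starting value of 0.5 the first version does nothing, while the second still halves once.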
6.5.6 Exiting a Loop
Sometimes you need to break out of a loop. (Don't we all?) If you're in a For...Next structure, break out with an Exit For statement. If you're in a Do...Loop, break out with an Exit Do statement. Note: sometimes you have to do this, but it's generally considered poor programming practice. Why?
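A typical use of Exit For is a search that can stop at the first hit. The following is a sketch only; FirstBlankIndex is a hypothetical name, and returning -1 for "not found" is an assumed convention.

```vba
' Illustration only: FirstBlankIndex is a hypothetical name, and
' returning -1 for "not found" is an assumed convention.
Function FirstBlankIndex(items() As String) As Long
    Dim i As Long
    FirstBlankIndex = -1
    For i = LBound(items) To UBound(items)
        If items(i) = "" Then
            FirstBlankIndex = i
            Exit For    ' no need to look at the remaining elements
        End If
    Next i
End Function
```

The same search could be written as a Do While loop whose condition combines "not yet found" with "not past the end", which keeps a single exit point at the cost of a longer condition.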
6.6 Arrays
Arrays in VB should not be confused with arrays and array commands in Excel, even though Excel's terminology invites this. All standard third-generation programming languages support arrays, and programs in these languages typically rely a lot on arrays. Arrays are rather like vectors and matrices in mathematics. A one-dimensional array is an ordered collection of values, rather like a vector, which you can access (store or retrieve values) by position. Here's a simple example.
' From "Code Module5" of vbtutor.xls
Sub arraytester1()
    ' Each variable needs its own As clause; in
    ' "Dim I, X As Integer" the variable I would be a Variant.
    Dim I As Integer
    Dim MyFirstArray(1 To 6) As Integer
    ' Load up the array
    For I = 1 To 6
        MyFirstArray(I) = I + 3
    Next I
    MsgBox "MyFirstArray(6) = " & MyFirstArray(6)
    ' Dump the array into a worksheet
    For I = 6 To 1 Step -1
        Worksheets("Sheet1").Cells(I, 6).Value = MyFirstArray(I)
    Next I
End Sub
Note: You declare an array in much the same way you declare any other variable. (But see ReDim in the online help.) All of the elements in an array must have the same data type. Of course, if the array is of type variant, this is pretty loose. (But you can't have, e.g., arrays within arrays in VB.)
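Since the note points to ReDim, here is a minimal sketch of a dynamic array whose bounds are set, and later changed, at run time. GrowDemo is a hypothetical name.

```vba
' Illustration only: GrowDemo is a hypothetical name.
Function GrowDemo() As Variant
    Dim data() As Long          ' dynamic array: no bounds given yet

    ReDim data(1 To 3)          ' set the bounds at run time
    data(1) = 10
    data(2) = 20
    data(3) = 30

    ' ReDim alone would reset the elements; Preserve keeps them.
    ReDim Preserve data(1 To 5)
    data(4) = 40                ' data(5) keeps the default value 0

    GrowDemo = data
End Function
```

With Preserve, only the upper bound of the last dimension can be changed.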
Here's a more interesting example, using a two-dimensional array.
Sub arraytester2()
    ' Each variable needs its own As clause; in
    ' "Dim I, J As Integer" the variable I would be a Variant.
    Dim I As Integer, J As Integer
    Dim MySecondArray(1 To 10, 1 To 20) As Single
    ' Load up the array and dump, forcing
    ' type conversion from Double (the result of Sin) to Single
    For I = 1 To 10
        For J = 1 To 20
            MySecondArray(I, J) = Sin(I + J)
            Worksheets("Sheet2").Cells(I, J).Value = MySecondArray(I, J)
        Next J
    Next I
End Sub
We can go on to higher-dimensional arrays, but I think you get the idea. In Excel VB programs, you typically only need one- and two-dimensional (maybe three-dimensional) arrays.
6.7 Miscellaneous Topics
Now we'll discuss a list of useful things (methods and tricks) that didn't fit easily in the previous discussion.
6.7.1 Constants
Constants are like variables, except that they don't change. You use constants in order to improve the readability of your program and to help reduce errors. For example, if the maximum number of students in a classroom is 132, and you need this value a lot in your program, then you might want to consider declaring a constant. You might do this:
Public Const maxstudents As Integer = 132
Then, throughout your program, you can just use maxstudents, without having to worry about typing 132 or making a mistake and typing some other number. (Recall: Option Explicit.)
6.7.2 The Copy Method
Suppose you wish to copy one worksheet range to another worksheet range. You can do this in Excel VBA with the copy method. For example:
Sub copytest1()
    ' Suppose "carol" is the range B3:C4 on Sheet3 and
    ' "alice" is E4:F5 on Sheet3.
    ' The following works:
    Worksheets("Sheet3").Range("carol").Copy _
        destination:=Worksheets("Sheet3").Range("alice")
    ' And so does this:
    Worksheets("Sheet3").Range("carol").Copy _
        destination:=Worksheets("Sheet3").Cells(9, 9)
    ' and so does this:
    Worksheets("Sheet3").Range("b3:c4").Copy _
        destination:=Worksheets("Sheet3").Cells(10, 2)
End Sub
6.7.3 Referring to Single Column or Row Ranges
Suppose the name denise refers to a range consisting of a single column. Then Range("denise").Cells(1).Value refers to the value in the topmost cell in the range.
Sub democells1()
x = Range("denise").Cells(1).Value
y = Range("denise").Cells(2).Value
MsgBox "x = " & x & " and y = " & y
End Sub
6.7.4 Sorting Worksheet Ranges
See the sort method. In Excel VBA you can direct the sorting of a worksheet range. For example, the following subroutine sorts the range, DaRange, in worksheet, DaWorkSheet, on the column, DaColumn, in descending order.
Sub sort()
Worksheets("DaWorkSheet").Range("DaRange").sort _
key1:=Range("DaColumn"), order1:=xlDescending
End Sub
6.7.5 Calling Subroutines from within Other Subroutines
A reasonable and normal thing to do. In fact it's recommended. Suppose you had a main subroutine, called main, and you wanted it to call three other subroutines, named mysub1, mysub2, and mysub3. Here's how:
Sub main()
mysub1
mysub2
mysub3
End Sub
6.7.6 Calling Functions from Other Procedures
Very straightforward. See the bob function at the start of this appendix. Then, here's an example.
Function bobagain(x)
bobagain = bob(x) * bob(x)
End Function
6.7.7 Selecting Ranges
In RangeTest, we illustrate how to select a worksheet range based on the row and column indexes of the cells in the corners of the range. Then, we call SimpleChartRange, passing it a variable whose value has been set to a given range. SimpleChartRange then produces a (simple) chart based on the values in the range passed to it. I began by recording a macro for SimpleChartRange and then I modified it to accept and use the range passed to it.
Option Explicit
Sub RangeTest()
'Assume that we have some numbers in
'A1:E1 in Sheet1 of the workbook. Also
'on that sheet we have a button, that
'when clicked calls this sub.
'Begin by declaring MyRange as a Range
'variable.
Dim MyRange As Range
'How to select a range based on the
'row and column indexes of the corner
'cells in the range?
'Here is one way. The corner cells are
'Cells(1,1)--A1--and Cells(1,5)---E1. Nice, huh?
Set MyRange = _
Worksheets("sheet1").Range(Cells(1, 1), _
Cells(1, 5))
'Now we put up a message box just to test
'to see that all went well.
MsgBox MyRange.Cells(1).Value
'Now we select the range. This is
'another check on the code.
MyRange.Select
'Now we call a sub that will do
'a simple chart on the range.
Call SimpleChartRange(MyRange)
End Sub
Sub SimpleChartRange(TheRange As Range)
'This sub is a minor modification of
'a simple macro I recorded to chart
'a series of numbers in the range A1:E1,
'on Sheet1 of a workbook.
'The big difference is that I'm
'passing in a range variable, called
'TheRange and using it where the
'original macro used, e.g., "a1:e1".
'You might also refer to slides 36-40
'of the VBA tutorial.
TheRange.Select
Charts.Add
ActiveChart.ChartType = xlLineMarkers
ActiveChart.SetSourceData _
Source:=TheRange, _
PlotBy:=xlRows
ActiveChart.Location _
Where:=xlLocationAsObject, _
Name:="Sheet1"
With ActiveChart
.HasTitle = False
.Axes(xlCategory, xlPrimary).HasTitle = False
.Axes(xlValue, xlPrimary).HasTitle = False
End With
Application.CommandBars("chart").Visible = False
End Sub
6.7.8 The Month Function
The VBA/Excel function Month will return the number (integer) corresponding to the month in the date given to the function. This could be used to figure out when the months change in the data.
Sub DaMonth()
'Assume that cell A5 on Sheet1
'contains a value that is a date,
'e.g., 4/15/98.
'The month function will return
'an integer corresponding to the
'month in the date.
Dim OurMonth As Integer
OurMonth = Month(Worksheets("sheet1").Range("a5").Value)
MsgBox "And the month is " & OurMonth & "."
End Sub
6.7.9 Writing to the Status Bar
Here's code for putting a message on the Excel status bar.
Sub PokeTheStatusBar()
Application.StatusBar = "Macintosh forever!"
End Sub
6.8 Bibliographic Note
A good, elementary introduction to Excel VBA (but not to programming itself) can be found in [13, page 205].
6.9 Version Notes
File: dt-vbatutor. Created: 951128, from VBTUTORF.DOC. Revised: 951222, 19980502, 19980512.
Chapter 7
VBA Exercises
1. Consider the following code, Sub Question9.
Sub Question9()
Dim I As Integer
Dim J As Integer
For I = 5 To 1 Step -1
For J = 2 To 6 Step 2
Worksheets("Question9").Cells(J, I).Value = (I / J) ^ 3
Next J
Next I
End Sub
Assuming that the worksheet "Question9" is empty before the Sub Question9 is executed, what is in cell D4 after Sub Question9 is executed?
(a) D4 is empty
(b) 15.625
(c) 1
(d) 0.125
(e) 8
2. Suppose that the worksheet "Question10" has the numbers 1, 2 and 3 in the range A1:C1 (i.e., A1 has a 1 in it, B1 has a 2 in it, C1 has 3) and 2, 4, and 6 in the range A2:C2 (i.e., A2 = 2, B2 = 4, C2 = 6). Suppose further that otherwise the worksheet is empty. What is in cell A5 after the Sub Question10, below, is executed?
Sub Question10()
Dim I As Integer
Dim Huh As Double
Huh = 2
For I = 1 To 3
Huh = Huh + Worksheets("Question10").Cells(1, I).Value _
* Worksheets("Question10").Cells(2, I).Value
Next I
Worksheets("Question10").Range("A5").Value = Huh
End Sub
(a) 56
(b) 28
(c) 0
(d) 54
(e) 30
3. Which of the following is NOT a valid data type in Visual Basic for Applications?
(a) Double
(b) Currency
(c) Short
(d) String
(e) Boolean
4. Suppose in worksheet "Question12" A1 = 12, A2 = 3.4, A3 = 3, and otherwise the worksheet is empty. What is in cell B2 after the Sub Question12, below, is executed?
Sub Question12()
Dim I As Integer
For I = 1 To 3
Worksheets("Question12").Cells(3 - I + 1, 2).Value = _
Worksheets("Question12").Cells(I, 1).Value
Next I
End Sub
(a) 3.4
(b) The cell remains empty
(c) 12
(d) 3
(e) None of the above
5. Suppose the worksheet "Question13" has the following values in the indicated cells in Figure 7.1 (e.g., B2 = 0) and is otherwise empty. What is in cell D8 after the Sub Question13, below, is executed?
        A     B
  1     4     1
  2     3     0
  3     2    -1
  4     1    -2
  5     0    -3
  6    -1    -4
  7    -2    -5
  8    -3    -6
  9    -4    -7
 10    -5    -8
 11    -6    -9
 12    -7   -10
Figure 7.1: Table for the sub "Question13"
Sub Question13()
Dim I As Integer: Dim AnArray() As Variant
Dim Count As Integer: Dim Response As Variant
Count = Worksheets("Question13").Cells(1, 1).Value
ReDim AnArray(1 To Count)
For I = 1 To Count
AnArray(I) = Worksheets("Question13").Cells(I + 6, 2).Value
Next I
For I = 1 To Count
Worksheets("Question13").Cells(I + 6, 4).Value _
= AnArray(Count - I + 1)
Next I
End Sub
(a) -8
(b) 0
(c) -7
(d) -1
(e) The cell remains empty
6. Suppose the worksheet "Question14" has the values in the indicated cells in Figure 7.1 (above, question 5, e.g., B2 = 0) and is otherwise empty. What is in cell C1 after the Sub Question14, below, is executed?
Sub Question14()
Dim I As Integer
I = Worksheets("Question14").Cells(2, 1).Value
Worksheets("Question14").Cells(1, 3).Value = _
Worksheets("Question14").Cells(I + 1, I - 1).Value
End Sub
(a) 1
(b) -1
(c) 2
(d) -2
(e) None of the above
7. Suppose the worksheet "Question15" has the values in the indicated cells in Figure 7.1 (above, question 5, e.g., B2 = 0) and is otherwise empty. What is in cell C1 after the Sub Question15, below, is executed?
Sub Question15()
Dim I As Integer
Dim J As Integer
Dim K As Integer
I = Worksheets("Question15").Cells(J + 2, K + 1).Value ^ 2
J = Worksheets("Question15").Cells(I, 1).Value
Worksheets("Question15").Cells(1, 3).Value = _
Worksheets("Question15").Cells(I - J, 2).Value
End Sub
(a) -7
(b) 0
(c) 4
(d) The cell remains empty
(e) None of the above
8. Suppose the worksheet "Question16" has the values in the indicated cells in Figure 7.1 (above, question 5, e.g., B2 = 0) and is otherwise empty. What is in cell C3 after the Sub Question16, below, is executed?
Sub Question16()
Dim I As Integer
Dim J As Integer
Dim K As Integer
Dim L As Integer
Do
I = I + 1
Loop While Worksheets("Question16").Cells(I, 1).Value <> ""
Do
J = J + 1
L = Worksheets("Question16").Cells(J, 1).Value
K = Worksheets("Question16").Cells(J, 2).Value
Loop Until L + K < 2
For L = 1 To J
Worksheets("Question16").Cells(L, 3).Value = _
Worksheets("Question16").Cells(L, 1).Value
Worksheets("Question16").Cells(L, 4).Value = _
Worksheets("Question16").Cells(L, 2).Value
Next L
For L = J + 1 To I
Worksheets("Question16").Cells(L, 3).Value = _
Worksheets("Question16").Cells(L, 2).Value
Worksheets("Question16").Cells(L, 4).Value = _
Worksheets("Question16").Cells(L, 1).Value
Next L
End Sub
(a) 2
(b) -1
(c) 4
(d) 0
(e) 1
9. Suppose the worksheet "Question16" has the values in the indicated cells in Figure 7.1 (above, question 5, e.g., B2 = 0) and is otherwise empty. What is in cell D12 after the Sub Question16, above, is executed?
(a) -6
(b) -9
(c) The cell is empty
(d) -7
(e) -10
10. Excel supports text box validation on Dialog boxes. Which data type does it NOT support for this purpose?
(a) Reference
(b) Text
(c) Number
(d) Integer
(e) Date
11. Excel supports a number of controls on Dialog boxes. Which control type does it NOT support for this purpose?
(a) Text Box
(b) Check Box
(c) Scroll Bar
(d) Slider
(e) Spinner
12. TBA
13. TBA
14. TBA
        A   B   C   D   E
  1     1   1   0   1   0
  2     1   0   1   0   1
Table 7.1: Input data for Questions 15 and 16
15. Suppose that worksheet "Question15" holds input data as indicated in Table 7.1, and that the sub Question15 (Figure 7.2, page 99) is executed. What appears in the range A4:E5 of the worksheet "Question15"?
(a)
        A   B   C   D   E
  4     1   0   1   0   1
  5     1   1   0   1   0
(b)
        A   B   C   D   E
  4     1   1   1   0   1
  5     1   0   0   1   0
(c)
        A   B   C   D   E
  4     1   1   0   0   1
  5     1   0   1   1   0
(d)
        A   B   C   D   E
  4     1   1   0   1   1
  5     1   0   1   0   0
(e) None of the above.
Hints: (a) See Figure 7.3, page 100, for the definition of the Mod operator in VBA. (b) Perhaps a better name for this sub would be Crossover.
Sub Question15()
Dim Top(1 To 5) As Integer
Dim Bot(1 To 5) As Integer
Dim ResultTop(1 To 5) As Integer
Dim ResultBot(1 To 5) As Integer
Dim I As Integer
Dim J As Integer
J = 0
For I = 1 To 5
Top(I) = Worksheets("Question15").Cells(1, I).Value
Bot(I) = Worksheets("Question15").Cells(2, I).Value
J = J + Top(I) + Bot(I)
Next I
J = (J Mod 4) + 1
For I = 1 To J
ResultTop(I) = Top(I)
ResultBot(I) = Bot(I)
Next I
For I = J + 1 To 5
ResultTop(I) = Bot(I)
ResultBot(I) = Top(I)
Next I
For I = 1 To 5
Worksheets("Question15").Cells(4, I).Value = ResultTop(I)
Worksheets("Question15").Cells(5, I).Value = ResultBot(I)
Next I
End Sub
Figure 7.2: Code for Question 15
This example uses the Mod operator to divide two numbers and
return only the remainder. If either number is a floating-point
number, it is first rounded to an integer.
Dim MyResult
MyResult = 10 Mod 5 ' Returns 0.
MyResult = 10 Mod 3 ' Returns 1.
MyResult = 12 Mod 4.3 ' Returns 0.
MyResult = 12.6 Mod 5 ' Returns 3.
Figure 7.3: Definitional information on Mod from Microsoft's online help for VBA
16. Suppose that worksheet "Question15" holds input data as indicated in Table 7.1 (page 98), and that the sub Question16 (Figure 7.4, page 100) is executed. What number appears when the resulting message box (from the MsgBox J command in the code) is displayed?
(a) 3
(b) 21
(c) 6
(d) 10101
(e) 91
Sub Question16()
Dim Bot(1 To 5) As Integer
Dim I As Integer
Dim J As Integer
J = 0
For I = 1 To 5
Bot(I) = Worksheets("Question15").Cells(2, I).Value
Next I
For I = 1 To 5
J = J + (Bot(6 - I) * 3 ^ (I - 1))
Next I
MsgBox J
End Sub
Figure 7.4: Code for Question 16
        A   B   C   D   E
  1     1   0   1   1   0
  2     1   0   1   0   1
Table 7.2: Input data for Question 17
17. Suppose that worksheet "Question17" holds input data as indicated in Table 7.2, and that the sub Question17 (Figure 7.5, page 102) is executed. What appears in the range A4:F4 of the worksheet "Question17"?
(a)     A   B   C   D   E   F
  4     0   1   1   1   1   1
(b)     A   B   C   D   E   F
  4     0   1   0   1   1   1
(c)     A   B   C   D   E   F
  4     1   0   1   1   1   1
(d)     A   B   C   D   E   F
  4     1   0   1   0   1   1
(e)     A   B   C   D   E   F
  4     1   1   1   1   1   1
Hints: And and Or work as follows:
Exp1   Exp2   Exp1 And Exp2   Exp1 Or Exp2
  1      1          1               1
  1      0          0               1
  0      1          0               1
  0      0          0               0
Sub Question17()
Dim Top(1 To 5) As Integer
Dim Bot(1 To 5) As Integer
Dim Result(1 To 6) As Integer
Dim I As Integer
Dim J As Integer
Dim Carry As Integer
Carry = 0
For I = 1 To 5
Top(I) = Worksheets("Question17").Cells(1, I).Value
Bot(I) = Worksheets("Question17").Cells(2, I).Value
Next I
For I = 5 To 1 Step -1
Result(I + 1) = (Top(I) Or Bot(I)) Or Carry
Carry = (Top(I) And Bot(I)) And Carry
Next I
Result(1) = Carry
For I = 1 To 6
Worksheets("Question17").Cells(4, I).Value = Result(I)
Next I
End Sub
Figure 7.5: Code for Question 17
18. Suppose you want to have a function that, when passed a one-dimensional array holding floating point numbers, would return the largest number in the array. For example, suppose this function is called Question18 and is called from the following sub:
Sub TestQ18()
Dim Vector(1 To 4) As Double
Vector(1) = 5
Vector(2) = 17
Vector(3) = 34
Vector(4) = 4
Dim Result As Double
Result = Question18(Vector)
MsgBox Result
End Sub
The value displayed after executing the MsgBox Result command would be 34.
Suppose, further, that someone has provided part of the required function, Question18, and it is displayed (with absent code indicated) in Figure 7.6, page 105.
By way of answering this question, Question 18, which one of the following VBA code segments is best for filling in the missing lines of the code in Figure 7.6, page 105?
(a) MyMax = Max(Vector)
(b) MyMax = 0
For I = Lower To Upper
If MyMax = Vector(I) Then
MyMax = Vector(I)
End If
Next I
(c) MyMax = 0
For I = Lower To Upper
If MyMax < Vector(I) Then
MyMax = Vector(I)
End If
Next I
(d) MyMax = 0
For I = Lower To Upper
If MyMax > Vector(I) Then
MyMax = Vector(I)
End If
Next I
(e) None of the above.
Function Question18(Vector) As Double
Dim Lower As Integer
Dim Upper As Integer
Dim I As Integer
Dim MyMax As Double
Lower = LBound(Vector)
Upper = UBound(Vector)
**** Missing line(s) of VBA code go here. ****
Question18 = MyMax
End Function
Figure 7.6: Code for Question 18
[Figure: right triangle with sides a, b, and c]
Figure 7.7: Illustration of the Pythagorean theorem. The triangle has three sides: a, b, c. The angle between sides a and b is 90°. The lengths of the sides are as follows: length a = $\|a\|$, length b = $\|b\|$, length c = $\|c\|$. According, then, to the Pythagorean theorem, $\|c\|^2 = \|a\|^2 + \|b\|^2$.
19. Suppose that we need a function, to be called EDistance2D, for calculating the (Euclidean) distance between two points in a plane (hence "2D"). Let the points be (x1, x2) and (y1, y2) (these are coordinates in the 2-dimensional plane, e.g., (4, 5.6)). The function is to be given these four numbers and is to return the distance between the corresponding two points.
It may be helpful to recall the Pythagorean theorem, as it pertains to the hypotenuse of a right (90°) triangle. See Figure 7.7, page 106.
It will also be helpful to recall that Sqr is the square root function in VBA. Also, you have been provided with a partial answer to the question, in the form of a code template. See Figure 7.8, page 107.
Function EDistance2D(x1, x2, y1, y2) As Double
'The two points are (x1,x2) and (y1,y2)
**** Missing line(s) of VBA code go here. ****
End Function
Figure 7.8: Code template for Question 19
By way of answering Question 19, which one of the following VBA code segments is best for filling in the missing line of the code in Figure 7.8, page 107?
(a) Sqr(EDistance2D) = (x1 - y1) ^ 2 + (x2 - y2) ^ 2
(b) EDistance2D = Sqr((x1 - y1) ^ 2 + (x2 - y2) ^ 2)
(c) EDistance2D = Sqr((x1 - y2) ^ 2 + (y1 - x2) ^ 2)
(d) Sqr(EDistance2D) = (x1 - x2) ^ 2 + (y1 - y2) ^ 2
(e) EDistance2D = Sqr((x1 - x2) ^ 2 + (y1 - y2) ^ 2)
Function Question20(Lower, Upper) As Double
**** Missing line(s) of VBA code go here. ****
End Function
Figure 7.9: Code template for Question 20
20. Suppose we need a function, call it Question20, that returns a real number (a double) randomly and uniformly distributed between the values Lower and Upper, which we supply to the function when we call it. VBA has a built-in function, Rnd, that returns a floating point number (a Single) that is randomly and uniformly distributed between 0 and 1 (at least 0 and less than 1). So, what we need is a generalization of Rnd, and we are free to use Rnd in writing this new function. Happily, someone has supplied a partial answer to this question, in the form of a code template. See Figure 7.9, page 108.
By way of answering this question, Question 20, which one of the following VBA code segments is best for filling in the missing line of the code in Figure 7.9, page 108?
(a) Question20 = (Rnd * (Upper - Lower)) + Lower
(b) Question20 = (Rnd * (Upper + Lower)) - Lower
(c) Question20 = (Rnd * Upper) + Lower
(d) Question20 = Rnd * (Upper + Lower)
(e) Question20 = Rnd * (Upper - Lower)
Chapter 8
Text and Pattern Processing
8.1 The information extraction problem
Consider the following chunk of HTML.
<table border="0" width="100%" cellspacing="0" cellpadding="0">
<!-- WP Market Indices Start -->
<div align="center"><i class="smaller">(As of 10:55 AM on 12/20/01)</i></div><br><tr>
<td><p>DJIA</p></td>
<td align="right"><p> 10031.90</p></td>
<td align="right"><p> <span class="marketdown">-38.60</span></p></td>
</tr>
<tr>
<td><p>NASDAQ</p></td><td align="right"><p> 1950.90</p></td>
<td align="right"><p> <span class="marketdown">-31.90</span></p></td>
</tr>
<tr>
<td><p>NYSE</p></td>
<td align="right"><p> 584.90</p></td>
<td align="right"><p> <span class="marketdown">-0.20</span></p></td>
</tr><tr>
<td><p>S&P 500</p></td>
<td align="right"><p> 1145.96</p></td>
<td align="right"><p> <span class="marketdown">-3.60</span></p></td>
</tr>
<tr>
<td><p>AMEX</p></td>
<td align="right"><p> 828.74</p></td><td align="right"><p> <span class="marketdown">-2.11</span></p></td>
</tr>
<!-- WP Market Indices End -->
</table>
This chunk was extracted from a Web page, which was downloaded from the
AT&T Web service page at http://www.att.net/ shortly after 11 a.m. (Eastern Standard Time) on December 20, 2001. The original source page, of course, includes a great deal more HTML code, but that turns out not to matter for our present purposes. Seeing why is an important part of these purposes.
Our interest is in extracting, under program control, certain information presented in this page. Specifically, assume first that we are interested in capturing the value of the S&P 500 stock index, as reported on this page. We can see that the value is 1145.96. How do we get a program to see it (and then do something useful automatically for us)?
This very special problem is in fact just a specific case of a very general and widely encountered problem:
• The information extraction problem: Given a body of text, how can we automatically recover useful information from it?
Note that if we can automatically recover useful information, we can then (automatically or not) process that information and obtain additional value. A simple example: Using just the information in the above chunk of HTML we might compute which stock index has undergone the largest change, as measured in percentage.
Usually the information extraction problem presents itself with the following complication:
• Complication to the information extraction problem: The body of text containing the information of interest changes over time.
The values of the quantities we are interested in change. Some go up, some go down.
What can a programming language do to help us with the information extraction problem? Think of how you might find the reported value of the S&P 500. You look for a pattern of characters, or a string, that indicates where the information (specifically, the string) is that you want. The following excerpt has what we want:
<td><p>S&P 500</p></td>
<td align="right"><p> 1145.96</p></td>
How can we describe the pattern? Try this:
Literally, the string "S&P 500", followed by a bunch of non-numeric junk, followed by a decimal number with two digits to the right of the decimal point and several digits to the left. That decimal number is the value we want.
Note: We might be reasonably confident that the S&P 500 index will always have exactly four digits to the left of its decimal point. Still, prudence counsels us to allow three to five.
This is all well and good, but how can we program information extraction, so that it occurs automatically? Regular expressions (REs) let us solve this problem. Let us see how.
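As a preview, the informal pattern just described can be written down almost word for word. The following is only a sketch under stated assumptions: the variable fragment is a hypothetical, shortened stand-in for the downloaded page, and the names pattern and extract_sandp are ours, not part of any library. The syntax used here (\D, \d, the braces, the parentheses) is explained in the sections that follow.

```python
import re

# Hypothetical stand-in for the downloaded fragment; only two of the
# table rows are reproduced here.
fragment = """
<td><p>NYSE</p></td>
<td align="right"><p> 584.90</p></td>
<td><p>S&P 500</p></td>
<td align="right"><p> 1145.96</p></td>
"""

# The literal label, then as few non-digit characters as possible
# (the "non-numeric junk"), then three to five digits, a decimal
# point, and exactly two digits. The parentheses mark the number.
pattern = re.compile(r"S&P 500\D*?(\d{3,5}\.\d{2})")

def extract_sandp(text):
    """Return the S&P 500 value as a float, or None if there is no match."""
    m = pattern.search(text)
    return float(m.group(1)) if m else None

print(extract_sandp(fragment))
```

Because the pattern describes the shape of the number rather than its digits, the same code should keep working when the value on the page changes, which is exactly the complication noted above.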
8.2 Regular Expressions (REs)
The general idea is to have a language in which we can express patterns (of strings or text), which can be matched automatically to a given string (or text). This is a familiar idea to all those who use word processing and are at least acquainted with using computers. In any Web browser, in Word, and in many other programs if you type ctrl-f you will get a dialog box that asks you what you want to find. You type in a sequence of characters (e.g., theory), click the appropriate button, and the program finds the first exact match to your search string (if there is a match). All this is very helpful, but:
1. Once you find your pattern, you want your program to do something useful with it.
Browsers and word processors normally just find things, then require you, the user, to take any required actions.
2. What if you aren't exactly sure of the pattern you are seeking, so that a strictly literal match won't do?
Example: You don't know whether the name is "Smith" or "Smyth", or perhaps "color" might be spelled "colour", or perhaps you think the word might be misspelled, or the target text will change over time, but within a predictable pattern. As we saw above, the exact value of a stock index will vary, yet we can expect it to fit within a stable pattern.
Regular expressions are a device for handling both of these problems. We'll discuss the two problems in the context of REs in the next two subsections.
8.2.1 Problem 1: Programmed Matches
REs may be incorporated into programming environments. Python and Perl are particularly known for how well they support REs. Let's look at a simple example in Python, in which our target string is the Web page fragment reproduced above, from AT&T WorldNet, and our search string is "S&P 500". See Figure 8.1 (line numbers have been added).
1.>>> import re
2.>>> data = open(r'c:\day\attfragment.html','r').read()
3.>>> len(data)
4.1038
5.>>> sandpmatch = re.search(r'S&P 500',data,re.IGNORECASE)
6.>>> sandpmatch
7.<SRE_Match object at 00F12ED0>
8.>>> sandpmatch.span()
9.(681, 692)
10.>>> sandp = data[681:688]
11.>>> sandp
12.'S&P'
Figure 8.1: Simple re.search Example
Exposition on the lines in the figure:
1. We begin by importing (making available) Python's re module (for regular expressions).
2. The file we want, holding the target text, is attfragment.html, residing on the C drive under the day directory. In this line we open the file for reading, and we read() its contents into our variable data, which now holds a string corresponding to the entire contents of the target file. In open, the r in r'c:\day... stands for 'raw'. In raw mode the backslashes are taken literally. Normally the backslash is an escape character (more on this shortly) and if we actually want a backslash we have to escape it, with a backslash. Instead of r'c:\day... we would have 'c:\\day... and so on. Raw mode makes things prettier.
3. Here we use the Python function len to obtain the length in characters of the string data.
4. Python reports that our file and the corresponding string, data, are 1038 characters long.
5. The search method of the re module takes three arguments: a query string as a regular expression, the target string, and (optionally) flags to guide the query. The name of the flag, re.IGNORECASE, is itself case-sensitive, but the flag tells the query engine to match regardless of case. Here, this means that 'S&P 500' would match 's&p 500'.
search pretty much does what its name suggests: it looks through the target string to find the first match to the query string. search returns a match object or None, depending on whether it finds a match or not. Note this unsuccessful search:
>>> sandpmatch2 = re.search(r'S&Q 500',data,re.IGNORECASE)
>>> sandpmatch2
>>> sandpmatch2 == None
1
>>>
6. In line 6 of the figure we ask for the value of sandpmatch after a successful search.
7. In line 7 Python tells us the search was successful.
8. Given that sandpmatch != None (i.e., that its search was successful), there is some place that it found the match. Python's span method for match objects gives us where the match begins and where the first character is that is after the match.
9. Python tells us that the match begins at character 681 and continues through character 691. Character 692 is the first character after the match.
10. The slice data[681:688] gives us the 7 characters of data beginning with character 681.
11. We ask Python for what was returned from the slice. . .
12. . . . and Python tells us (surprise, surprise) that it is the string "S&P".
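The session in Figure 8.1 copies the slice positions by hand. In a program we would normally let the match object supply them. Here is a minimal sketch of that idea, assuming a hypothetical one-line string in place of the contents of attfragment.html (the variable names are ours):

```python
import re

# Hypothetical one-line stand-in for the contents of attfragment.html.
data = '<tr><td><p>S&P 500</p></td><td align="right"><p> 1145.96</p></td></tr>'

m = re.search(r"s&p 500", data, re.IGNORECASE)
if m is not None:
    # span() returns (start, end); end is the position of the first
    # character after the match, so data[start:end] is the match itself.
    start, end = m.span()
    found = data[start:end]
    print(start, end, found)
else:
    print("no match")

# An unsuccessful search returns None, so test before using the result.
print(re.search(r"S&Q 500", data, re.IGNORECASE) is None)
```

Testing the result against None before calling span() matters: calling a method on None raises an error, which is how an unsuccessful search usually announces itself in a longer program.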
8.2.2 Problem 2: Pattern Matches
REs support pattern matching against strings, not just literal matching, as in Web browsers and as in the example of the previous section. Note that a minor paradox lurks. We want to use query strings to form patterned (nonliteral) matches against target strings. How does the matching program "know" which strings are to be taken literally and which are to be taken otherwise? The solution principle is obvious: some characters are to be understood as special. They are metacharacters and are not taken literally.
Using metacharacters for pattern matching is familiar to most computer users. When at the command prompt one types
C:\> dir *.xls
one is requesting a list of files in the current directory, ending in ".xls". The asterisk, "*", is a metacharacter (in this context). It means "match 0 or more characters of any kind." So, dir *.xls means "Display all files ending in '.xls', no matter what comes earlier." Ask yourself: What does dir e*e.xls mean?
The language of regular expressions has a short list of metacharacters (a dozen or so) and a clever syntax that combine to provide a very powerful pattern matching capability. Practice and examples are required to learn how to use this capability. In §8.3 we summarize the syntax for purposes of reference and in §8.4 we continue our discussion of problem 2 (pattern matches) in the context of Python and regular expressions.
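The command-prompt wildcard and the regular expression can be compared side by side from Python. This is only a sketch: the file names are invented for illustration, fnmatch is the standard library module for shell-style wildcards, and the regular expression shown is one reasonable translation of *.xls, not the only one.

```python
import fnmatch
import re

# Hypothetical file names, for illustration only.
names = ["budget.xls", "q1.xls", "notes.txt", "archive.xlsx"]

# Shell-style wildcard: "*" alone means "0 or more characters of any kind".
by_wildcard = fnmatch.filter(names, "*.xls")

# As a regular expression the same idea takes two metacharacters:
# "." (any character) followed by "*" (0 or more repetitions).
# The dot before "xls" is meant literally, so it is escaped.
rx = re.compile(r"^.*\.xls$")
by_regex = [n for n in names if rx.search(n)]

print(by_wildcard)
print(by_regex)
```

Note the difference in meaning: at the command prompt "*" stands for characters by itself, while in a regular expression "*" only repeats whatever precedes it.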
8.3 Python's RE Syntax
The following list is taken more or less directly from the online Python Library Reference for the re module. We present it here for the sake of convenience and completeness. (There is additional information online regarding the re module.) The reader will likely find many of the items below unduly arcane, and rarely used. Learning just a few of the syntactic elements will suffice for most purposes. We offer the suggestion that items 1-13 and several of the items in 25-35 are the most useful.
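A few of the elements recommended above can be tried out directly while reading the list. The sketch below simply replays examples that appear in items 4 through 9 of the list; the variable names are ours, and re.fullmatch (which requires the whole string to match) is used only to make the tests strict.

```python
import re

# Items 4-6: repetition with "*", "+" and "?".
star_ok = re.fullmatch(r"ab*", "abbb") is not None   # 'a' then any number of 'b's
plus_ok = re.fullmatch(r"ab+", "a") is not None      # needs at least one 'b'
opt_ok = re.fullmatch(r"ab?", "ab") is not None      # zero or one 'b'
print(star_ok, plus_ok, opt_ok)

# Item 7: greedy versus non-greedy matching.
html = "<H1>title</H1>"
greedy = re.match(r"<.*>", html).group(0)
lazy = re.match(r"<.*?>", html).group(0)
print(greedy, lazy)

# Items 8-9: bounded repetition, greedy and non-greedy.
most = re.match(r"a{3,5}", "aaaaaa").group(0)
fewest = re.match(r"a{3,5}?", "aaaaaa").group(0)
print(len(most), len(fewest))
```

Running small experiments like these at the interactive prompt is probably the quickest way to get a feel for the syntax.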
1. "." (Dot.)
In the default mode, this matches any character except a newline. If the DOTALL flag has been specified, this matches any character including a newline.
2. "^" (Caret.)
Matches the start of the string, and in MULTILINE mode also matches immediately after each newline.
3. "$"
Matches the end of the string, and in MULTILINE mode also matches before a newline. foo matches both 'foo' and 'foobar', while the regular expression foo$ matches only 'foo'.
4. "*"
Causes the resulting RE to match 0 or more repetitions of the preceding RE, as many repetitions as are possible. ab* will match 'a', 'ab', or 'a' followed by any number of 'b's.
5. "+"
Causes the resulting RE to match 1 or more repetitions of the preceding RE. ab+ will match 'a' followed by any non-zero number of 'b's; it will not match just 'a'.
6. "?"
Causes the resulting RE to match 0 or 1 repetitions of the preceding RE. ab? will match either 'a' or 'ab'.
7. *?, +?, ??
The "*", "+", and "?" qualifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn't desired; if the RE <.*> is matched against '<H1>title</H1>', it will match the entire string, and not just '<H1>'. Adding "?" after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using .*? in the previous expression will match only '<H1>'.
8. {m,n}
Causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as many repetitions as possible. For example, a{3,5} will match from 3 to 5 "a" characters. Omitting n specifies an infinite upper bound; you can't omit m.
9. {m,n}?
Causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as few repetitions as possible. This is the non-greedy version of the previous qualifier. For example, on the 6-character string 'aaaaaa', a{3,5} will match 5 "a" characters, while a{3,5}? will only match 3 characters.
10. "\"
Either escapes special characters (permitting you to match characters like "*", "?", and so forth), or signals a special sequence; special sequences are discussed below.
If you're not using a raw string to express the pattern, remember that Python also uses the backslash as an escape sequence in string literals; if the escape sequence isn't recognized by Python's parser, the backslash and subsequent character are included in the resulting string. However, if Python would recognize the resulting sequence, the backslash should be repeated twice. This is complicated and hard to understand, so it's highly recommended that you use raw strings for all but the simplest expressions.
11. []
Used to indicate a set of characters. Characters can be listed individually, or a range of characters can be indicated by giving two characters and separating them by a "-". Special characters are not active inside sets. For example, [akm$] will match any of the characters "a", "k", "m", or "$"; [a-z] will match any lowercase letter, and [a-zA-Z0-9] matches any letter or digit. Character classes such as \w or \S (defined below) are also acceptable inside a range. If you want to include a "]" or a "-" inside a set, precede it with a backslash, or place it as the first character. The pattern []] will match ']', for example.
You can match the characters not within a range by complementing the set. This is indicated by including a "^" as the first character of the set; "^" elsewhere will simply match the "^" character. For example, [^5] will match any character except "5".
12. "|"
A|B, where A and B can be arbitrary REs, creates a regular expression that will match either A or B. An arbitrary number of REs can be separated by the "|" in this way. This can be used inside groups (see below) as well. REs separated by "|" are tried from left to right, and the first one that allows the complete pattern to match is considered the accepted branch. This means that if A matches, B will never be tested, even if it would produce a longer overall match. In other words, the "|" operator is never greedy. To match a literal "|", use \|, or enclose it inside a character class, as in [|].
13. (...)
Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group; the contents of a group can be retrieved after a match has been performed, and can be matched later in the string with the \number special sequence, described below. To match the literals "(" or ")", use \( or \), or enclose them inside a character class: [(] [)].
14. (?...)
This is an extension notation (a "?" following a "(" is not meaningful otherwise). The first character after the "?" determines what the meaning and further syntax of the construct is. Extensions usually do not create a new group; (?P<name>...) is the only exception to this rule. Following are the currently supported extensions.
15. (?iLmsux)
(One or more letters from the set "i", "L", "m", "s", "u", "x".) The group matches the empty string; the letters set the corresponding flags (re.I, re.L, re.M, re.S, re.U, re.X) for the entire regular expression. This is useful if you wish to include the flags as part of the regular expression, instead of passing a flag argument to the compile() function.
Note that the (?x) flag changes how the expression is parsed. It should be used first in the expression string, or after one or more whitespace characters. If there are non-whitespace characters before the flag, the results are undefined.
16. (?:...)
A non-grouping version of regular parentheses. Matches whatever regular expression is inside the parentheses, but the substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern.
17. (?P<name>...)
Similar to regular parentheses, but the substring matched by the group is accessible via the symbolic group name name. Group names must be valid Python identifiers. A symbolic group is also a numbered group, just as if the group were not named. So the group named 'id' in the example below can also be referenced as the numbered group 1.
For example, if the pattern is (?P<id>[a-zA-Z_]\w*), the group can be referenced by its name in arguments to methods of match objects, such as m.group('id') or m.end('id'), and also by name in pattern text (e.g. (?P=id)) and replacement text (e.g. \g<id>).
18. (?P=name)
Matches whatever text was matched by the earlier group named name.
19. (?#...)
A comment; the contents of the parentheses are simply ignored.
20. (?=...)
Matches if ... matches next, but doesn't consume any of the string. This is called a lookahead assertion. For example, Isaac (?=Asimov) will match 'Isaac ' only if it's followed by 'Asimov'.
21. (?!...)
Matches if ... doesn't match next. This is a negative lookahead assertion. For example, Isaac (?!Asimov) will match 'Isaac ' only if it's not followed by 'Asimov'.
22. (?<=...)
Matches if the current position in the string is preceded by a match for ... that ends at the current position. This is called a positive lookbehind assertion. (?<=abc)def will match "abcdef", since the lookbehind will back up 3 characters and check if the contained pattern matches. The contained pattern must only match strings of some fixed length, meaning that abc or a|b are allowed, but a* isn't.
23. (?<!...)
Matches if the current position in the string is not preceded by a match for .... This is called a negative lookbehind assertion. Similar to positive lookbehind assertions, the contained pattern must only match strings of some fixed length.
The special sequences consist of "\" and a character from the list below. If the ordinary character is not on the list, then the resulting RE will match the second character. For example, \$ matches the character "$".
24. \number
Matches the contents of the group of the same number. Groups are numbered starting from 1. For example, (.+) \1 matches 'the the' or '55 55', but not 'the end' (note the space after the group). This special sequence can only be used to match one of the first 99 groups. If the first digit of number is 0, or number is 3 octal digits long, it will not be interpreted as a group match, but as the character with octal value number. Inside the "[" and "]" of a character class, all numeric escapes are treated as characters.
25. \A
Matches only at the start of the string.
26. \b
Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric characters, so the end of a word is indicated by whitespace or a non-alphanumeric character. Inside a character range, \b represents the backspace character, for compatibility with Python's string literals.
27. \B
Matches the empty string, but only when it is not at the beginning or end of a word.
28. \d
Matches any decimal digit; this is equivalent to the set [0-9].
29. \D
Matches any non-digit character; this is equivalent to the set [^0-9].
30. \s
Matches any whitespace character; this is equivalent to the set [ \t\n\r\f\v].
31. \S
Matches any non-whitespace character; this is equivalent to the set [^ \t\n\r\f\v].
32. \w
When the LOCALE and UNICODE flags are not specified, matches any alphanumeric character; this is equivalent to the set [a-zA-Z0-9_]. With LOCALE, it will match the set [0-9_] plus whatever characters are defined as letters for the current locale. If UNICODE is set, this will match the characters [0-9_] plus whatever is classified as alphanumeric in the Unicode character properties database.
33. \W
When the LOCALE and UNICODE flags are not specified, matches any non-alphanumeric character; this is equivalent to the set [^a-zA-Z0-9_]. With LOCALE, it will match any character not in the set [0-9_], and not defined as a letter for the current locale. If UNICODE is set, this will match anything other than [0-9_] and characters marked as alphanumeric in the Unicode character properties database.
34. \Z
Matches only at the end of the string.
35. \\
Matches a literal backslash.
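Several of the constructs listed above are easier to believe once you see them run. What follows is a minimal sketch, not part of the original reference list: the sample strings are made up for illustration, and the code is written for a current Python 3 (where print is a function) rather than the Python 2.1 used elsewhere in this chapter.

```python
import re

# Alternation is tried left to right and is never greedy:
# "a" matches first, so the longer branch "ab" is never tried.
alt = re.match(r"a|ab", "ab")
print(alt.group(0))

# A named group is also a numbered group.
named = re.match(r"(?P<id>[a-zA-Z_]\w*)", "spam_1 = 42")
print(named.group("id"), named.group(1))

# Lookahead: match 'Isaac ' only when 'Asimov' follows, without consuming it.
ahead = re.search(r"Isaac (?=Asimov)", "Isaac Asimov")
print(repr(ahead.group(0)))

# Negative lookahead: match 'Isaac ' only when 'Asimov' does not follow.
not_ahead = re.search(r"Isaac (?!Asimov)", "Isaac Asimov")
print(not_ahead)

# Lookbehind: match 'def' only when 'abc' comes immediately before it.
behind = re.search(r"(?<=abc)def", "abcdef")
print(behind.group(0), behind.start())

# Backreference: \1 must repeat whatever group 1 matched.
repeated = re.match(r"(.+) \1", "the the") is not None
not_repeated = re.match(r"(.+) \1", "the end") is not None
print(repeated, not_repeated)
```

Note that the lookahead match is just 'Isaac ' (with its trailing space): the text inside (?=...) is checked but never becomes part of the match.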
8.4 Problem 2 (con't.): Pattern Matches with Python
Here in Figure 8.2 is a Python program using REs that uses pattern matching to find the value of the S&P 500 index.
1.>>> sandpindex = re.search(r'(S&P 500)(.*?)(\d+\.\d+)',data,\
re.IGNORECASE|re.DOTALL)
2.>>> sandpindex.group(0)
3.'S&P 500</p></td>\n<td align="right"><p> 1145.96'
4.>>> sandpindex.group(1)
5.'S&P 500'
6.>>> sandpindex.group(2)
7.'</p></td>\n<td align="right"><p> '
8.>>> sandpindex.group(3)
9.'1145.96'
10.>>> sandpindex.span(3)
11.(735, 742)
12.>>> data[735:742]
13.'1145.96'
>>>
Figure 8.2: RE Program for Finding S&P 500 Index
Points arising:
1. In line 1, in the RE we use parentheses to create three groups. Reading from left to right: group 1 is our old friend (S&P 500); group 2, (.*?), is the junk between group 1 and the index value; and group 3, (\d+\.\d+), matches the index value. Lines 4-5 show the match to group 1; lines 6-7 show the match to group 2; lines 8-9 show the match to group 3; and since group 0 is the entire match, lines 2-3 show it.
2. Groups 2 and 3 use Python's RE metacharacters and syntax to effect pattern matching. Group 1, as we saw previously, is an RE that effects only literal matching.
3. Group 2 = (.*?). (See §8.3 items 1 (page 114), 4 (page 114), and 7 (page 114).) The parentheses define the group. The dot normally means "any character at all, except a newline," but because the flag re.DOTALL is present (line 1), even newlines are matched. So, the dot matches any character at all in this query. The asterisk means "0 or more occurrences matching the previous expression," i.e., any number of characters at all. The question mark means "don't be greedy; stop at the first pattern satisfying the previous expression."
4. Group 3 = (\d+\.\d+). (See §8.3 items 28 (page 118), 5 (page 114), and 35 (page 119).) \d+ means "one or more decimal digits." \. means "one occurrence of the dot character." Note: The backslash is used to escape from the normal meaning of the dot character, causing the RE engine to take it literally.
5. The backslash at the end of the first row of line 1 is Python's line continuation character. It is not strictly necessary here, because the line break falls inside the parentheses of the call, but it is useful for display purposes.
6. Note the re.IGNORECASE|re.DOTALL construction in line 1. This sets two flags for the RE. As mentioned earlier, IGNORECASE says to do matching regardless of upper or lower case, and DOTALL says that the dot should match every character, including the newline character, \n, which by default it fails to match.
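To see what the question mark and the DOTALL flag each contribute, here is a small sketch that runs the same RE three ways. It assumes a made-up page fragment: the first part is the text matched in Figure 8.2, and the second number (2001.25) is invented purely to give the greedy version something extra to swallow. The code is written for a current Python 3.

```python
import re

# Hypothetical page fragment, modeled on the match shown in Figure 8.2;
# the trailing 2001.25 is invented to show the effect of greediness.
data = 'S&P 500</p></td>\n<td align="right"><p> 1145.96</p>\n<p> 2001.25</p>'

flags = re.IGNORECASE | re.DOTALL

# Non-greedy (.*?): stop at the first place where a number can follow.
lazy = re.search(r'(S&P 500)(.*?)(\d+\.\d+)', data, flags)

# Greedy (.*): grab as much as possible, leaving only the shortest tail
# that still looks like digits, a dot, and digits.
greedy = re.search(r'(S&P 500)(.*)(\d+\.\d+)', data, flags)

# Without DOTALL the dot cannot cross the newline, so nothing matches.
no_dotall = re.search(r'(S&P 500)(.*?)(\d+\.\d+)', data, re.IGNORECASE)

print(lazy.group(3))
print(greedy.group(3))
print(no_dotall)
```

The non-greedy version finds 1145.96. The greedy version ends up with only the tail end of the last number in group 3, which is almost never what you want when scraping a page.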
8.5 For More Information
See "Regular Expression HOWTO," by A.M. Kuchling. This document may be found at
• http://www.python.org/doc/howto/, and also at
• http://py-howto.sourceforge.net/pdf/regex.pdf
Also, most of the standard Python books have at least brief introductions to the re module. The standard reference on regular expressions is Jeffrey E.F. Friedl, 1997. Mastering Regular Expressions, O'Reilly [8]. Unfortunately, Friedl focuses on Perl and assumes the old Python RE syntax. But it's a good book and the principles remain valid.
8.6 Exercises
8.6.1
Find two sources on the Web that report the NASDAQ levels. As of December 25, 2001, both
• http://money.cnn.com/markets/nasdaq.html, and
• http://moneycentral.msn.com/investor/research/msnbc/newsnap.asp?symbol=$COMPX
do this.
Write a Python script, to be run from the command line (in script mode), that grabs the reported value of the NASDAQ from each source, and that prints out (to the screen):
1. what the two values are,
2. where they came from,
3. what the difference between them is, and
4. the date/time at which these queries were made.
Hints: In addition to using the re module to extract necessary information, you will find it useful to use the urllib module for downloading Web pages and the time module for finding the date and time. The following interaction with Python in interactive mode should be helpful to you in this regard.
Type "copyright", "credits" or "license" for more information.
IDLE 0.8 -- press F1 for help
>>> import urllib
>>> cnnnasdaq = "http://money.cnn.com/markets/nasdaq.html"
>>> cnnpage = urllib.urlopen(cnnnasdaq).read()
>>> len(cnnpage)
26697
>>> cnnfile = open('d:\\cnn.txt','w')
>>> cnnfile.write(cnnpage)
>>> cnnfile.close()
>>> import time
>>> now = time.localtime(time.time())
>>> print time.asctime(now)
Tue Dec 25 14:52:30 2001
8.7 *Compiling Regular Expressions
Here is a script, bob.py, that illustrates compilation of REs.
import re
data = open(r'c:\day\attfragment.html').read()
print len(data)
tomatch = re.compile(r'S&P 500',re.DOTALL|re.IGNORECASE)
mymatch = tomatch.search(data)
print mymatch.group(0)
tomatch = re.compile(r"""(S&P 500)(.*?)(\d+\.\d+)""",
re.S|re.I)
# Note: re.S = re.DOTALL, re.I = re.IGNORECASE
if tomatch is None:
    print "tomatch is None."
else:
    print tomatch
mymatch = tomatch.search(data)
if mymatch is None:
    print "It's None"
else:
    print mymatch.group(3)
When run, bob.py produces this output (as it should):
>>>
1038
S&P 500
<SRE_Pattern object at 0CAB5590>
1145.96
>>>
Note:
1. If queries are to be done repeatedly with a regular expression, then compiling it once and running particular queries many times will be more efficient.
2. Compilation produces an SRE_PatternObject from an RE. Roughly:
SRE_PatternObject = re.compile(RE)
Then, instead of re.search(RE,TargetString) we use SRE_PatternObject.search(TargetString). The effects are the same.
3. I had trouble with the compilation flags. If re.VERBOSE was present the search failed to find anything. Moreover, even with triple quoting, spreading out the RE to be compiled across several lines only produced errors. But what's above (") does work.
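One likely explanation for the trouble described in note 3, offered as an assumption rather than a diagnosis of the original session: in verbose mode the re module ignores unescaped whitespace inside the pattern, so the space in S&P 500 silently disappears and the RE goes looking for S&P500. Escaping the space fixes it. The sketch below, written for a current Python 3 and using the text matched in Figure 8.2 as sample data, shows a multi-line verbose RE that works, the version that fails, and the equivalence between the compiled and uncompiled forms from note 2.

```python
import re

# Sample text: the full match shown in Figure 8.2.
data = 'S&P 500</p></td>\n<td align="right"><p> 1145.96'

# In verbose mode, unescaped whitespace in the pattern is ignored and "#"
# starts a comment, so the literal space in "S&P 500" is written "\ ".
tomatch = re.compile(r"""
    (S&P\ 500)      # group 1: the literal label
    (.*?)           # group 2: as little junk as possible
    (\d+\.\d+)      # group 3: the index value
    """, re.VERBOSE | re.DOTALL | re.IGNORECASE)

# Same RE with the space left unescaped: verbose mode drops it.
broken = re.compile(r"(S&P 500)(.*?)(\d+\.\d+)", re.VERBOSE | re.DOTALL)

compiled_result = tomatch.search(data)
direct_result = re.search(r"(S&P 500)(.*?)(\d+\.\d+)", data,
                          re.DOTALL | re.IGNORECASE)
broken_result = broken.search(data)

print(compiled_result.group(3))
print(direct_result.group(3))
print(broken_result)
```

The compiled and direct searches agree, as note 2 says they should, and the unescaped verbose version finds nothing.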
File: text-pattern.tex
Chapter 9
Programming Excel
Microsoft's COM (Component Object Model) is a standard by which so-called client programs can manipulate other programs called server programs.¹ In the lingo of the trade, a COM server (server program) "exposes its objects" in accordance with the COM standard so that client programs can get the server to do things for them. Client programs, of course, must know how to communicate with COM objects.
Excel, Access, Word, PowerPoint and the other parts of the Microsoft Office suite all support and conform to COM. That is, each of these programs is capable of being a COM server. Moreover, the VBA built into them is COM-aware so that VBA programs can be COM clients. In fact, an Excel (Access, Word, ...) macro is a COM client that uses the Excel (Access, Word, ...) host as a COM server.
From the perspective of a COM client, a COM server program has a certain look and feel. A server is a hierarchical collection of objects, each object having its own properties. Each application, of course, has its own characteristic objects and hierarchy, its own object model. Once you know the object model for a server, you can write a client-side program to manipulate the server. What this amounts to is simply changing the properties of the server's objects.
¹ Different names abound and create confusion. Names have come and gone, including OLE, Automation, DCOM, COM+, and ActiveX. The programming community has more or less settled on `COM' as the name of this evolving technology standard. The following passage from a recent Microsoft publication indicates that Microsoft may have acceded to common practice. "The key technology that makes individual Office applications programmable and makes creating an integrated Office solution possible is the Component Object Model (COM) technology known as automation" [15, page 37]. We'll stick with `COM'.
This is in principle a very elegant design. When you work interactively with Excel, manipulating it in the end-user programming style (using the graphical interface), you are essentially just creating and deleting objects (such as worksheets in a workbook), and changing properties of objects. To enter a new number in a cell (object) is just to change the cell's Value property. Coloring and other formatting operations should be understood similarly. Here by way of example is a line of Excel VBA code:
Worksheets("Sheet1").Range("A1").Interior.ColorIndex = 3
This is a simple assignment statement. It assigns the value 3 to the stuff on the left. Interpreted, what's going on is that 3 is assigned to the ColorIndex property of the Interior (object), which is part of the A1 Range object (the northwest-most cell), which is part of the Sheet1 Worksheet object. Since 3 is Excel's code for the color red, execution of this statement causes the A1 cell on Sheet1 to turn red. Similarly,
Worksheets("Sheet1").Range("A1").Value = "Yo, world!"
is VBA code for putting "Yo, world!" in A1, i.e., setting the Value property of A1 to "Yo, world!".
We're now going to discuss in more detail how to manipulate Excel from Python, instead of from VBA. As we will see, however, the difference is slight. This is a consequence of using COM. Once you've learned it from Python, you've learned it from VBA, and vice versa.
9.1 Excel COM from Python
Our focus will be entirely on using Python for COM client-side programming. That is, our Python scripts will manipulate the exposed objects of a COM server, here Microsoft Excel. Python can also be used to create a COM server (which could be manipulated by an Excel VBA client, among others!). The additional steps to do this are really minimal. We shall forbear in part because security considerations prohibit this kind of programming on machines in public labs. The interested reader might consult [9] or the Website
http://starship.python.net/crew/mhammond/ppw32/
and proceed cheerfully at home.
Here's the obligatory "Hello, world" program, with Python the client and Excel the COM server. And to liven things up, we even turn the A1 cell red. (Line numbers have been added.)
Python 2.1.1 (#20, Jul 20 2001, 01:19:29) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
IDLE 0.8 -- press F1 for help
1.>>> import win32com.client
2.>>> xl = win32com.client.Dispatch('Excel.Application')
3.>>> xl.Visible = 1
4.>>> xl.Workbooks.Add()
5.<COMObject Add>
6.>>> xl.Worksheets("Sheet1").Range("A1").Value = "Yo, world!"
7.>>> xl.Worksheets("Sheet1").Range("A1").Interior.ColorIndex = 3
What's going on here is really quite simple. Line 1 imports the Win32 COM client module. Making Python COM aware boils down to importing the right module. That's it. Now we can do things. The first thing we do, in line 2, is to launch ("dispatch" is the technical term) a new Excel process. If we wanted to dispatch Word we would say "Word.Application" instead of "Excel.Application", and so on. Note that xl is a Python variable, which will now be an instance of an Excel COM object. (Try typing type(xl) at the Python prompt.) We could have used any other valid Python variable name. xl was chosen for purely mnemonic reasons. Think how confusing it would have been to use word.
By default Excel is dispatched in the background, so you can't see it. In line 3 we set the Visible property of the dispatched Excel application instance, xl, to true, i.e., to 1. After Python executes this line you will see Excel come alive but without any worksheets. We have to add them and we do in line 4. Here, the interpretation is slightly new. Add() is not a property of Workbooks, it is a method (note the parentheses), a program associated with the object Workbooks, which now gets executed. The effect is that a new workbook, by default Book1 with 3 worksheets, is added to the collection of workbooks in xl. If we execute this line again, Book2 with 3 worksheets is added. We'll stick with Book1 for now.
Line 6 is our "Hello, world" program. I wrote it by typing xl. and then copying
Worksheets("Sheet1").Range("A1").Value = "Yo, world!"
from the VBA program discussed above. I did the same thing in line 7. The effects are the same as in the VBA program, and for the same reasons. The pattern is evident and straightforward. You program Excel from Python or VBA or any other environment supporting COM clients by identifying the Excel object you want to affect and then either assigning a value to one of its properties or identifying and executing one of its methods. That's essentially it.
The rest is details. The big question now is "How do I learn about the Excel object model?" We'll cover what you most need here. You can also explore with the Object Browser in Excel (View then Object Browser from the VBA development environment in Excel). The best documentation I have found is in [15], but note the blurb on the cover: "The hard-core programming guide to Microsoft Office XP development." (You don't need it.)
Perhaps the easiest way to get a handle on the Excel object model is to record VBA macros in Excel and examine them. Here's one, for copying from the range B2:C3 to B5:C6.
Sub Macro4()
'
' Macro4 Macro
' Macro recorded 12/26/2001 by Steven O. Kimbrough
'
Range("B2:C3").Select
Selection.Copy
Range("B5").Select
ActiveSheet.Paste
Application.CutCopyMode = False
Range("A1").Select
End Sub
What it does is plain. On Excel's ActiveSheet, it copies the range B2:C3 to the range whose northwest-most cell is B5. Then it turns off CutCopyMode. A literal transformation to Python won't quite work, but this does (when executed from the command line).
import win32com.client
xl = win32com.client.Dispatch('Excel.Application')
print xl.Name
xl.Range("B2:C3").Select()
xl.Selection.Copy()
xl.Range("B5").Select()
xl.ActiveSheet.Paste()
xl.Application.CutCopyMode = 0
xl.Range("A1").Select()
See the pattern? Just a few changes are required. The Python line
xl.Application.CutCopyMode = False
won't work as is because the Python used here (version 2.1) treats False, rightly, as a variable. If you declare False = 0 then this line will work. Also, Select, Copy, and Paste are all methods, rather than properties, and so in Python require their parentheses to be present.
There are simpler ways to do copying.
>>> xl.Range("C2:D3").Copy(xl.Range("C6"))
1
>>>
See the pattern? The 1 returned by Python indicates that the operation was successful. Here is another way of copying one range to another.
>>> xl.ActiveSheet.Range("C2:D3").Value
((u'a', u'c'), (u'b', u'd'))
>>> xl.ActiveSheet.Range("C5:D6").Value = xl.ActiveSheet.Range("C2:D3").Value
>>> bob = xl.ActiveSheet.Range("C2:D3").Value
>>> bob
((u'a', u'c'), (u'b', u'd'))
>>> xl.ActiveSheet.Range("A10:B11").Value = bob
>>>
Notice that this won't take any formatting along. The tuple (u'a', u'c') represents the top row of the range C2:D3: C2 = 'a' and D2 = 'c'. u'a' means that 'a' is encoded with Unicode, which is what Excel does with strings. If you are copying Unicode from one part of Excel to another, keeping everything in Unicode is fine. If you wish to work with Excel strings in Python, you need to convert the Unicode to ASCII. The Python str() function is available for this:
>>> xl.ActiveSheet.Range("C2:D3").Value
((u'a', u'c'), (u'b', u'd'))
>>> bob = xl.ActiveSheet.Range("C2:D3").Value
>>> bob
((u'a', u'c'), (u'b', u'd'))
>>> carol = str(bob[0][0]),str(bob[0][1])
>>> carol
('a', 'c')
If you have a string in Python and you want to write it out to Excel, Python automatically makes the conversion to Unicode.
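The value Excel hands back for a multi-cell range is just a tuple of rows, each row itself a tuple, so ordinary Python indexing is all you need to pick it apart. Here is a small sketch that assumes nothing but the tuple shown above; it uses a literal copy of that tuple so it runs without Excel, and it is written for a current Python 3, where every string is already Unicode and no u prefix is shown.

```python
# Stand-in for what xl.ActiveSheet.Range("C2:D3").Value returned above.
bob = (("a", "c"), ("b", "d"))

top_row = bob[0]                              # cells C2 and D2
first_column = tuple(row[0] for row in bob)   # cells C2 and C3
columns = tuple(zip(*bob))                    # transpose: one tuple per column

print(top_row)
print(first_column)
print(columns)
```

The first index picks the row and the second picks the column, so bob[1][0] is the cell C3.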
Most of what you want to do using Python as a client to Excel is one of the following three things:
1. Write information onto a worksheet
2. Read information off of a worksheet
3. Format a worksheet
Writing and reading to Excel are simple inverses. Here are some writing (to Excel) examples.
>>> xl.Range("A6").Value = "Hello."
>>> xl.Range("A7").Value = 12.3
>>> xl.Range("A8").Value = 12.3
>>> xl.Cells(9,1).Formula = "=sum(a7:a8)"
And here are some related reading (from Excel) examples:
>>> bob = xl.Cells(6,1).Value
>>> bob
u'Hello.'
>>> str(bob)
'Hello.'
>>> carol = xl.Range("A7").Value
>>> carol
12.300000000000001
>>> ted = xl.Cells(8,1).Value
>>> ted
12.300000000000001
>>> alice = xl.Range("A9").Value
>>> alice
24.600000000000001
>>> alicebee = xl.Range("A9").Formula
>>> alicebee
u'=SUM(A7:A8)'
>>> str(alicebee)
'=SUM(A7:A8)'
>>>
And that's about all there really is for reading and writing to Excel cells. Yes, there are tricks, such as
>>> xl.Range("A1:B3").Value = 12
which assigns 12 to every cell in the range. These are things you learn when you need them. Better when starting out to keep to the KISS principle.
And formatting is simply a minor variation. Instead of reading or writing the Value or Formula property of a cell (or range of cells), we read or write some other property, such as (see above) the Interior.ColorIndex property. The details, however, are a bit daunting, especially since Excel relies on special program constants, standing for integers whose values are really hard to find. So, I'm going to show you how, but I'll put the information in an optional section.
9.2 *Programming Excel's Formats
Suppose you recorded an Excel VBA macro while you selected the "Borders" toolbar icon and chose the "Bottom Double Border" item. The resulting VBA macro would look like this:
Sub Macro2()
'
' Macro2 Macro
' Macro recorded 12/28/2001 by Steven O. Kimbrough
'
'
Selection.Borders(xlDiagonalDown).LineStyle = xlNone
Selection.Borders(xlDiagonalUp).LineStyle = xlNone
Selection.Borders(xlEdgeLeft).LineStyle = xlNone
Selection.Borders(xlEdgeTop).LineStyle = xlNone
With Selection.Borders(xlEdgeBottom)
.LineStyle = xlDouble
.Weight = xlThick
.ColorIndex = xlAutomatic
End With
Selection.Borders(xlEdgeRight).LineStyle = xlNone
End Sub
Suppose now you wanted to format cell A9 with a "Bottom Double Border" border. You might think this command will do that:
xl.Range("A9").Borders(xlEdgeBottom).LineStyle = xlDouble
You would be wrong. Python naturally thinks that xlEdgeBottom and xlDouble are Python variables, and Excel will certainly get the wrong message, if it gets any message at all. You could quote the variables, but then Excel will think they are strings and will get very confused. The problem is that in VBA, xlEdgeBottom is a constant, standing for the integer 9, and xlDouble is a constant whose value is the integer -4119. This command does work:
xl.Range("A9").Borders(9).LineStyle = -4119
This solves our problem, provided we can discover what Excel/VBA's constants stand for. Here's what you do. Launch PythonWin and enter the following commands at the prompt in interactive mode:
>>> import win32com.client
>>> xl = win32com.client.Dispatch('Excel.Application')
>>> xl.Visible = 1
>>> win32com.client.constants.xlNone
-4142
>>>
If this is what you get, then proceed after skipping the next paragraph.
If, instead of getting a -4142 in response to your last line you get an error message, do this. In PythonWin, under the Tools menu you will see the item COM Makepy utility. Choose it. You will be presented with a very long list of applications. Scroll down until you find "Microsoft Excel X.Y Object Library (W.Z)" where W, X, Y, and Z are all small integers. Select it, say OK, and wait until Python builds a file for you. When the prompt returns in the interactive shell (hit RETURN if things quiet down), try win32com.client.constants.xlNone again. It should work. If so, your Python installation is set up for what we need to do. This only needs doing once per Python installation.
At this point we're essentially done. If X is an Excel VBA constant, then win32com.client.constants.X will reveal its value. Here are two examples.
>>> win32com.client.constants.xlEdgeBottom
9
>>> win32com.client.constants.xlDouble
-4119
>>>
So if you can identify the Excel constant, Python will tell you its value, and you can happily program away. Also, you can use either the integers or their Python expressions in your Python code. The integer version
xl.Range("A9").Borders(9).LineStyle = -4119
works just as well as the Python object version
xl.Range("A9").Borders(9).LineStyle = win32com.client.constants.xlDouble
My own druthers are to use Python variables as mnemonics.
>>> xlDouble = win32com.client.constants.xlDouble
>>> xlDouble
-4119
>>> xlEdgeBottom = win32com.client.constants.xlEdgeBottom
>>> xlEdgeBottom
9
>>>
And now this does work:
xl.Range("A9").Borders(xlEdgeBottom).LineStyle = xlDouble
You weren't so wrong after all.
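Once the constants have Python names, it is natural to wrap a formatting step in a small function. The sketch below is one way to do that, under loud assumptions: the helper bottom_double_border and the FakeRange and FakeBorder classes are hypothetical, invented here so the example can run without Excel, and the constant values are simply the ones reported in the sessions above. It is written for a current Python 3.

```python
# Values reported by win32com.client.constants in the sessions above.
xlEdgeBottom = 9
xlDouble = -4119
xlNone = -4142


def bottom_double_border(cell_range):
    """Give cell_range a double bottom border (hypothetical helper)."""
    cell_range.Borders(xlEdgeBottom).LineStyle = xlDouble


# Hypothetical stand-ins for a COM Range, so the sketch runs without Excel.
class FakeBorder:
    def __init__(self):
        self.LineStyle = xlNone


class FakeRange:
    def __init__(self):
        self._borders = {}

    def Borders(self, index):
        # Return the same border object for the same index, like a collection.
        return self._borders.setdefault(index, FakeBorder())


a9 = FakeRange()
bottom_double_border(a9)
print(a9.Borders(xlEdgeBottom).LineStyle)
```

With a live Excel session you would pass a real range instead of the stand-in, for instance bottom_double_border(xl.Range("A9")).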
9.3 A Little on the Excel Object Model
Excel and other Microsoft Office programs are structured as hierarchical collections of objects. The application is an object. The application's workbooks are objects. The worksheets within a workbook are objects, as are the charts. Ranges are objects contained by worksheets, but not by charts. And so on. Objects are particular; they are said to be instances of their classes. For example, Workbooks(3).Worksheets(2).Range("A2:B4") is a particular object. It is an instance of the Range class.
Objects have properties and methods. Together, they are called members. (Typically, a property is also an object, which may be confusing.) Properties are types of values, which usually can be set under program control. We have seen that Value and Formula are two of the properties of objects in the Range class. Methods are programs that their objects can execute. Methods may or may not require parameters on input, but they always require parentheses. An example of a parameterless method is Add, which we have seen before:
xl.Workbooks.Add()
So, Add is a method for objects in the Workbooks class. If we look at the Object Browser in Excel (in the VBA editor, View | Object Browser) and we focus on the Excel library, we find Workbooks among the classes. Selecting Workbooks we see that Add is the very first member of the Workbooks class. (Notice how the icons distinguish the properties and the methods.) If we select Add, right-click on it, and select Help, we see a display that tells us Add is a method for adding objects to collections. It is used with many different types of collections (classes). If we select Workbooks we see this information displayed.
Add Method (Workbooks Collection)
Creates a new workbook. The new workbook becomes the active workbook. Returns a Workbook object.
Syntax
expression.Add(Template)
expression Required. An expression that returns a Workbooks object.
Template Optional Variant. Determines how the new workbook is created. If this argument is a string specifying the name of an existing Microsoft Excel file, the new workbook is created with the specified file as a template. If this argument is a constant, the new workbook contains a single sheet of the specified type. Can be one of the following XlWBATemplate constants: xlWBATChart, xlWBATExcel4IntlMacroSheet, xlWBATExcel4MacroSheet, or xlWBATWorksheet. If this argument is omitted, Microsoft Excel creates a new workbook with a number of blank sheets (the number of sheets is set by the SheetsInNewWorkbook property).
Remarks
If the Template argument specifies a file, the file name can include a path.
Objects have properties and methods. You call the methods and get or set the properties. See the Excel object browser and then use the help facility. Also, record VBA macros.
Exploring in this manner will tell you much about Excel's object model. The Object Browser is there when you need it, so you shouldn't bother with memorization. Recording VBA macros and studying them (when you need to know something) is also a good way to learn about the object model.
Finally, Python is very helpful for learning about the object model. Recall:
>>> xl.Name
u'Microsoft Excel'
Microsoft Excel is our top-level object. We can also get the name and the parent of any (particular) object:
>>> xl.Workbooks(1).Name
u'Book1'
>>> xl.Workbooks(1).Parent.Name
u'Microsoft Excel'
Notice that the top-level object is its own parent:
>>> xl == xl.Parent
1
>>> xl.Parent.Name
u'Microsoft Excel'
>>> xl.Workbooks(1) == xl.Workbooks(1).Parent
0
And the pattern generalizes:
>>> xl.ActiveSheet.Name
u'Sheet2'
>>> xl.ActiveSheet.Parent.Name
u'Book1'
Notice that the Name property is settable:
>>> xl.ActiveSheet.Name = 'Ted'
>>> xl.ActiveSheet.Name
u'Ted'
9.4 Miscellany
9.4.1 Gotchyas
Capitalization:
>>> xl.Name
u'Microsoft Excel'
>>> xl.name
Traceback (most recent call last):
File "<pyshell#15>", line 1, in ?
xl.name
File "C:\Python21\win32com\client\__init__.py", line 348, in __getattr__
raise AttributeError, attr
AttributeError: name
>>>
Lesson: Get it right.
9.4.2 Range Names
>>> xl.Workbooks(1).Names(1).Name
u'ted'
>>> xl.Workbooks(1).Names(1).RefersTo
u'=Sheet1!$C$2:$D$3'
>>> myted = xl.Workbooks(1).Names(1)
>>> myted.Name
u'ted'
>>> xl.Workbooks(1).Names.Add('alice','=Sheet1!R1C1')
<win32com.gen_py.Microsoft Excel 9.0 Object Library.Name>
>>> xl.Workbooks(1).Names.Count
2
>>> xl.Workbooks(1).Names(2).Name
u'ted'
>>> xl.Workbooks(1).Names(1).Name
u'alice'
>>> xl.Workbooks(1).Worksheets(1).Range("alice").Value = 19
>>> xl.Range("alice").Value = 22
The last line works only if workbook 1 and worksheet 1 are active; otherwise Excel would be confused (and understandably so). Assuming they are, this line turns an entire range red:
>>> xl.Range("ted").Interior.ColorIndex = 3
9.4.3 Saving Workbooks
>>> xl.ActiveWorkbook.SaveAs("C:\day\pythonegs3.xls")
>>> xl.ActiveWorkbook.Name
u'pythonegs3.xls'
>>> xl.ActiveWorkbook.Save()
>>>
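A word of caution about the path in the SaveAs call above. It works as typed only because \d and \p happen not to be escape sequences that Python recognizes, so the backslashes survive (newer Python versions warn about such unrecognized escapes). A folder name beginning with t or n would not be so lucky. The sketch below, written for a current Python 3 and using a made-up second path, shows the two safe spellings and what goes wrong otherwise.

```python
# Two safe spellings of the same Windows path.
plain = "C:\\day\\pythonegs3.xls"   # backslashes doubled by hand
raw = r"C:\day\pythonegs3.xls"      # raw string: backslashes kept as typed
same_path = (plain == raw)
print(same_path)

# A hypothetical path whose folder starts with "t": "\t" is a tab escape,
# so the ordinary literal silently loses a backslash and gains a tab.
risky = "C:\temp"
safe = r"C:\temp"
print(len(risky), len(safe))
print("\t" in risky)
```

Raw strings are the less error-prone habit, for file paths just as for regular expressions.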
9.4.4 Directories
Python's os (operating system) module has a number of useful functions for handling directories and files on the local system. Perhaps most useful are those for returning a list of the contents of a directory and for creating new directories. Notice that this code segment was run on Windows NT.
>>> import os
>>> os.listdir("C:\\day")
['attfragment.html', 'atthome.html', 'awrapper.aux',
>>> os.name
'nt'
>>> os.mkdir("c:\\day\\mydir")
Also very useful is os.system(...), which you can use to run other programs, including batch files (in Windows).
>>> os.system("c:\\test")
Runs the batch file test.bat. Try it with this as the contents:
dir *.*
time
(But try test.bat first from the command prompt.)
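If you would like to try the directory functions without touching c:\day, here is a sketch that does its work in a temporary directory. The names mydir and notes.txt are made up for illustration, and the use of tempfile and os.path.join is my addition rather than anything from the session above. It is written for a current Python 3 and runs on any operating system.

```python
import os
import tempfile

# Work inside a throwaway directory so nothing on the real disk is touched.
with tempfile.TemporaryDirectory() as base:
    newdir = os.path.join(base, "mydir")   # join path parts portably
    os.mkdir(newdir)

    with open(os.path.join(base, "notes.txt"), "w") as handle:
        handle.write("hello\n")

    contents = sorted(os.listdir(base))    # listdir order is not guaranteed
    is_directory = os.path.isdir(newdir)

print(contents)
print(is_directory)
```

os.path.join spares you from typing backslashes at all, which sidesteps the escaping issue entirely.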
9.4.5 Grabbing Command Line Arguments
When you run a Python program in script mode it is often useful to be able to specify arguments on the command line, which the script will process as it executes. Here's an example. The Python file, arglist.py, looks like this:
import sys
argumentlist = sys.argv
print argumentlist
Here is what it does when executed from the command line, with and without arguments.
C:\>arglist.py
['C:\\arglist.py']
C:\>arglist.py Now is the time for 12.3
['C:\\arglist.py', 'Now', 'is', 'the', 'time', 'for', '12.3']
C:\>
In a real program, you would process the list argumentlist, converting the strings, where necessary, to numbers. Example:
>>> int('12')
12
>>> float('12.3')
12.300000000000001
>>>
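One way to do that conversion for a whole argument list is sketched below. The helper name convert and the sample list are hypothetical, and the code uses Python 3 syntax (an assumption; the sessions here are Python 2.1). It tries int first, then float, and leaves anything else as a string.

```python
def convert(text):
    """Return text as an int or float if it parses as one, else unchanged."""
    for caster in (int, float):
        try:
            return caster(text)
        except ValueError:
            pass
    return text

# sys.argv[0] is the script name, so real code would pass sys.argv[1:].
sample_args = ['Now', 'is', 'the', 'time', 'for', '12.3', '12']
converted = [convert(arg) for arg in sample_args]
print(converted)
```

Trying int before float keeps whole numbers as integers instead of turning '12' into 12.0.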
9.4.6 Grabbing User Input
raw_input prompts the user and awaits a response. The response is read into a string for subsequent processing by the program. Here's an example. In an IDE, when input = raw_input("Tell me: ") is executed a dialog box appears and the program stops until the user gives a response. That response is read into input and the program continues.
>>> input = raw_input("Tell me: ")
>>> input
'Go Eagles!'
>>> type(input)
<type 'string'>
>>>
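Because the response always arrives as a string, a program that wants a number usually has to check it. The sketch below is one hedged way to do so. It is written for Python 3, where raw_input was renamed input. The function name ask_for_number and the read parameter are hypothetical; read exists only so that scripted answers can stand in for a real user.

```python
def ask_for_number(prompt, read=input):
    """Prompt until the response parses as a float; return the float."""
    while True:
        response = read(prompt)
        try:
            return float(response)
        except ValueError:
            # Not a number: ask again.
            continue

# Scripted responses stand in for a user typing at the keyboard.
scripted = iter(["Go Eagles!", "12.5"])
value = ask_for_number("Tell me: ", read=lambda prompt: next(scripted))
print(value)
```

Called as ask_for_number("Tell me: ") with no second argument, the function reads from the keyboard.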
9.4.7 Copying Worksheets
This code puts a copy of worksheet 1 before worksheet 3.
xl.Worksheets(1).Copy(xl.Worksheets(3))
9.5 Gotchyas
9.5.1 Forgetting Parentheses in Methods
Some things don't work the way you might think they should. Here's an example.
PythonWin 2.1.1 (#20, Jul 20 2001, 01:19:29) [MSC 32 bit (Intel)]
on win32.
Portions Copyright 1994-2001 Mark Hammond ([email protected])
- see 'Help/About PythonWin' for further copyright information.
>>> import win32com.client
>>> xl = win32com.client.Dispatch('Excel.Application')
>>> xl.Visible
0
>>> xl.Visible = 1
>>> xl.Workbooks.Add()
<win32com.gen_py.Microsoft Excel 9.0 Object Library.Workbook>
>>> xl.Workbooks.Add()
<win32com.gen_py.Microsoft Excel 9.0 Object Library.Workbook>
>>> xl.ActiveWorkbook.Name
u'Book2'
>>> xl.Workbooks(1).Activate
<method _Workbook.Activate of _Workbook instance at 010A8AA4>
>>> xl.ActiveWorkbook.Name
u'Book2'
The command to activate workbook 1 is accepted without error, but proves ineffective. The problem is that Activate is a method and needs parentheses. The line
xl.Workbooks(1).Activate()
would work.
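The same behavior can be reproduced without Excel. In the sketch below, Workbook is a hypothetical stand-in class written for illustration (it is not the COM object), and the code uses Python 3 syntax. Naming a method without parentheses only produces a reference to it; the method body runs only when it is called.

```python
class Workbook:
    """Hypothetical stand-in for an Excel workbook; not the COM object."""
    def __init__(self):
        self.active = False

    def Activate(self):
        self.active = True

book = Workbook()

reference = book.Activate              # no parentheses: just a method object
active_after_reference = book.active   # still False, nothing was called

book.Activate()                        # parentheses: the method actually runs
active_after_call = book.active        # now True

print(callable(reference), active_after_reference, active_after_call)
```

At the interactive prompt, seeing something like `<method ...>` echoed back is the telltale sign that the parentheses were left off.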
9.5.2 Capitalization
Excel's COM objects and members (properties and methods) have an official capitalization scheme, and this matters. Usually, that is. Sometimes it seems that you can get away with incorrect capitalization. Mostly you can't.
>>> xl.Name
u'Microsoft Excel'
>>> xl.Selection.Address
u'$A$1'
>>> xl.ActiveSheet.Range("B5:C8").Select()
>>> xl.Selection.Address
u'$B$5:$C$8'
>>> xl.Selection.address
Traceback (most recent call last):
File "<interactive input>", line 1, in ?
File "D:\Python21\win32com\client\__init__.py", line 348, in __getattr__
raise AttributeError, attr
AttributeError: address
>>> xl.name
Traceback (most recent call last):
File "<interactive input>", line 1, in ?
File "D:\Python21\win32com\client\__init__.py", line 348, in __getattr__
raise AttributeError, attr
AttributeError: name
>>> xl.Name
u'Microsoft Excel'
>>> xl.Activesheet.name
Traceback (most recent call last):
File "<interactive input>", line 1, in ?
File "D:\Python21\win32com\client\__init__.py", line 348, in __getattr__
raise AttributeError, attr
AttributeError: Activesheet
>>> xl.Activesheet.Name
Traceback (most recent call last):
File "<interactive input>", line 1, in ?
File "D:\Python21\win32com\client\__init__.py", line 348, in __getattr__
raise AttributeError, attr
AttributeError: Activesheet
>>> xl.ActiveSheet.Name
u'Sheet1'
>>>
9.6 For More Information
On Python and COM:
• QuickStartClientCom.html, which comes with the win32com installation.
• http://www.python.org/windows/, the official source.
• http://starship.python.net/crew/mhammond/conferences/
• Python Programming on Win32 by Mark Hammond and Andy Robinson, O'Reilly, 2000.
See also the book's support site: http://starship.python.net/crew/mhammond/ppw32/.
• http://aspn.activestate.com/ASPN/Python/Reference/Products/-ActivePython/win32com/win32com.html
http://aspn.activestate.com//ASPN/Python/Reference/Products/-ActivePython/win32com/win32com/win32com/test/
http://aspn.activestate.com//ASPN/Python/Reference/Products/-ActivePython/win32com/win32com/win32com/html/docindex.html
Learning about COM is a bit tough. Here's something helpful. In Excel, in VBA, under View, select Object Browser. Browse and right click on what you're interested in, then ask for help.
9.7 Exercises
9.7.1
The SEC (Securities and Exchange Commission) requires various reports to be filed by American corporations. Stock market analysts are particularly keen on following the 10-Q reports, which are filed quarterly and contain, among other things, financial statements by the companies. The SEC filings, including the 10-Q reports, are available online at http://www.sec.gov. Write a Python program that grabs data from 10-Q reports for 4-6 companies and loads these data into a well-designed spreadsheet format for analysis and comparison purposes.
File: python-excel.tex
Chapter 10
Python and Database via DAO
10.1 Preliminaries
In the Excel VBA editor, see Tools|References. Scroll down until you find something like "Microsoft DAO 3.6 Object Library." This tells you that you need to dispatch "DAO.DBEngine.36". You can also see the References from Access. Note the directions from Microsoft's online help:
On the Tools menu, click References. The References command on the Tools menu is available only when a Module window is open and active in Design view.
Also, you should first run the makepy utility. In Pythonwin under the Tools menu choose "COM Makepy utility" then select "Microsoft DAO 3.6 Object Library". You only need to do this once.
10.2 Getting Connected and Getting Data
The basic structure for connecting to an Access database is like that for connecting to Excel. The following code imports the win32com.client module, dispatches the appropriate DAO.DBEngine (version 3.6), and opens an existing database, called db1.mdb.
>>> import win32com.client
>>> engine = win32com.client.Dispatch("DAO.DBEngine.36")
>>> db = engine.OpenDatabase(r'c:\day\db1.mdb')
As usual, the database is an object with properties.
>>> db.Name
u'c:\\day\\db1.mdb'
Usually, we will want to access a database in order to run SQL SELECT queries. To do so, we run a query and get back a recordset, conceptually the result of the query put into a table in memory.
>>> rs = db.OpenRecordset('select count(*) from Table1')
The argument for OpenRecordset can be any valid SQL query string. Successful completion of the query produces a recordset object, here rs. Here we obtain information about it.
>>> rs.Parent.Name
u'c:\\day\\db1.mdb'
>>> for i in range(rs.Fields.Count):
...     print rs.Fields(i).Name
...
Expr1000
>>> type(rs.Fields(0).Name)
<type 'unicode'>
>>> field = rs.Fields(0).Name
>>> field
u'Expr1000'
>>> dacount = rs.Fields(0).value
>>> print dacount
2
>>> type(dacount)
<type 'int'>
>>> rs.Fields.Count
1
Notice the pattern: a recordset is an object with properties; we access those properties in the usual way, under program control. We do more of this:
>>> rs = db.OpenRecordset('select * from Table1')
>>> bob = rs.Fields('wordid').value
>>> bob
3
>>> rs.Fields.Count
3
>>> rs.MoveLast
<method CDispatch.MoveLast of CDispatch instance at 0192CB44>
>>> for i in range(rs.Fields.Count):
...     print rs.Fields(i).Value
...
1
1
3
>>> rs.Parent.Name
u'c:\\day\\db1.mdb'
>>> rs.Fields(0).Name
u'docid'
>>> for i in range(rs.Fields.Count):
...     print rs.Fields(i).Name
...
docid
position
wordid
>>>
>>> db.Close()
MoveLast is a recordset method. Given a recordset there is always a cursor or record pointer pointing at some record in the recordset. MoveFirst moves the cursor to the first record in the recordset. MoveLast moves it to the last record. MoveNext and MovePrevious do the obvious things. (In the interaction above, rs.MoveLast was typed without parentheses, so the method was never called, and the values printed afterwards, 1, 1, and 3, belong to the first record.) Fields is a property of a recordset, and Count is a property of Fields. In the above interaction, we see that Table1 has three fields (columns), called "docid," "position," and "wordid."
Now we do another SELECT query.
>>> rs = db.OpenRecordset('select * from Table1')
>>> while not rs.EOF:
...     print rs.Fields('wordid').value
...     rs.MoveNext()
...
3
4
What this does is tell us that "wordid" is 3 and 4 in the two records of this recordset. EOF means "end of file." Basically, while not rs.EOF: prevents the program from trying to access a nonexistent record in the recordset.
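The loop pattern can be tried without a database. In the sketch below, FakeRecordset is a hypothetical in-memory stand-in that imitates only EOF, MoveNext, and field lookup; it is not DAO, and the code uses Python 3 syntax. The helper column_values (also a made-up name) collects one field from every record instead of printing it.

```python
class FakeRecordset:
    """Hypothetical in-memory stand-in for a DAO recordset."""
    def __init__(self, rows):
        self._rows = list(rows)
        self._pos = 0

    @property
    def EOF(self):
        # True once the cursor has moved past the last record.
        return self._pos >= len(self._rows)

    def MoveNext(self):
        self._pos += 1

    def Fields(self, name):
        # Real DAO returns a Field object with a Value property;
        # this stand-in returns the value directly for brevity.
        return self._rows[self._pos][name]

def column_values(rs, field_name):
    """Collect one field from every record, stopping at EOF."""
    values = []
    while not rs.EOF:
        values.append(rs.Fields(field_name))
        rs.MoveNext()
    return values

rs = FakeRecordset([{'wordid': 3}, {'wordid': 4}])
wordids = column_values(rs, 'wordid')
print(wordids)
```

Because the test is made before each access, the same loop also handles an empty recordset without error.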
Here we explore the database further, looking at various properties.
>>> db = engine.OpenDatabase(r'c:\day\db1.mdb')
>>> for i in range(db.TableDefs.Count):
...     print i, db.TableDefs(i).Name
...
0 MSysAccessObjects
1 MSysACEs
2 MSysObjects
3 MSysQueries
4 MSysRelationships
5 Table1
>>> for i in range(db.TableDefs(5).Fields.Count):
...     print i, db.TableDefs(5).Fields(i).Name
...
0 docid
1 position
2 wordid
>>> rs = db.OpenRecordset('select * from Table1')
>>> rs.MoveLast()
>>> rs.RecordCount
2
>>> rs.MoveFirst()
>>> rs.GetRows()
((1,), (1,), (3,))
>>> rs.GetRows()
((1,), (2,), (4,))
>>> rs.GetRows()
Traceback (most recent call last):
File "<interactive input>", line 1, in ?
[further error message]
>>> rs.Close()
Notice (above) that there are actually 6 tables in this database. Five of them are Access housekeeping tables (you can access them). Only Table1 is "for real." Note that for i in range(db.TableDefs(5).Fields.Count): does here what while not rs.EOF: did above: it prevents an error from running past the end, here the end of the Fields collection rather than of the recordset. RecordCount is new here. Very useful, but be sure you first execute
>>> rs.MoveLast()
Finally, GetRows() is new. GetRows() retrieves one row, the one being pointed to by the cursor, as a tuple of tuples, and advances the cursor. Each inner tuple holds the values of one field, which is why a single row of three fields comes back as three one-element tuples. (The notation "(1,)" means a tuple with one element, the integer 1.) GetRows(n) retrieves n rows as a tuple of tuples, starting with the current row. Notice that if you try to get a row beyond the recordset you get an error.
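Since the GetRows() results above contain one inner tuple per field, a multi-row result is organized by field rather than by record. The sketch below turns such a result into one tuple per record. The value of by_field is an assumption: it is the shape GetRows(2) would presumably return for the two records of Table1, inferred from the single-row outputs shown earlier, not copied from a real session. The code uses Python 3 syntax.

```python
# Shape assumed for GetRows(2) on the two-record Table1 shown earlier:
# one inner tuple per field (docid, position, wordid), one entry per row.
by_field = ((1, 1), (1, 2), (3, 4))

# zip(*...) transposes the field-major layout into one tuple per record.
by_record = list(zip(*by_field))
print(by_record)
```

Each tuple in by_record then lines up with the field names docid, position, and wordid.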
10.3 Beyond SELECT
To run SQL commands other than SELECT commands you use a different mechanism in DAO. You use Execute. Here's an example.
>>> db.Execute("delete * from Table1")
>>> rs = db.OpenRecordset("select * from Table1")
>>> rs.MoveLast()
Traceback (most recent call last):
File "<interactive input>", line 1, in ?
[further error message]
>>> rs.RecordCount
0
>>> db.Execute("INSERT INTO Table1 VALUES(10, 11, 12)")
>>> rs = db.OpenRecordset("select * from Table1")
>>> rs.MoveLast()
>>> rs.RecordCount
1
>>> rs.MoveFirst()
>>> rs.GetRows()
((10,), (11,), (12,))
>>>
Here we see a DELETE and an INSERT command from SQL. It all works fine. Word of Warning: Be sure to test your SQL statements directly in Access before puzzling about why they don't work in Python.
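The session above also shows MoveLast raising an error when the recordset is empty. A guard along the following lines avoids that. It is only a sketch: FakeRecordset is a hypothetical stand-in that imitates the behavior seen above in simplified form, count_records is a made-up helper name, and the code uses Python 3 syntax. The guard relies on a freshly opened empty recordset already being at EOF.

```python
class FakeRecordset:
    """Hypothetical stand-in that mimics the behavior shown above."""
    def __init__(self, n_records):
        self._n = n_records
        self._visited_last = False

    @property
    def EOF(self):
        # Simplification: only an empty recordset starts out at EOF.
        return self._n == 0

    def MoveLast(self):
        if self._n == 0:
            raise RuntimeError("no current record")
        self._visited_last = True

    def MoveFirst(self):
        if self._n == 0:
            raise RuntimeError("no current record")

    @property
    def RecordCount(self):
        # Accurate only after the last record has been visited.
        return self._n if self._visited_last else min(self._n, 1)

def count_records(rs):
    """Return the number of records without tripping over an empty set."""
    if rs.EOF:            # freshly opened and already at the end: empty
        return 0
    rs.MoveLast()         # forces the full count to be computed
    total = rs.RecordCount
    rs.MoveFirst()        # put the cursor back at the start
    return total

print(count_records(FakeRecordset(0)), count_records(FakeRecordset(2)))
```

With a real DAO recordset the helper would be called right after OpenRecordset, before the cursor has been moved.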
10.4 Handling Quotes in SQL SELECT Queries
These interactions demonstrate use of strings to define the SQL queries, and the use of SQL WHERE clauses.
>>> rs.Fields.Item(0).Name
u'docid'
>>> db.Name
u'c:\\day\\db1.mdb'
>>> rs = db.OpenRecordset("select * from Table1")
>>> rs.MoveLast()
>>> rs.RecordCount
1
>>> rs.GetRows()
((10,), (11,), (12,))
>>> db.Execute("INSERT INTO Table1 VALUES(20, 21, 22)")
>>> rs = db.OpenRecordset("select * from Table1")
>>> rs.MoveLast()
>>> rs.RecordCount
2
>>> rs.GetRows()
((20,), (21,), (22,))
>>> strSQL = "SELECT * FROM Table1 WHERE docid < 15"
>>> rs =db.OpenRecordset(strSQL)
>>> rs.MoveLast()
>>> rs.RecordCount
1
>>> rs.GetRows()
((10,), (11,), (12,))
>>>
Finally, this interaction with the spj-begin.mdb database illustrates construction of an SQL query string containing an element that has to be quoted. Also, Access allows odd characters in its field names. "S#" is not permitted in standard SQL. So, Access uses a square bracket notation to indicate that this is indeed a valid field name (in Access). This mechanism is also needed when field names have spaces in them.
>>> db.TableDefs(5).Fields(0).Type
4
>>> spdb = engine.OpenDatabase(r'c:\day\spj-begin.mdb')
>>> spdb.Name
u'c:\\day\\spj-begin.mdb'
>>> strSQLsp = "SELECT [S#] From s WHERE SNAME = 'Adams'"
>>> rssp = spdb.OpenRecordset(strSQLsp)
>>> rssp.MoveLast()
>>> rssp.RecordCount
1
>>> rssp.GetRows()
((u'S5',),)
>>>
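When the quoted value comes from a variable, the query string has to be assembled piece by piece. The helpers below are a hedged sketch with hypothetical names (sql_string, bracketed, select_where), in Python 3 syntax. They double any single quote inside the value, which is the usual SQL convention, and wrap the field name in square brackets. Field names that themselves contain "]" are not handled, and, as advised above, the resulting SQL should be tested in Access.

```python
def sql_string(value):
    """Wrap value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"

def bracketed(field_name):
    """Wrap an Access field name in square brackets."""
    return "[" + field_name + "]"

def select_where(field, table, where_field, where_value):
    """Build a one-condition SELECT statement as a string."""
    return "SELECT %s FROM %s WHERE %s = %s" % (
        bracketed(field), table, where_field, sql_string(where_value))

query = select_where("S#", "s", "SNAME", "Adams")
tricky = select_where("S#", "s", "SNAME", "O'Brien")
print(query)
print(tricky)
```

Using double quotes for the Python string and single quotes for the SQL literal, as in the session above, keeps the two levels of quoting from colliding.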
10.5 Gotchyas
If you are accessing a database from Python and then you open it in Microsoft Access, suddenly your Python DAO commands won't work. The problem is that Access is a single-user system.
Referential integrity considerations in Access may prevent deletion of a record. If so, your DAO SQL DELETE command won't work; it'll do nothing. Lesson: try things first in Access in SQL.
10.6 For More Information
Microsoft's [15], especially page 730, has useful information on DAO. The help file is DAO360.CHM and is a useful install. Helen Feddema's DAO Object Model: The Definitive Reference [7] is excellent. Two terrific Web pages for Python and DAO and Access:
• http://starship.python.net/crew/bwilk/access.html
• http://www.e-coli.net/pyado.html
File: python-dao.tex
Chapter 11
Python Quick Reference
11.1 Setting a Convenient Path
To use an example without loss of generality, suppose you have a Python module file, startcom.py, at D:\athomepc\day, as follows:
import win32com.client
xl = win32com.client.Dispatch('Excel.Application')
If you try to import this module, you'll likely get an error message.
Python 2.1.1 (#20, Jul 20 2001, 01:19:29) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
IDLE 0.8 -- press F1 for help
>>> from startcom import *
Traceback (most recent call last):
File "<pyshell#0>", line 1, in ?
from startcom import *
ImportError: No module named startcom
>>>
The problem is that D:\athomepc\day is not in Python's search path. The following code shows this.
>>> import sys
>>> sys.path
['D:\\PYTHON21\\Tools\\idle', 'D:\\Python21\\win32',
'D:\\Python21\\win32\\lib','D:\\Python21',
'D:\\Python21\\Pythonwin', 'D:\\PYTHON21\\DLLs',
'D:\\PYTHON21\\lib',
'D:\\PYTHON21\\lib\\plat-win',
'D:\\PYTHON21\\lib\\lib-tk']
>>>
The thing to do is to add (append) the path you want Python to look in to the sys.path list. You can do it as follows.
>>> sys.path.append('d:\\athomepc\\day')
>>> sys.path
['D:\\PYTHON21\\Tools\\idle', 'D:\\Python21\\win32',
'D:\\Python21\\win32\\lib', 'D:\\Python21',
'D:\\Python21\\Pythonwin', 'D:\\PYTHON21\\DLLs',
'D:\\PYTHON21\\lib', 'D:\\PYTHON21\\lib\\plat-win',
'D:\\PYTHON21\\lib\\lib-tk', 'd:\\athomepc\\day']
>>>
Now your import will work just fine.
>>> from startcom import *
>>> xl.Name
u'Microsoft Excel'
>>>
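The append step can be wrapped in a small helper that avoids adding the same directory twice. This is a sketch under stated assumptions: add_to_search_path is a hypothetical name, a temporary directory and a throwaway module startcom_demo stand in for D:\athomepc\day and startcom.py (so no Excel is needed), and the code uses Python 3 syntax.

```python
import os
import sys
import tempfile

def add_to_search_path(directory):
    """Append directory to sys.path unless it is already there."""
    if directory not in sys.path:
        sys.path.append(directory)
        return True
    return False

# Demonstration with a throwaway module standing in for startcom.py.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "startcom_demo.py"), "w") as handle:
    handle.write("greeting = 'hello from startcom_demo'\n")

added_first = add_to_search_path(demo_dir)   # True: the path was new
added_again = add_to_search_path(demo_dir)   # False: already present

import startcom_demo                         # found via the new entry
print(startcom_demo.greeting)
```

The change to sys.path lasts only for the current session; it has to be repeated each time Python starts.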
11.2 Dispatching Excel
PythonWin 2.1.1 (#20, Jul 20 2001, 01:19:29) [MSC 32 bit (Intel)]
on win32.
Portions Copyright 1994-2001 Mark Hammond ([email protected])
- see 'Help/About PythonWin' for further copyright information.
>>> import win32com.client
>>> xl = win32com.client.Dispatch('Excel.Application')
>>> xl.Visible = 1
>>> xl.Workbooks.Add()
<win32com.gen_py.Microsoft Excel 9.0 Object Library.Workbook>
11.3 Using Excel Constants from Python
>>> xlDouble = win32com.client.constants.xlDouble
>>> xlDouble
-4119
>>> xlEdgeBottom = win32com.client.constants.xlEdgeBottom
>>> xlEdgeBottom
9
>>>
Then you can use these Python variables as if they were Excel VBA constants.
xl.Range("A9").Borders(xlEdgeBottom).LineStyle = xlDouble
>>> db.Properties(0).Name
u'Name'
>>> db.Properties.Count
13
>>> for i in range(db.Properties.Count):
...     print i, db.Properties(i).Name
...
0 Name
1 Connect
2 Transactions
3 Updatable
4 CollatingOrder
5 QueryTimeout
6 Version
7 RecordsAffected
8 ReplicaID
9 DesignMasterID
10 Connection
11 AccessVersion
12 Build
>>> db.Properties(11).Name
u'AccessVersion'
>>> db.Properties(11).Value
u'08.50'
>>>
11.4 Using FormulaR1C1-Style Formats
>>> xl.ActiveWorkbook.Worksheets('Sheet1').cells(4,2).FormulaR1C1 = \
"=SUM(R[-2]C:R[-1]C)"
Note: \ is Python's line-continuation character.
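The formula string itself is ordinary text, so it can be built by a function. In the sketch below, sum_rows_above is a hypothetical helper, the code uses Python 3 syntax, and nothing is sent to Excel. It also shows parentheses as an alternative to the backslash for continuing a long expression.

```python
def sum_rows_above(first_offset, last_offset):
    """Build a FormulaR1C1 string summing a run of cells in the same column.

    Offsets are relative to the cell holding the formula; negative
    numbers point upward, so (-2, -1) means the two cells just above.
    """
    return "=SUM(R[%d]C:R[%d]C)" % (first_offset, last_offset)

formula = sum_rows_above(-2, -1)

# Parentheses allow a long expression to span lines without a backslash.
message = ("cell (4, 2) would receive "
           + formula)
print(message)
```

The resulting string is what would be assigned to the FormulaR1C1 property in the line above.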
11.5 Dispatching DAO for MS Access
>>> import win32com.client
>>> engine = win32com.client.Dispatch("DAO.DBEngine.36")
>>> db = engine.OpenDatabase(r'c:\day\db1.mdb')
>>> db.Name
u'c:\\day\\db1.mdb'
>>> rs = db.OpenRecordset('select count(*) from Table1')
>>> rs.Parent.Name
u'c:\\day\\db1.mdb'
>>> for i in range(rs.Fields.Count):
...     print rs.Fields(i).Name
...
Expr1000
File: python-quick-ref.tex