Tekoa: A Domain-Specific Language for Defining Opus Variables


• The variable concept in Opus
• Problems with defining Opus variables in Python
• Tekoa examples
• Syntax
• Status and Plans for Further Work
• User discussion & wish list


The Variable Concept in Opus

• A model variable (or just variable) is an attribute of actors or geographies used in a model.

• Variables are properties of datasets, e.g. a gridcell dataset or a parcel dataset

• Examples:
  – Population density
  – Land cost
  – Travel time to city center

• Two kinds:
  – Primary attribute
  – Derived attribute

• Not the same as “variable” as used in programming languages


Implementing Variables

• Opus implements a model variable as a subclass of the Python class Variable

• Uses lazy evaluation

• Methods:
  – dependencies()
  – compute()

• This has worked very well from the point of view of accessing and computing variables

• However, defining a new variable (even a simple one) requires writing a new Python class, ideally including a unit test
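The dependencies()/compute() pattern with lazy evaluation can be sketched in a few lines of plain Python. This is a toy illustration only: the Variable base class, the Primary helper, the registry dictionary, and the get_value() method below are hypothetical stand-ins, not the opus_core classes or their API.

```python
class Variable:
    """Toy stand-in for a model variable (hypothetical, not the opus_core class)."""

    def __init__(self):
        self._value = None
        self._computed = False
        self.compute_calls = 0  # counts how often compute() actually ran

    def dependencies(self):
        # Names of the variables this one is derived from.
        return []

    def compute(self, values):
        raise NotImplementedError

    def get_value(self, registry):
        # Lazy evaluation: compute on first request only, then reuse the result.
        if not self._computed:
            inputs = {name: registry[name].get_value(registry)
                      for name in self.dependencies()}
            self._value = self.compute(inputs)
            self._computed = True
            self.compute_calls += 1
        return self._value


class Primary(Variable):
    """A primary attribute: its value is stored data, not derived."""

    def __init__(self, data):
        super().__init__()
        self._data = data

    def compute(self, values):
        return self._data


class PopulationDensity(Variable):
    """A derived attribute computed from two primary attributes."""

    def dependencies(self):
        return ["population", "area"]

    def compute(self, values):
        return [p / a for p, a in zip(values["population"], values["area"])]


registry = {
    "population": Primary([100, 300]),
    "area": Primary([2.0, 3.0]),
    "population_density": PopulationDensity(),
}
density = registry["population_density"].get_value(registry)
density_again = registry["population_density"].get_value(registry)  # no recomputation
```

Even in this toy form, one simple derived attribute costs a whole class, which is the overhead the slide points to.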


Variables in Python vs. Tekoa

# definition of zone.average_income in Python
from opus_core.variables.variable import Variable

class average_income(Variable):
    def dependencies(self):
        return ["household.income", "zone.zone_id",
                "urbansim_parcel.household.zone_id"]

    def compute(self, dataset_pool):
        households = dataset_pool.get_dataset("household")
        return self.get_dataset().aggregate_dataset_over_ids(
            households, "mean", "income")

# *** code for unit tests omitted ***
______________________________________________
# Tekoa definition
average_income = zone.aggregate(household.income, function=mean)
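What this aggregation computes can be illustrated in plain NumPy, independent of Opus. The sketch below assumes each household carries the index of its zone; the array names and data are made up, and returning 0 for a zone with no households is an assumption of the sketch, not documented Opus behavior.

```python
import numpy as np

# Hypothetical data: four households, each assigned to a zone by index.
household_income = np.array([40000.0, 60000.0, 30000.0, 90000.0])
household_zone = np.array([0, 0, 1, 2])
n_zones = 4  # zone 3 has no households

# Sum of incomes and number of households per zone.
income_sum = np.bincount(household_zone, weights=household_income,
                         minlength=n_zones)
household_count = np.bincount(household_zone, minlength=n_zones)

# Mean income per zone; empty zones are left at 0 (an assumption of this sketch).
average_income = np.divide(income_sum, household_count,
                           out=np.zeros(n_zones), where=household_count > 0)
```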


Tekoa - Aggregation through multiple geographies

# employment in the ‘large_area’ geography
employment = large_area.aggregate(urbansim_parcel.building.number_of_jobs, intermediates=[parcel, zone, faz])

Explanation:

• number_of_jobs is an attribute of building. We then aggregate this up to the parcel level, then the zone level, then the faz level, and finally the large_area level, to find the employment in the large_area.

• The ‘employment=’ part gives an alias for the expression, so that it displays nicely in the resulting indicator.
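The chain of aggregations described above can be sketched in NumPy as repeated sums over parent indices. Everything here is illustrative: the index arrays (building_to_parcel and so on), the aggregate_sum helper, and the data are hypothetical, and sum is used as the aggregation function on the assumption that it is the natural choice for job counts.

```python
import numpy as np

def aggregate_sum(values, parent_index, n_parents):
    """Sum child values into their parents (hypothetical helper)."""
    return np.bincount(parent_index, weights=values, minlength=n_parents)

# Hypothetical data: 5 buildings, 4 parcels, 2 zones, 2 fazes, 1 large area.
jobs_per_building = np.array([5.0, 10.0, 0.0, 20.0, 8.0])
building_to_parcel = np.array([0, 0, 1, 2, 3])
parcel_to_zone = np.array([0, 0, 1, 1])
zone_to_faz = np.array([0, 1])
faz_to_large_area = np.array([0, 0])

# building -> parcel -> zone -> faz -> large_area, one level at a time.
levels = [
    (building_to_parcel, 4),
    (parcel_to_zone, 2),
    (zone_to_faz, 2),
    (faz_to_large_area, 1),
]
values = jobs_per_building
for parent_index, n_parents in levels:
    values = aggregate_sum(values, parent_index, n_parents)

employment = values  # one entry per large area
```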


Tekoa - More Complex Example

# definition of parcel.is_pre_1940
# is the average year built of the buildings on a parcel earlier than 1940?
is_pre_1940 = parcel.aggregate(
    building.year_built * numpy.ma.masked_where(urbansim_parcel.building.has_valid_year_built==0, 1),
    function=mean) < 1940
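The intent of this expression, averaging only the buildings with a valid year built, can be sketched with NumPy masked arrays. The data and the building_parcel index array are hypothetical, and the sketch masks the year_built array directly instead of multiplying by a masked array of ones; that is an assumption about the intent, not a transcription of how Tekoa evaluates the expression. Every parcel in the toy data has at least one valid building.

```python
import numpy as np

# Hypothetical data: five buildings on three parcels; building 1 has no valid year.
year_built = np.array([1920, 0, 1955, 1930, 2001])
has_valid_year_built = np.array([1, 0, 1, 1, 1])
building_parcel = np.array([0, 0, 1, 1, 2])
n_parcels = 3

# Hide invalid years so they do not contribute to the mean.
valid_years = np.ma.masked_where(has_valid_year_built == 0, year_built)

# Mean year built per parcel, ignoring masked entries.
mean_year = np.ma.array(
    [valid_years[building_parcel == p].mean() for p in range(n_parcels)]
)

is_pre_1940 = (mean_year < 1940).filled(False)
```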


Syntax

• Syntax is a subset of Python

• An expression can be:
  – The name of a variable
  – A function or operator applied to other expressions

• All of the numpy functions and operators are available, e.g. exp, sqrt, +, -, ==, <

• numpy-style array and matrix operations: for example, 1.2*household.income scales all the elements of the array of incomes

• Aggregation
  – Intermediates argument: list of intermediate datasets
  – Function: can be sum, mean, median, min, max

• Disaggregation also supported
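The array semantics behind these rules can be illustrated in plain NumPy. The arrays below are made up, and modeling disaggregation as an index lookup from zone to household is an assumption of the sketch, not a description of the Opus implementation.

```python
import numpy as np

household_income = np.array([40000.0, 60000.0, 30000.0])

# Operators and functions apply elementwise to the whole attribute array.
scaled_income = 1.2 * household_income
root_income = np.sqrt(household_income)

# Disaggregation sketch: copy a zone-level value down to each household
# in that zone (household_zone is a hypothetical index array).
zone_average_income = np.array([50000.0, 30000.0])
household_zone = np.array([0, 0, 1])
household_zone_average = zone_average_income[household_zone]
```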


Interaction Sets and Expressions

• InteractionDataset is a subclass of Dataset, which stores its data as a 2-d array

• For example, for household location choice we are interested in the interaction between household income and cost per residential unit

• The expression ln(household.income) * zone.average_housing_cost returns an n × m array, where n is the number of households and m is the number of zones
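The shape of such an interaction result can be reproduced with NumPy broadcasting. This is only a sketch of the arithmetic with made-up data; it does not use InteractionDataset.

```python
import numpy as np

household_income = np.array([40000.0, 60000.0, 30000.0])   # n = 3 households
zone_average_housing_cost = np.array([1500.0, 900.0])      # m = 2 zones

# A column of log incomes times a row of zone costs broadcasts to an n x m array:
# entry [i, j] pairs household i with zone j.
interaction = (np.log(household_income)[:, np.newaxis]
               * zone_average_housing_cost[np.newaxis, :])
```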


Implementation

• When a new Tekoa expression is encountered, the system:
  – parses it (using the Python parser)
  – analyzes the expression for dependencies on other variables and special methods (e.g. aggregate, disaggregate)
  – compiles a new Python class that defines the variable, including a dependencies() and a compute() method
  – recursively compiles a new variable when aggregating/disaggregating an expression

• Consequence: efficiency of expressions is the same as for the old-style definitions

• The system maintains a cache of expressions that have already been compiled, so that if the same expression is encountered again the previously-compiled class is just returned


More Examples and Documentation

• For lots of examples, see the aliases.py for various datasets in the urbansim_parcel package, e.g.
  – urbansim_parcel/buildings/aliases.py
  – urbansim_parcel/job/aliases.py
  – …

• The language is described in Section 6.4 of the Opus/Urbansim User Manual

• Also see: Alan Borning, Hana Sevcikova, and Paul Waddell, “A Domain-Specific Language for Urban Simulation Variables”, to appear, International Conference on Digital Government Research, Montreal, Canada, May 2008.


Tekoa Status and Future Work

• Benefits:
  – significantly reduced code size (factor of 7 for urbansim gridcell vs urbansim parcel)
  – increased modeler productivity

• Additional features to implement:
  – Parameterized expressions. For example, is_pre_1940 should really be is_pre(1940)
  – Better error detection and messages
  – Tutorial & advanced techniques

• Replace old variable definitions in the code base for gridcell model system with expressions (big job)

• Integration of expressions with GUI

• User discussion & wish list?