
Addressing the Fundamental Tension of PCGML with Discriminative Learning

Isaac Karth
Dept. of Computational Media

University of California, Santa Cruz
1156 High Street

Santa Cruz, CA 95064
[email protected]

Adam M. Smith
Dept. of Computational Media

University of California, Santa Cruz
1156 High Street

Santa Cruz, CA 95064
[email protected]

Abstract—Procedural content generation via machine learning (PCGML) is typically framed as the task of fitting a generative model to full-scale examples of a desired content distribution. This approach presents a fundamental tension: the more design effort expended to produce detailed training examples for shaping a generator, the lower the return on investment from applying PCGML in the first place. In response, we propose the use of discriminative models (which capture the validity of a design rather than the distribution of the content) trained on positive and negative examples. Through a modest modification of WaveFunctionCollapse, a commercially-adopted PCG approach that we characterize as using elementary machine learning, we demonstrate a new mode of control for learning-based generators. We demonstrate how an artist might craft a focused set of additional positive and negative examples by critique of the generator’s previous outputs. This interaction mode bridges PCGML with mixed-initiative design assistance tools by working with a machine to define a space of valid designs rather than just one new design.

I. INTRODUCTION

Procedural Content Generation via Machine Learning (PCGML) is the recent term for the strategy of controlling content generators using examples [1]. Existing PCGML approaches train their statistical models based on pre-existing artist-provided samples of the desired content. However, there is a fundamental tension here: machine learning often works better with more training data, but the effort to produce quality training data is frequently costly enough that the artists might be better off just making the content themselves.

Rather than attempting to train a generative statistical model (capturing the distribution of desired content), we focus on applying discriminative learning. In discriminative learning, the model learns to judge whether a candidate content artifact would be valid or desirable, but it does not learn how to generate candidates. Pairing a discriminative model with a pre-existing content generator, we realize example-driven generation that can be influenced by both positive and negative examples of valid design patterns. We examine this idea inside of an already-commercially-adopted example-based generation system, WaveFunctionCollapse (WFC) [2].1 This approach begins to address the fundamental tension in PCGML while also opening connections to mixed-initiative design tools.

1 https://github.com/mxgmn/WaveFunctionCollapse

Mixed-initiative content generation tools [3] are designed around the idea of the artist having a conversation with the tool about one specific design across many alterations, with the goal of creating one high-quality design. We propose to adapt this conversational teaching model for application in PCGML systems. While still having conversations with the tool about specific designs, the conversations are leveraged to talk about the general shape of the design space. The artist trains the generator to the point where it will be trusted to follow that style in the future, when the generator can be run non-interactively. Instead of an individual artifact, the goal is to define a space of desirable artifacts from which the generator may sample.

It might seem most obvious to set up PCGML by using a generative statistical model. Such models explicitly capture the desired content distribution: p(C). In the proposed discriminative learning strategy, we intend only to learn whether a candidate design is considered valid or not: p(V |C). Such models can often be fit from much less data than generative models, and they do not require the distribution of content C in the samples to be representative of the target content distribution (easing the fundamental tension of PCGML). Indeed, many negative examples which demonstrate invalid designs will contain design details which should have zero likelihood. We assume the existence of a generator that can draw samples from a prior p(C) filtered by the condition p(V = true|C), effectively applying Bayes’ rule to draw samples from the conditional distribution: p(C|V = true). WFC is one such generator that can be directly constrained to yield only designs that would be classified as valid.
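As a concrete (if deliberately naive) illustration of this filtering view, the conditional distribution p(C | V = true) can be realized by rejection sampling: draw candidates from the prior generator and keep only those the discriminative model accepts. This sketch is our own, with illustrative function names; WFC itself is more efficient, constraining generation directly rather than discarding samples after the fact.

```python
import random

def sample_valid(generate, is_valid, max_tries=10_000):
    """Draw one sample from p(C | V = true) by rejection.

    generate: draws a candidate from the prior p(C).
    is_valid: the learned discriminative judgment p(V = true | C).
    """
    for _ in range(max_tries):
        candidate = generate()
        if is_valid(candidate):
            return candidate
    raise RuntimeError("no valid candidate found within the try budget")

# Toy stand-ins: candidates are digits, and "valid" means even.
sample = sample_valid(lambda: random.randrange(10), lambda c: c % 2 == 0)
```

The accept/reject loop makes the cost of a poor discriminator visible: the rarer valid designs are under the prior, the more samples are wasted, which is precisely what direct constraint propagation avoids.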

This paper illuminates the implicit use of machine learning in WFC, explains how discriminative learning may be integrated, and presents a detailed worked example of the conversational teaching model. We refer to the primary user as an artist to emphasize the primarily visual interface. It should be understood that the process of creating the input image can involve both design skills and programming reasoning: the artist is specifying both an aesthetic goal and a complex system of constraints to achieve that goal. This is well within the scope of a technical artist’s skill, and computer artists have a long history of leveraging complex computer science approaches in pursuit of aesthetic aims.

arXiv:1809.04432v1 [cs.LG] 10 Sep 2018

II. BACKGROUND

In this section, we review WFC as an example-driven generator, characterize PCGML work to date as operating on only positive examples, and review the conversational interaction model used in mixed-initiative design tools.

A. WaveFunctionCollapse

WaveFunctionCollapse is a content generation algorithm devised by independent game developer Maxim Gumin. In contrast with generators presented in technical games research venues, WFC has seen surprisingly quick adoption within the technical artist community. Particularly notable is that WFC can be considered an instance of PCGML, as we illustrate in Sec. III.

The recently released Viking invasion game Bad North [4] uses Oscar Stålberg’s WFC implementation for generating island maps.2 Caves of Qud [5], a roguelike that is currently in Early Access on Steam, uses WFC as one of its map generation techniques. The Caves of Qud developers have closely followed the ongoing development of the algorithm, incorporating its recent improvements.3 In particular, Caves of Qud’s implementation of the improved “fast WFC” enables it to use a higher N for its N × N patterns, which potentially allows the developers to express more complex structures.4

WFC is an instance of content generation using constraint solving techniques [2]. WFC analyzes an input image and expresses a weighted constraint satisfaction problem based on its local similarity properties. There are many alternative constraint solving systems that can be substituted for Gumin’s original observe-and-propagate cycle, including our own declarative implementations with the answer-set solver Clingo [6] and the recent “fast WFC” implementation by Mathieu Fehr and Nathanaël Courant [7].

In this paper, we seek to extend the ideas of WFC while keeping them compatible with the existing implementations. One of the more unique aspects of WFC is that it is an example-based generator that can generalize from a single, small example image. In Sec. V we show that while more than one example is needed to appropriately sculpt the design space, the additional examples can be even smaller than the original and can be created in response to generator behavior rather than collected in advance.

2 As discussed in e.g. @OskSta: “The generation algorithm is a spinoff of the Wave Function Collapse algorithm. It’s quite content agnostic. I have a bunch of tweets about it if you scroll down my media tweets” https://twitter.com/OskSta/status/931247511053979648

3 @unormal: “Got Qud’s new 20x faster WFC implementation down to about 50mb in-memory static overhead on init with 0 allocation for all future runs (unless the size of the gen output gets bigger, but Qud’s doesn’t), so no GC churn. (cc @ExUtumno)” https://twitter.com/unormal/status/984713110257852416

4 @unormal, 2:05 AM - 13 Apr 2018: “20x faster also makes higher orders of N practical, which enables larger scale structures to pop out of wfc.” https://twitter.com/unormal/status/984719207156862976

B. PCGML

Summerville et al. define Procedural Content Generation via Machine Learning (PCGML) as the “generation of game content by models that have been trained on existing game content [emphasis added]” [1]. In contrast with search-based and solver-based approaches which presume the user will provide an evaluation procedure or logical definition of appropriateness, PCGML uses a more artist-friendly framing that assumes concrete example artifacts as the primary inputs. PCGML techniques may well apply constructive, search, or solver-based techniques internally after interpreting training examples.

Machine learning needs training data, and one significant source of data for PCGML research is the Video Game Level Corpus (VGLC), which is a public dataset of game levels [8]. The VGLC was assembled to provide corpora for level generation research, similar to the assembled corpora in other fields such as Natural Language Processing. In contrast with datasets of game level appearance such as VGMaps,5 content in the VGLC is annotated at a level suitable for constructing new, playable level designs (not just pictures of level designs).

The VGLC provides a valuable set of data sourced from iconic levels for culturally-impactful games (e.g. Super Mario Bros. and The Legend of Zelda). It has been used for PCG research using autoencoders [9], generative adversarial networks (GANs) [10], long short-term memories (LSTMs) [11], multi-dimensional Markov chains [12, Sec. 3.3.1], and automated game design learning [13].

Summerville et al. identify a “recurring problem of small datasets” [1]: most data only applies to a single game, and even with the efforts of the VGLC the amount of data available is small, particularly when compared to the more wildly successful machine learning projects. This is compounded by our desire to produce useful content for novel games (for which no pre-existing data is available). Hence the fundamental tension in PCGML: asking an artist (or a team of artists) to produce quality training data at machine-learning scale could be much less efficient than just having the artists make the required content themselves.

Compounding this problem, a study by Snodgrass et al. [14] showed that the expressive volume of current PCGML systems did not expand much as the amount of training data increased. This suggests that the generative learning approach taken by these systems may not ever provide the required level of artist control. While this situation might be relieved by using higher-capacity models, the problem of the effort to produce the training data remains.

PCGML should be compared with other forms of example-based generation. Example-based generation long predates the recent deep learning approaches, particularly for texture synthesis. To take one early example, David Garber’s 1981 dissertation proposed a two-dimensional, Markov chain, pixel-by-pixel texture synthesis approach [15]. Separate from Garber,6 Alexei Efros and Thomas Leung contributed a two-dimensional, Markov-chain inspired synthesis method: as the synthesized image is placed pixel-by-pixel, the algorithm samples from similar local windows in the sample image and randomly chooses one, using the window’s center pixel as the new value [17]. Although WFC-inventor Gumin experimented with continuous color-space techniques descending from these traditions, his use of discrete texture synthesis in WFC is directly inspired by Paul Merrell’s discrete model synthesis and Paul Harrison’s declarative texture synthesis [18]. Harrison’s declarative texture synthesis exchanges the step-by-step procedure used in earlier texture synthesis methods for a declarative texture synthesis approach, patterned after declarative programming languages [19, Chap. 7]. Merrell’s discrete 3D geometric model synthesis uses a catalog of possible assignments and expresses the synthesis as a constraint satisfaction problem [20].

5 http://vgmaps.com/

Unlike later PCGML work, these example-based generation approaches only need a small number of examples, often just one. However, each of these approaches uses only positive examples without any negative examples.

C. Mixed-Initiative Design Tools

Several mixed-initiative design tools have integrated PCG systems. Their interaction pattern can be generalized as an iterative cycle where the generator produces a design and the artist responds by making a choice that contradicts the generator’s last output. When the details of a design are underconstrained, most mixed-initiative design tools will allow the artist to re-sample alternative completions.

Tanagra is a platformer level design tool that uses reactive planning and constraint solving to ensure playability while providing rapid feedback to facilitate artist iteration [21]. Additionally, Tanagra maintains a higher-order understanding of the beats that shape the level’s pacing, allowing the artist to directly specify the pacing and see the indirect result on the shape of the level being built.

The SketchaWorld modeling tool introduces a declarative “procedural sketching” approach “in order to enable designers of virtual worlds to concentrate on stating what they want to create, instead of describing how they should model it” [22]. The artist using SketchaWorld focuses on sketching high-level constructs with instant feedback about the effect the changes have on the virtual world being constructed. At the end of interaction, just one highly-detailed world results.

Similarly, artists interact with Sentient Sketchbook via map sketches, with the generator’s results evaluated by metrics such as playability. As part of the interactive conversation with the artist, it also presents evolved map suggestions to the user, generated via novelty search [23], [24]. Although novelty search could be used to generate variations on the artist’s favored design, it is not assumed that all of these variations would be considered safe for use. Tools based on interactive evolution do not learn a reusable content validity function, nor do they allow the artist to credit or blame a specific sub-structure of a content sample as the source of their fitness feedback. In Sec. IV, we demonstrate an interactive system that can do both of these.

6 Efros and Leung later discovered Garber’s previous work [16].

As these examples demonstrate, mixed-initiative tools facilitate an interaction pattern where the artist sees a complete design produced by a generator and responds by providing feedback on it, which influences the next system-provided design. This two-way conversation enables the artist to make complex decisions about the desired outcome without requiring them to directly master the technical domain knowledge that drives the implementation of the generator.

The above examples demonstrate the promising potential of design tools that embrace this mode of control. However, they tend to focus on using generation to assist in creating specific, individual artifacts rather than the PCGML approach of modeling a design space or a statistical distribution over the space of possible content.

Despite the natural relationship between artist-supplied content and the ability of machine learning techniques to reflect on that content and expand it, PCGML-style generators that learn during mixed-initiative interaction have yet to be explored.

III. CHARACTERIZING WFC AS PCGML

Snodgrass describes WaveFunctionCollapse as “an example of a machine learning-based PCG approach that does not require a deep understanding of how the algorithm functions in order to be used effectively” [12, Sec. 2.8]. This section describes what and how WFC learns in a generalized vocabulary, which we introduce in this paper, that opens up space for exploring alternative learning strategies.

Gumin’s original WaveFunctionCollapse algorithm proceeds in two phases. In the first, the single input image is analyzed to identify a vocabulary of local tile patterns and the possible adjacencies between those patterns. In the second phase, the results of the analysis are used as constraints in a generation process identifiable as constraint solving [2]. In this section, we interpret the pattern analysis phase as an instance of machine learning. In particular, we show that this phase learns three functions. One classifies which pattern is present at each location (the pattern classifier function). One identifies if a pattern adjacency pairing is within the artist’s preferred style (the adjacency validity function). The final function (the pattern renderer function) determines how a location in the output should be rendered given the constraint-solving generator’s choice of pattern placement assignments. Fig. 1 illustrates the relationship between these three functions.
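The three learned functions can be summarized with illustrative type signatures (a Python sketch; the function names follow the paper’s vocabulary, while the concrete types and the trivial instances are our assumption):

```python
from typing import Callable

Tile = str        # stand-in for the artist-visible appearance datatype
PatternId = int   # bounded integer index into the pattern catalog
Offset = tuple[int, int]

# pattern classifier: a local neighborhood of appearances -> a pattern id
PatternClassifier = Callable[[list[list[Tile]]], PatternId]

# adjacency validity: may these two patterns sit at this relative offset?
AdjacencyValidity = Callable[[PatternId, PatternId, Offset], bool]

# pattern renderer: a pattern id -> the appearance to show at that location
PatternRenderer = Callable[[PatternId], Tile]

# A trivial instance of each, for illustration only:
classify: PatternClassifier = lambda patch: 0
valid: AdjacencyValidity = lambda a, b, offset: True
render: PatternRenderer = lambda pid: "grass"
```

Factoring WFC this way makes the substitution points explicit: any of the three functions can be swapped for a learned alternative without touching the constraint solver in the middle.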

A. Pattern Classifier

The original implementation of WFC has two models: a SimpleTiledModel that uses explicit artist-specified adjacency relationships between tiles (expressed in an XML file) and an OverlappingModel that learns tile adjacencies from a source image. The OverlappingModel starts with a pattern identification step, which analyzes the input image for N × N patterns of tiles (where N is configurable, typically 2 or 3), which are the basic input for its constraint solving approach. The function that transforms a local patch of tiles into a pattern identifier is effectively learned by populating a lookup table. Even though this is a trivial method for constructing the classifier function, it is useful to see it as explicit learning in order to identify where alternative learning methods might be used instead.

Fig. 1. Overview of WaveFunctionCollapse. Gumin’s OverlappingModel starts with a source image composed of colored tiles. Rather than operating directly on the tiles, the pattern classifier transforms each local neighborhood into an identifier in a pattern catalog, where each element is a unique N × N tile neighborhood. The adjacency validity for each pair of patterns at each offset is also recorded (representing a Boolean function with two pattern identifier arguments). These patterns are used to define the constraints for the generator, which uses the constraint solver to propagate pattern placements. Finally, the pattern renderer is needed to transform the array of identifiers back into an appearance. This can be as simple as using the center tile of the N × N pattern. The italicized terms above are being introduced by this paper.
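The lookup-table construction can be sketched in a few lines (our own illustration, not Gumin’s code; it ignores the wrapping and rotation/reflection options of the real OverlappingModel): enumerate every N × N window of the source grid and assign each distinct pattern the next free identifier.

```python
def extract_patterns(grid, n=2):
    """Assign each distinct n-by-n window of `grid` a pattern identifier."""
    height, width = len(grid), len(grid[0])
    catalog = {}  # pattern (a tuple of row tuples) -> pattern id
    for y in range(height - n + 1):
        for x in range(width - n + 1):
            pattern = tuple(tuple(grid[y + dy][x + dx] for dx in range(n))
                            for dy in range(n))
            catalog.setdefault(pattern, len(catalog))
    return catalog

# A tiny checkerboard source yields just two distinct 2x2 patterns.
source = [list(row) for row in ["aba", "bab", "aba"]]
catalog = extract_patterns(source)
```

Because identifiers are dense integers from 0 upward, the catalog directly satisfies the bounded-integer output requirement placed on the pattern classifier.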

While the lookup table approach is effective in WFC, it implicitly disallows the use of any arrangement of tiles that was not seen in the source image (it would have no corresponding pattern identifier onto which it could map). In the generalized setting, we can imagine a deep convolutional neural network being used to map as-yet unseen tile configurations into the existing pattern catalog so long as they were perceptually similar enough.

Pattern classification need not be a strictly local operation. If we wanted to generate dungeon levels for a roguelike game, we may be particularly interested in distinguishing treasure chests that are easily reachable by the player from those that are not. A global analysis algorithm might assign different pattern identifiers to identically-appearing regions based on whether the region should be considered visible or hidden to the player. In platformer level generation, this could distinguish coins or other rewards placed on the player’s default path (as a guide) or off the path (as an enticement to explore).

In the future, we imagine the contextual information used in the pattern classifier to come from many different sources. In the texture-by-numbers application of image analogies [25], the artist hand-paints an additional input image to guide the interpretation of the source image and the generation of the target image in another example-driven image generator. In Snodgrass’ hierarchical approach to tile-based map generation [26], a lower-resolution context map is generated automatically using clustering of tile patterns.

The goal of the pattern classifier is to assign, for every location in an input image, a pattern identifier number. Although the input to this function has an application-specific datatype, we require that the output is always a bounded integer (e.g. from 1 to the number of patterns in the catalog). The job of the classifier is to collect pertinent details about what is happening in that location of the image and the surrounding context into a single value.

B. Pattern Renderer

Working inversely from the pattern classifier, the final step of WFC is to translate a grid of pattern selections back into a grid of tiles. The original implementation of WFC takes the straightforward approach of directly reusing the tile that is in the center of the stored N × N pattern, using the previously stored lookup table. Interesting animations showing the progress of generation in WFC result from blending the results of the pattern renderer for all patterns that might yet still be placed at a location. Animations of these visualizations over time attracted several technical artists (and the present authors) to learn more about WFC.
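The center-tile renderer can be sketched as follows (an illustrative reconstruction, assuming the catalog stores each pattern as an n-by-n tuple of tiles; the names are ours, not Gumin’s):

```python
def make_center_renderer(patterns, n=3):
    """Build a pattern renderer that reuses each stored pattern's center tile.

    patterns: list indexed by pattern id; each entry is an n-by-n tile grid.
    """
    def render(pattern_id):
        return patterns[pattern_id][n // 2][n // 2]  # center tile of the pattern
    return render

# One stored 3x3 pattern whose center tile is "X".
patterns = [(("a", "b", "a"),
             ("b", "X", "b"),
             ("a", "b", "a"))]
render = make_center_renderer(patterns)
```

Any function with the same signature could be substituted here, which is what makes the richer renderers imagined below possible.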

Generalizing the role of the pattern renderer, we can imagine other functions which decide how to represent a local patch of pattern placements. Again, we can imagine the use of a deep convolutional neural network to map a small grid of pattern identifier integers into a rich display in the output. Although the pattern renderer’s input datatype is fixed, the output can be whatever artist-visible datatype was used as input to the pattern classifier (whether that be image pixels, game object identifiers, or a parameter vector for a downstream content generator).

If additional annotation layers are used in the source images (as in the texture-by-numbers application mentioned above or the player path data present in some VGLC data), it is reasonable to expect that the output of the generator could also have these annotation layers. For platformer level generation, the system could output not only a tile-based map design, but also a representation of which parts of the map the generator expected the player to actually reach.

C. Adjacency Relation

The basic constraints in WFC are expressed as valid adjacencies between patterns. This is a generalization of the adjacencies between individual tiles: the pattern classifier captures additional adjacency information about the local space, much as an image filter kernel can, and allows the adjacency relation function to infer more general relations between tiles. We prefer that some tiles be allowed to be placed next to each other, such as placing a flower in the middle of a garden, while other adjacencies are non-preferred: the flowers should not be growing out of the middle of a carpeted room.

We can characterize the method used to learn the adjacency legality as Most General Generalization (MGG), the inverse of the classic Least General Generalization (LGG) inductive inference technique [27]. Gumin’s implementation simply allows any tile-compatible overlapping patterns to be placed adjacent to one another, even if they were never seen adjacent in the single source image. A side effect of this is that any pattern adjacencies seen in the source image (which are tile-compatible by construction) must be considered valid for the generator to use later.

While MGG might appear as simple parsing and tallying, something too simple to be considered as machine learning, it is useful to compare this approach with classic machine learning techniques like Naive Bayes [28, Chap. 20]. Naive Bayes classifiers are trained with no more sophistication than tallying how often each feature was associated with each class.

The art of constructing the single source image for Gumin’s WFC often involves some careful design to include all of the patterns that are preferred and none that are non-preferred. By allowing for multiple positive and negative examples and using a slightly altered learning strategy, we show how this meticulous work can be replaced with a conversation that elaborates on past examples.

Gumin’s pattern classifier function implicitly captures the relationships between patterns in the training data. The first and most absolute distinction is between legal and illegal overlaps: because the patterns in the OverlappingModel need to be able to be placed on top of each other without contradictions, some patterns will never be legal neighbors: if one 3 × 3 pattern has a blue center tile, while another 3 × 3 pattern has a green right tile, the green-right-tile pattern can never be legally placed to the left of the blue-center-tile pattern (Fig. 2).
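The legal-overlap test can be sketched in code (our reconstruction of the role played by agrees() in Gumin’s implementation, not a copy of it): two n × n patterns may be neighbors at offset (dx, dy) only if they agree on every tile of their intersection.

```python
def agrees(p1, p2, dx, dy, n=3):
    """True iff p2, shifted by (dx, dy) relative to p1, matches p1 on the overlap."""
    for y in range(max(0, dy), min(n, n + dy)):
        for x in range(max(0, dx), min(n, n + dx)):
            if p1[y][x] != p2[y - dy][x - dx]:
                return False
    return True

# A blue-center pattern overlapped at offset (0, 1), echoing the Fig. 2 example:
# it is only compatible with a lower neighbor whose top row carries the blue tile.
blue_center = (("g", "g", "g"), ("g", "b", "g"), ("g", "g", "g"))
blue_top    = (("g", "b", "g"), ("g", "g", "g"), ("g", "g", "g"))
green_top   = (("g", "g", "g"), ("g", "g", "g"), ("g", "g", "g"))
```

Precomputing this Boolean relation over all pattern pairs and offsets is exactly what populates the legal adjacency set used by the solver.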

The set of all possible patterns that can be assembled from even a small set of tiles can be very large. For example, four distinct tile types can be arranged into a 3 × 3 grid in 4⁹ = 262,144 ways. The time and space used by the constraint-based generation approach in WFC scales unfavorably with the number of pattern identifiers, so a much smaller number of patterns is strongly desirable. At the same time, building a lookup table indexed by pairs of all possible patterns is also usually impractical.

Fig. 2. An example of a pattern overlap, with a [0,1] offset. The top pair of patterns is a legal overlap, because the intersection of the two patterns matches. The bottom pair of patterns is not a legal overlap, because the blue tile and the green tile conflict. Only the top adjacency is in the legal adjacency set.

Gumin’s MGG learning strategy is hardly the only possible option, even using the default pattern classifier. An LGG learning strategy would say to only allow those adjacencies explicitly demonstrated in the source image. However, this highly-constrained alternative might not allow any new output to be constructed that was not an exact copy of the source image. Likely, the ideal amount of generalization falls somewhere between these extremes.
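A one-dimensional analogue makes the difference tangible. In this sketch (our own setup, for illustration only), "patterns" are the overlapping bigrams of a source string, and two bigrams may be neighbors when their one-character overlap agrees:

```python
source = "abcb"
patterns = sorted({source[i:i + 2] for i in range(len(source) - 1)})
# patterns == ['ab', 'bc', 'cb']

# MGG: allow every overlap-compatible pair, whether or not it was observed.
mgg = {(p, q) for p in patterns for q in patterns if p[1] == q[0]}

# LGG: allow only the adjacencies literally demonstrated in the source.
lgg = {(source[i:i + 2], source[i + 1:i + 3]) for i in range(len(source) - 2)}

print(sorted(mgg - lgg))  # [('cb', 'bc')]: a generalization never demonstrated
```

MGG admits the pair ('cb', 'bc') purely because the overlap agrees, even though the source never shows it; LGG would forbid it and, in doing so, forbid any output longer than a verbatim replay of the source.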

In a discriminative learning setup, we might consider all adjacencies explicitly demonstrated in positive example images to therefore be positive examples for the learned adjacency relation. Likewise, a negative example image needs to demonstrate at least one adjacency that would be considered invalid by the learned relation. We refer to the artist’s intended set of allowed relations as the preferred set. See Fig. 3 for an illustration of the relation between the preferred set, the (overlap-compatible) legal set, and the set of all possible pattern pairs to be considered for adjacency.

Gumin’s MGG strategy effectively assumes that the preferred set is equal to the legal set, and therefore operates exclusively on the legal set. It does not try to generalize from the observed set or to infer non-observed but possibly still preferred adjacencies. Artists can attempt to adjust the legal set to match the preferred set, but this requires a combination of technical reasoning and trial-and-error iteration. Alternately, they could switch to using the SimpleTiled model, for which they directly (and with considerable tedium) specify the complete set of allowed adjacencies.

This is a limitation of the learning strategy, not the WFC algorithm itself. The agrees() validity function in the original code just checks whether two patterns can legally overlap, but any arbitrary adjacency validity function (which accepts two patterns and returns a Boolean) can be substituted here. As long as the validity function can be computed over all pairs of patterns, it can act as the whitelist for the constraint domains, without changing the WFC propagation code itself. The long-range constraint propagation in WFC is


Fig. 3. Adjacency Sets: The all set contains all of the possible pattern-offset-pattern adjacency triples, using the patterns observed in the image. The legal set contains all of the adjacency triples that can overlap without collision. The positive set contains the adjacencies directly observed in the image. The preferred set is the adjacencies that the artist wants to include. The negative set is the adjacencies that the artist explicitly forbids. The goal is for the valid set (that the machine learns) to match the preferred set (that the artist wants). The original WFC assumes that the set of preferred patterns is identical to the set of legal patterns, and therefore that valid and legal are the same, while we disambiguate them. Note that this is describing the adjacencies between observed patterns (not to be confused with observed adjacencies): the set of all patterns is a superset of the set of observed patterns.

valued more for its high success rate without backtracking than for the particular heuristic used [2].

To prototype a variation of WFC supporting an alternate pattern classifier, pattern renderer, and adjacency validity function, we initially developed a surrogate implementation of WFC in the MiniZinc constraint programming language. Later, we integrated the specific ability to generate with a customized pattern adjacency whitelist into a direct Python-language clone of Gumin’s original WFC algorithm.

IV. SETTING UP DISCRIMINATIVE LEARNING

As discussed above, the original approach in WFC is to define the adjacency validity function in the most permissive possible way, using the legal adjacency set as the valid set. Among other drawbacks, this requires careful curation of the patterns so that every adjacency in the legal set is acceptable. While this allows very expressive results from a single source image, there are many preferred sets that are difficult to express in this manner. However, this is just one of the many possible strategies.

An anomaly-detection strategy, such as a one-class Support Vector Machine [29], might allow the set of valid adjacencies to more closely approximate the ideal preferred set, allowing the artist to use patterns with a much larger legal adjacency set.

In this section, we consider the presence of possible negative examples. By removing adjacency pairs that the artist explicitly flagged as undesirable, we can more precisely determine the valid set.

As depicted in Fig. 3, the sets of adjacencies allow us to make this distinction clearer: the valid set can be learned from any of these sets. By default, WFC uses the legal adjacency set, but the preferred-valid set can be adjusted to include more or less of the legal set, with corresponding effects on the generation.

A. Machine Learning Setup

In addition to the single positive source image used by the original WFC, we introduce the possibility of using more source images. Some of these are additional positive examples: it can be easier to express new adjacencies while avoiding unwanted ones by adding a completely separate positive example. In the positive examples, every adjacency is considered to be valid, as usual.

In contrast, sometimes it is easier to specify just the negative adjacencies to be removed from the valid set. In a negative example, at least one adjacency is considered to be invalid.

Note that these additional example images do not need to be equal in size. In fact, a tiny negative example showing just the undesirable pairing would work well to let an artist carve out one bad location in an otherwise satisfactory design.

Finally, we have the validity function: a function that takes two patterns (plus how they are related in space, e.g. up/down/left/right) and outputs a Boolean evaluation of whether or not their adjacency is valid. In the original WFC, this is simply an overlapping test: given this offset, are there any conflicts in the intersection of the two patterns? However, as suggested above, there are more sophisticated validity functions that also produce viable results.
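Operationally, any such predicate can enumerate the whitelist that seeds the constraint domains. A minimal sketch follows; the function and parameter names are our own, not from Gumin's code:

```python
def build_whitelist(patterns, offsets, is_valid):
    """Tabulate every (pattern index, pattern index, offset) triple that the
    supplied validity predicate accepts; WFC's propagation code is unchanged."""
    return {(a, b, off)
            for a in range(len(patterns))
            for b in range(len(patterns))
            for off in offsets
            if is_valid(patterns[a], patterns[b], off)}

# Swapping predicates swaps design spaces without touching the solver:
patterns = ["grass", "stem"]
offsets = [(1, 0), (0, 1)]
permissive = build_whitelist(patterns, offsets, lambda p, q, off: True)
stricter = build_whitelist(patterns, offsets,
                           lambda p, q, off: not (p == q == "stem"))
print(len(permissive), len(stricter))  # 8 6
```

The permissive predicate stands in for the original overlap test; the stricter one illustrates an arbitrary artist-motivated rule (here, a hypothetical ban on stem-next-to-stem) being dropped in without any solver changes.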

B. Human Artist Setup

In our mixed-initiative training approach, we expect the artist to provide at least one positive example to start the process. The image should demonstrate the local tiles that might be used by the generator, but it does not need to demonstrate all preferred-valid adjacencies. Note that providing one example is the typical workflow for WFC (unmodified, Gumin’s code does not accept more than one example). However, instead of expecting the artist to continue to iterate by changing this one example, which can quickly grow complex, the artist can isolate each individual contribution.

Initially, we set the whitelist of valid adjacencies to be fully permissive (MGG), covering the legal adjacencies of the known patterns. From this, we generate a small portfolio of outputs to sample the current design space of the generator. Even a single work sample is often enough to spur the next round of interaction.


The artist reviews the portfolio to find problems. They can add one or more generated outputs to the negative example set directly, crop an example to make a more focused negative example, or hand-create a clarifying example. If additional positive examples are desired to increase variety, those can also be added (although they may immediately prompt the need for negative examples to address over-generation).

With the new batch of source images, we retrain each of the learned functions: the pattern classifier, the adjacency validity function, and the pattern renderer. There are several options for possible ML techniques, which we will explore in more detail below. The updated pattern classifier defines the space of patterns that might be placed by the generator. The newly-learned validity function defines the updated whitelist used in the constraints. The updated pattern renderer might even display existing pattern grids in a new way. As before, we sample a portfolio. The artist repeats the process until they are satisfied with the work samples.
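The loop just described can be sketched as a driver function. Every callable here (learn, generate, artist_review) is a stand-in for the corresponding component above, not a real API:

```python
def teach_generator(positives, negatives, learn, generate, artist_review,
                    portfolio_size=5, max_rounds=20):
    """Retrain, sample a portfolio, collect artist critique, repeat until the
    artist adds nothing new (i.e., they are satisfied with the samples)."""
    model = learn(positives, negatives)
    for _ in range(max_rounds):
        portfolio = generate(model, portfolio_size)
        new_pos, new_neg = artist_review(portfolio)
        if not new_pos and not new_neg:
            break                      # the artist now trusts the generator
        positives += new_pos
        negatives += new_neg
        model = learn(positives, negatives)
    return model

# Stub conversation: one round of critique, then satisfaction.
reviews = iter([([], ["floating-stem crop"]), ([], [])])
model = teach_generator(["flowers.png"], [],
                        learn=lambda p, n: (tuple(p), tuple(n)),
                        generate=lambda m, k: [m] * k,
                        artist_review=lambda samples: next(reviews))
print(model)  # (('flowers.png',), ('floating-stem crop',))
```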

The result is a generative system with a design space that has been sculpted to the artist’s requirements, all without the artist needing to understand or alter any unfamiliar machine learning algorithms.

V. WORKED EXAMPLE

In this section we walk through an example run of the conversational interaction an artist has with WFC when using a discriminative learning setup. The conversation takes place over several iterations that are visually represented in Fig. 4. All outputs shown were generated by executing a minimally modified Python clone of Gumin’s C# WFC implementation with an altered pattern catalog (from the pattern classifier) and adjacency whitelist each time. The resulting pattern grids are rendered with the pattern renderer.

In this running example, we make use of a refinement to the MGG strategy used in Gumin’s implementation. Rather than simply allowing all patterns which agree on their overlapping tiles, we allow all such patterns except those taken from negative examples. Indeed, this is still the most general generalization possible under the extra constraints.
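This refinement amounts to a set subtraction over adjacency triples. A sketch follows; the symbolic triples are stand-ins for the pattern-index/offset triples the implementation actually uses:

```python
def refined_mgg(legal, negative_examples):
    """MGG under negative constraints: start from all overlap-compatible
    (pattern, pattern, offset) triples, then subtract every adjacency
    demonstrated by a negative example."""
    forbidden = set().union(*negative_examples) if negative_examples else set()
    return legal - forbidden

legal = {("stem", "sky", "up"), ("stem", "ground", "down"), ("sky", "sky", "up")}
negatives = [{("stem", "sky", "up")}]   # e.g. a cropped floating-stem sample
valid = refined_mgg(legal, negatives)
print(sorted(valid))  # the floating-stem adjacency is gone
```

With no negative examples this degenerates to the original MGG behavior, which is why the refinement slots into the existing workflow without disturbing it.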

Working through the conversation, our artist begins Iteration 1 by supplying the algorithm with a single positive example. Here we use the flowers example taken from Gumin’s public repository. MGG learns the legality relations exactly like the original WFC. The generator is then run to produce a work sample. Observing this, the artist decides that the image needs more colorful flowers.

In Iteration 2, the artist augments the positive image set with a second image, having repainted the flowers to be red. While the original WFC code does not accept more than one input, adding additional patterns only required minimal code changes. In the resulting work sample, both red and yellow flowers are now seen (new patterns were made available to the generator). However, the artist is still interested in seeing more flower variety.

Rather than creating more examples by copy-pasting the original tiles again, this time the artist creates a number of smaller samples that focus exactly on what they want to add to the composition. These extra tiny examples might throw off statistics in a generatively trained model. In the work sample for Iteration 3, the new flowers are present but a surprising new phenomenon arises. The possibility of floating stems results from the particulars of WFC’s default pattern classifier and adjacency relation function learning. The artist is not concerned with these implementation details and wishes simply to fix the problem with additional examples.

By selecting and cropping a 3 × 4 region of the last work sample, the artist creates the focused negative example that they use in Iteration 4. MGG, now with the extra constraint from the negative example, no longer considers floating stems to be a possibility, despite the fact that this pattern can be identified in the input by the pattern classifier. The work sample is now free from obvious flaws.

Having previously only considered very small training examples, the artist notes a particular feature of the larger generated output. The ground appears uninterestingly flat. In Iteration 5, the artist provides a small positive example of sloped hills, hoping the generator will invent a rolling landscape. However, the work sample for this iteration suggests that the generator has not picked up the generality of the idea from the single tiny example: it only knows how to build continuous ramps without any flowers.

In Iteration 6, a few more positive examples show that stems can be placed on hills and that the bumps of hills can be isolated (not always part of larger ramps). However, in rare circumstances the generator will now place stems underground. This would not have been spotted without examining many possible outputs, highlighting the importance of a tool that allows the artist to give feedback on more than one example output.

Finally, in Iteration 7, the artist is able to get the look they preferred. Adding negative examples to take care of the edge cases is easy and can be done without adjusting the earlier source images. Testing shows that the generator is reliably producing usable images. Because of the iterations, the artist now has enough trust in the generator to allow it to perform future generation tasks without supervision. The learned pattern classifier function, pattern renderer function, and adjacency legality function compactly summarize the learning from the interaction with the artist.

In this worked example, every new training example added beyond the first is a direct response to something observed in (or observed to be missing from) concrete images produced by the previous generator. Many demonstrate patterns that the generator should not produce in the future, even if this was not realized earlier. Instead of iterating to produce a carefully curated set of 100% valid examples, we make progress by adding focused clarifications.

VI. CONCLUSION

The fundamental tension in PCGML is that the effort to craft enough training data for effective machine learning might undermine the motivation to use PCGML in the first place.


Fig. 4. A worked example of the mixed-initiative conversational teaching model process. The artist observes the results of each step (top black text) and makes a change for the next step (lower blue text). Each step adds either positive or negative examples. The source images for each output can be seen in the columns on the left, with a representative output image to the right.

This makes many machine learning approaches impractical: even when the design goal is flexibility (rather than nominally infinite content), the sheer amount of training data required can be daunting.

However, existing approaches to single-example PCG such as WaveFunctionCollapse suggest that generators trained on small amounts of data are possible. When we combine them with a discriminative learning strategy, we can leverage the usefulness of focused negative examples. As our worked example of the conversational teaching model shows, an artist can intuitively make targeted changes without being overly concerned about maintaining a representative distribution or disturbing earlier, carefully planned patterns just to fix a rare edge case.

Combining PCGML with mixed-initiative design assistance tools can enable artists to sculpt a generator’s design space. Rather than building just one high-quality artifact, the artist can train a generator in iterative steps to the point where they trust it for unsupervised generation.

VII. ACKNOWLEDGEMENTS

The authors wish to thank Adam Summerville for the extensive feedback which greatly improved this research.

REFERENCES

[1] A. Summerville, S. Snodgrass, M. Guzdial, C. Holmgård, A. K. Hoover, A. Isaksen, A. Nealen, and J. Togelius, “Procedural content generation via machine learning (PCGML),” CoRR, vol. abs/1702.00539, 2017. [Online]. Available: http://arxiv.org/abs/1702.00539

[2] I. Karth and A. M. Smith, “WaveFunctionCollapse is constraint solving in the wild,” in Proceedings of the 12th International Conference on the Foundations of Digital Games, ser. FDG ’17. New York, NY, USA: ACM, 2017, pp. 68:1–68:10. [Online]. Available: http://doi.acm.org/10.1145/3102071.3110566

[3] A. Liapis, G. Smith, and N. Shaker, “Mixed-initiative content creation,” in Procedural Content Generation in Games: A Textbook and an Overview of Current Research, N. Shaker, J. Togelius, and M. J. Nelson, Eds. Springer, 2016, pp. 195–216.

[4] O. Stålberg, R. Meredith, and M. Kvale, “Bad North,” Plausible Concept.

[5] J. Grinblat and C. B. Bucklew, “Caves of Qud,” 2010.

[6] M. Gebser, R. Kaminski, B. Kaufmann, and T. Schaub, “Clingo = ASP + control: Preliminary report,” CoRR, vol. abs/1405.3694, 2014.

[7] M. Fehr and N. Courant, “fast-wfc,” GitHub repository, 2018.


[8] A. J. Summerville, S. Snodgrass, M. Mateas, and S. O. Villar, “The VGLC: the video game level corpus,” CoRR, vol. abs/1606.07487, 2016. [Online]. Available: http://arxiv.org/abs/1606.07487

[9] R. Jain, A. Isaksen, C. Holmgård, and J. Togelius, “Autoencoders for level generation, repair, and recognition,” in Proceedings of the ICCC Workshop on Computational Creativity and Games, 2016.

[10] V. Volz, J. Schrum, J. Liu, S. M. Lucas, A. Smith, and S. Risi, “Evolving Mario levels in the latent space of a deep convolutional generative adversarial network,” arXiv preprint arXiv:1805.00728, 2018.

[11] A. Summerville and M. Mateas, “Super Mario as a string: Platformer level generation via LSTMs,” CoRR, vol. abs/1603.00930, 2016. [Online]. Available: http://arxiv.org/abs/1603.00930

[12] S. Snodgrass, “Markov models for procedural content generation,” Ph.D. dissertation, Drexel University, 2018.

[13] J. Osborn, A. Summerville, and M. Mateas, “Automatic mapping of NES games with Mappy,” in Proceedings of the 12th International Conference on the Foundations of Digital Games, ser. FDG ’17. New York, NY, USA: ACM, 2017, pp. 78:1–78:9. [Online]. Available: http://doi.acm.org/10.1145/3102071.3110576

[14] S. Snodgrass, A. Summerville, and S. Ontañón, “Studying the effects of training data on machine learning-based procedural content generation,” in Proceedings of the Thirteenth AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE-17), 2017, pp. 122–128.

[15] D. D. Garber, “Computational models for texture analysis and texture synthesis,” Ph.D. dissertation, University of Southern California, Los Angeles, CA, USA, 1981, AAI0551115.

[16] A. A. Efros and W. T. Freeman, “Image quilting for texture synthesis and transfer,” in Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, ser. SIGGRAPH ’01. New York, NY, USA: ACM, 2001, pp. 341–346. [Online]. Available: http://doi.acm.org/10.1145/383259.383296

[17] A. A. Efros and T. K. Leung, “Texture synthesis by non-parametric sampling,” in Computer Vision, 1999. The Proceedings of the Seventh IEEE International Conference on, vol. 2. IEEE Computer Society, 1999, pp. 1033–1038.

[18] M. Gumin. (2017, May) WaveFunctionCollapse Readme.md. [Online]. Available: https://github.com/mxgmn/WaveFunctionCollapse/blob/master/README.md

[19] P. F. Harrison, Image Texture Tools: Texture Synthesis, Texture Transfer, and Plausible Restoration. Monash University, 2006.

[20] P. Merrell and D. Manocha, “Model synthesis: A general procedural modeling algorithm,” IEEE Transactions on Visualization and Computer Graphics, vol. 17, no. 6, pp. 715–728, June 2011.

[21] G. Smith, J. Whitehead, and M. Mateas, “Tanagra: Reactive planning and constraint solving for mixed-initiative level design,” IEEE Transactions on Computational Intelligence and AI in Games, vol. 3, no. 3, pp. 201–215, 2011.

[22] R. M. Smelik, T. Tutenel, K. J. De Kraker, and R. Bidarra, “Semantic 3d media and content: A declarative approach to procedural modeling of virtual worlds,” Comput. Graph., vol. 35, no. 2, pp. 352–363, Apr. 2011. [Online]. Available: http://dx.doi.org/10.1016/j.cag.2010.11.011

[23] A. Liapis, G. N. Yannakakis, and J. Togelius, “Designer modeling for sentient sketchbook,” in 2014 IEEE Conference on Computational Intelligence and Games, Aug 2014, pp. 1–8.

[24] ——, “Sentient sketchbook: Computer-aided game level authoring,” in FDG, 2013, pp. 213–220.

[25] A. Hertzmann, C. E. Jacobs, N. Oliver, B. Curless, and D. H. Salesin, “Image analogies,” in Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques. ACM, 2001, pp. 327–340.

[26] S. Snodgrass and S. Ontanon, “A hierarchical MDMC approach to 2D video game map generation,” in Eleventh Artificial Intelligence and Interactive Digital Entertainment Conference, 2015.

[27] G. D. Plotkin, “A further note on inductive generalization,” Machine Intelligence, vol. 6, pp. 101–124, 1971.

[28] S. J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, 3rd ed. Pearson Education, 2016.

[29] K. R. Muller, S. Mika, G. Ratsch, K. Tsuda, and B. Scholkopf, “An introduction to kernel-based learning algorithms,” IEEE Transactions on Neural Networks, vol. 12, no. 2, pp. 181–201, Mar 2001.