
A Recommender System for Automation Rules in the Internet of Things

Diogo [email protected]

Instituto Superior Tecnico, Lisboa, Portugal

May 2017

Abstract

More and more devices featuring internet connectivity are being created every day. The value of such devices is in their ability to interact with one another, as part of automation workflows created by users to improve their home experience. Various platforms offer systems that allow for the creation of automation rules between different connected devices, but formulating and defining such automations can be quite complex. This work explores the automated, personalized recommendation of automation rules for users of connected devices. The application of recommender systems’ techniques to this domain is novel, so that exploring the applicability of the recommendation approaches discussed in the literature is one of the main contributions of this work. The hypothesis that certain groups of automations provide synergies was explored, and a system that exploits this by providing recommendations based on learning association rules was developed. Various strategies to make association rule mining feasible on an automation dataset were studied, such as generalizing automation rules into recommendable items, and applying a similarity operator to all items, making automations identifiable across users. The developed system showed very promising results, under an evaluation methodology consisting of gathering precision, recall and coverage metrics, as well as when comparing it against a more naive recommender developed as a baseline. Finally, it was discussed how the techniques developed in this work are generalizable to domains outside of automation rule territory.

Keywords: Recommender Systems, Automation Rules, Internet of Things, Personalization, Association Rules.

Introduction

The Internet of Things (IoT) is the network of devices featuring sensors and/or actuators as well as internet connectivity, allowing them to both interact with the environment and exchange data over the internet. Some estimates predict that this network will grow to encompass 50 billion connected objects by 2020 [11].

Because IoT devices are remotely addressable using the existing internet infrastructure, manufacturers often provide both software and Application Programming Interfaces (APIs) to allow consumers and developers alike to interact with their own “things”. This, however, has caused some amount of fragmentation in the interoperability that is possible between devices, which is a critical concern for the IoT sector [5]. For this reason, some companies are integrating many heterogeneous objects into one unified protocol, typically using the manufacturers’ API, in order to abstract away each device specification and allow users to take advantage of synergies between their objects. Examples of consumer-facing companies tackling this problem are IFTTT and Muzzley, which takes part in the present work by providing the data and infrastructure used.

The aforementioned companies provide software that allows users to create automation rules between connected devices (and other web services), which typically consist of programs of the form if-then, or slightly more complex rules. To illustrate, a user can create an automation rule of the form “When I turn on the TV, change the brightness of my light bulb to 40%”, or “When my smoke detector fires, turn on all of my light bulbs”. While these rules are usually defined explicitly by the IoT user, this work explores their automatic recommendation, in a personalized manner.

The software tools and techniques used to provide suggestions of items that might be of interest to users belong to the topic of Recommender Systems (RSs). RSs emerged as an independent research topic in the mid-90’s [17] and have enjoyed an increase in popularity in recent years [14], likely due to the observed growth in every aspect of the online world, which has made the problem of information discoverability and selection increasingly relevant.

RSs have been used successfully in a wide variety of domains, the most prominent and commonly cited being the Netflix movie recommender [7], the Amazon RS for generic products [16], and the music recommenders built by companies like Spotify [22], among many others, like recommenders for restaurants, news or people (for instance, in online dating). The main reasons why service providers implement RSs are to increase sales and/or increase user satisfaction and fidelity, by helping users cope with information overload [17].

This work focuses on answering the question of how to effectively recommend automation rules in the Internet of Things. In this context, a recommendation consists of a suggestion of a program where the user is faced with a binary choice: to accept the recommendation, further programming his connected devices, or to decline it. An example of a suggestion is: “Do you want your lights to turn off automatically when you leave home?”

The main goals of this work are to discover whether, and which, state-of-the-art recommender system techniques can successfully be applied to the suggestion of programs in the IoT domain, to understand how these programs can be abstracted into recommendable items, and to implement and evaluate a recommender engine that performs useful automation rule recommendations.

Because the space of possible rules between connected “things” is quite large, requiring effort on the part of the user to select and construct appropriate automations, it is likely that this domain will benefit from the use of automated suggestions. Specifically, this work aims to contribute to an understanding of:

• The characteristics of IoT rules in the context of their recommendation, i.e., when viewed as items in a RS.

• Which data can be successfully exploited to produce useful recommendations.

• Which RS approaches and algorithms are suitable or can be adapted to the specificities of the IoT automation domain.

More generally, given the growing importance of the research and practical applications of recommender systems, as well as the growing rate of adoption of IoT devices by end-users, this thesis’s topic stands to be of increasing importance for both researchers and companies in this space.

Related Work

RSs exist with the goal of generating useful personalized recommendations of items to users, where item is the generic term used to denote what the system recommends. To produce recommendations to a user, typically called the active user [20], recommenders use information about the users and their preferences, the items, domain knowledge, or combinations of these. The output is usually either a list of the top recommendations for the active user, or a probability representing a prediction for how much the user will like a given item.

The three most popular approaches to building RSs are Collaborative Filtering (CF), where the user is recommended items that people with similar tastes liked in the past, Content-based Filtering (CB), where the user is recommended items similar to the ones that he himself liked in the past, and Knowledge-based (KB) systems, which use domain knowledge about which item features meet the users’ needs [14]. However, different taxonomies have been proposed. Some authors, like [1] or [4], classify RSs into CF, CB and Hybrid systems, which combine methods of the previous two. This classification scheme sees knowledge-based approaches as techniques to augment Hybrid systems. Burke [9] proposes yet another taxonomy, where besides the CF, CB and KB approaches, he also classifies RSs into Utility-based and Demographic. Because Utility-based systems use a utility function over the space of items that describes the active user’s preferences, they can be seen as a variant of KB systems, since the utility function is a specific encoding of domain knowledge. In a similar manner, Demographic recommendations can be seen as a specific CF approach that exploits additional demographic user information to determine similar users [14]. The next section provides an overview of CF recommendation approaches.

Collaborative Recommendation

To date, collaborative filtering is the most popular and successful approach for providing recommendations to users [20, 23]. The main assumption behind it is that if two users had similar tastes in the past (for instance, they listened to the same songs, or bought the same albums), they are likely to keep exhibiting similar behaviour in the future. Thus, if historically the opinions of two particular users A and B overlap substantially, and user A expresses a new positive opinion on an item, it makes sense to suggest that item to user B.

A user’s opinion on an item is typically called a rating. Ratings can be gathered explicitly, by asking the user directly, or measured implicitly by, for example, analyzing which items the user has bought or which product pages he has visited.

CF algorithms manage a list of users $U = \{u_1, u_2, \ldots, u_n\}$ and a list of items $I = \{i_1, i_2, \ldots, i_m\}$, both finite. Associated with each user there is also a list of the items that he expressed his opinion on. Pure CF approaches use only this information to produce recommendations, usually in the form of a matrix $U \times I$ containing the users’ ratings. Thus, these systems do not need to explicitly model each item, allowing them to recommend arbitrarily complex products.

The next sections discuss the different types of CF algorithms and the related research. The neighborhood-based approaches are also called memory-based [8] algorithms, as opposed to the model-based strategies described afterwards.

User-based Nearest Neighbor

User-based nearest neighbor methods are one of the earliest approaches to collaborative recommendations, but still enjoy wide popularity due to their simplicity, efficiency and accuracy [10]. These methods produce recommendations directly from the user-item ratings matrix stored in the system, predicting the opinion of the active user on an item by using the ratings of users with similar rating patterns, called neighbors.

Finding the nearest neighbors of the active user is done by applying a similarity measure to his ratings and those of every other user. The most popular measures are based on Pearson’s correlation coefficient [21, 15] and the cosine similarity [6], although many different measures of similarity have been applied to RSs and more are emerging, such as fuzzy distance [3].

Pearson correlation measures, in the context of user similarity in RSs, the extent to which two vectors of ratings are linearly correlated [2, 23]. The similarity between users $a$ and $b$ is defined in equation 1, where $I$ is the set of items that both users have rated, $r_{a,i}$ is the rating given by user $a$ to item $i$ and $\bar{r}_a$ is the average of the ratings provided by user $a$.

$$\mathrm{sim}(a, b) = \frac{\sum_{i \in I} (r_{a,i} - \bar{r}_a)(r_{b,i} - \bar{r}_b)}{\sqrt{\sum_{i \in I} (r_{a,i} - \bar{r}_a)^2} \, \sqrt{\sum_{i \in I} (r_{b,i} - \bar{r}_b)^2}} \tag{1}$$

$\mathrm{sim}(a, b)$ takes values between −1 and 1, which correspond to a strong negative and positive correlation, respectively. One reason for the popularity of this method within user-based CF is that the calculation factors out the averages of the users’ ratings [14]. This means that if, in general, user $a$ scores items much higher than user $b$, they may still have a strong similarity value as long as their ratings are linearly correlated.
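As an illustration of the Pearson similarity described above, the following is a minimal Python sketch. The function name, the dictionary-based rating representation and the decision to return 0 when the correlation is undefined are assumptions made for this example, not part of the original work.

```python
from math import sqrt


def pearson_similarity(ratings_a, ratings_b):
    """Pearson similarity between two users, each given as a dict item -> rating."""
    common = set(ratings_a) & set(ratings_b)  # the set I of co-rated items
    if not common:
        return 0.0  # assumption: no co-rated items is treated as no correlation
    # Average of all ratings provided by each user
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_b = sum(ratings_b.values()) / len(ratings_b)
    num = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    den_a = sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common))
    den_b = sqrt(sum((ratings_b[i] - mean_b) ** 2 for i in common))
    if den_a == 0 or den_b == 0:
        return 0.0  # assumption: constant ratings leave the correlation undefined
    return num / (den_a * den_b)


# User "alice" rates everything two points higher than "bob",
# yet their ratings are perfectly linearly correlated.
alice = {"i1": 5, "i2": 3, "i3": 4}
bob = {"i1": 3, "i2": 1, "i3": 2}
print(pearson_similarity(alice, bob))
```

The usage lines mirror the remark above: a constant offset between two users' ratings does not lower their similarity.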

While the literature and the successful practical examples of user-based collaborative filtering methods provide evidence for the value of this approach, its challenges and shortcomings are also well researched:

Sparsity In many applications, recommender systems are used to recommend very large item sets, whereas each user has provided feedback on a very small subset of the items. This results in a very sparse User × Item matrix, which can cause systems based on the nearest-neighbor strategy to not be able to find good recommendations for certain users.

Cold start A specific challenge related to the sparsity problem is that some systems cannot produce recommendations while there is no information on a new user or new item. In the literature, the cold start problem is sometimes referred to as the new user and new item problems [1]. As with the sparsity issue, this is often mitigated by using additional information beyond the users’ ratings.

Scalability The computational complexity of nearest-neighbor algorithms grows with the number of users as well as the number of items, which becomes a problem in systems handling millions of users and items.

Item-based Nearest Neighbor

The main idea behind item-based CF is to calculate recommendations using the similarity between items instead of users. Intuitively, the assumption is that if item i received similar ratings to items that user a previously liked (given by the same group of users), it makes sense to recommend item i to user a.

Item-based CF approaches are popular in domains with many users and items that need to compute predictions in real time, such as large e-commerce websites [14]. In [16], the authors discuss how this approach was used at Amazon.com. The reason for the popularity of this approach in large scale domains is that while the algorithmic complexity is the same as in user-based strategies, these methods allow for more offline precomputation, specifically of the similarity between items. While, in principle, similarities between users in user-based CF could also be precomputed, a 2001 paper [20] discusses why that does not work as well as with item-based CF: the user similarity shows a lot of variability when new ratings are provided (due to the typically low number of overlapping ratings for any two users), while the item similarities are much more stable, so that offline precomputation does not degrade the quality of recommendations as much.

Model-based approaches

Model-based algorithms for collaborative filtering are those that preprocess the ratings matrix offline, in opposition to those that process the ratings database in real time, called memory-based [14]. In their pure form, the user-based and item-based strategies are memory-based. However, some methods that rely on offline preprocessing were already mentioned, such as clustering users before calculating their similarities, or preprocessing the item similarities entirely, which are actually model-based approaches.

Model-based CF strategies try to address the sparsity and scalability challenges of the neighborhood approaches, at the cost of increased complexity and sometimes expensive model building [23]. One such strategy is Association Rule Mining, widely used in contexts such as market basket analysis.

The goal of this approach in the context of collaborative filtering is to detect rules such as “If users like item1, they also like item2 85% of the time”. [18] describes an architecture to discover association rules offline and use them to efficiently compute recommendations at run time.

Other examples of the many different model-based methods that have been proposed in the literature are dimensionality reduction methods, which decompose the user-item rating matrix (or the similarity matrix) into a reduced latent space that captures only the most important features [19, 6], graph-based methods, which represent user ratings as a graph that allows for the propagation of information, and clustering, used to improve run time performance.

Evaluating Recommender Systems

Assessing the performance of RSs is a crucial challenge in the field, since many different techniques claim to improve recommendation accuracy over the others. Furthermore, accuracy alone has been shown to be insufficient for the usefulness of a recommender system in many domains. A discussion of relevant metrics is present in [13], where the authors mention the degree to which the recommendations cover all items, the degree to which recommendations are not obvious and the ability of the system to explain the recommendations.

A popular metric to evaluate accuracy is the mean absolute error (MAE), which computes the average deviation between rating predictions $p_{u,i}$ and actual known rating values $r_{u,i}$ for all users $u$ and items $i$ in the test set $T$ [14]:

$$\mathrm{MAE} = \frac{\sum_{(u,i) \in T} |p_{u,i} - r_{u,i}|}{|T|} \tag{2}$$

Figure 1: Creating an automation rule on the Muzzley application. This rule will turn on a light bulb every weekday at 08:30.

Ratings $r_{u,i}$ are usually known due to offline user experiments. Other commonly mentioned metrics are user coverage and item coverage, which simply measure the percentage of users or items for whom recommendations can be generated. These can be useful to learn about how the system behaves regarding the new user and new item problems.
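The mean absolute error and user coverage metrics discussed above can be sketched in a few lines of Python. The function names and the dictionary-based inputs are assumptions made for illustration. MAE is computed here as the plain average of absolute deviations over the test set.

```python
def mean_absolute_error(predictions, actual):
    """Average |p - r| over the (user, item) pairs present in both dicts."""
    test_set = [key for key in actual if key in predictions]
    if not test_set:
        raise ValueError("no overlapping (user, item) pairs to evaluate")
    return sum(abs(predictions[k] - actual[k]) for k in test_set) / len(test_set)


def user_coverage(recommendations, all_users):
    """Fraction of users for whom at least one recommendation was generated."""
    covered = sum(1 for user in all_users if recommendations.get(user))
    return covered / len(all_users)


predicted = {("u1", "i1"): 4.0, ("u1", "i2"): 2.0}
known = {("u1", "i1"): 5.0, ("u1", "i2"): 2.0}
print(mean_absolute_error(predicted, known))

recs = {"u1": ["template-7"], "u2": []}
print(user_coverage(recs, ["u1", "u2", "u3", "u4"]))
```

Item coverage can be computed analogously by counting the distinct items that appear in at least one recommendation list.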

Automation Rules

Rule Specification

In this section some formalisms about automation rules at Muzzley are described, as these concepts are referenced throughout this work. A rule is composed of one trigger (the “if” part), one or more actions (the “do” part), and zero or more state checks (the “but only if” part). The trigger describes the logic for when the actions should execute, the actions describe the changes that are applied to the world when a rule executes, and the states describe what else must be true in the world when the trigger fires, in order for the actions to execute. Figure 1 shows an example of an automation rule in the Muzzley mobile application.

Each of the mentioned rule components (trigger, action and state) has the following constituents:

Device The device that is being referred to. This is usually an identifier of a physical IoT device, but may also be time, location, or other concepts that are considered “virtual devices” for the purposes of automation rules.

Property Each device has one or more properties, which represent the different measures that are supported. For example, a thermostat might have the properties temperature, status and battery-health. Each property has a value at each point in time, conforming to a well defined schema: status may be a boolean value, representing whether the thermostat is on or off, while battery-health may be a float between 0 and 1. Some properties, such as battery-health, are not actionable, meaning that they cannot be used in the actions of an automation rule.

Value Each component of the rule must also have a value, conforming to the property schema. In a trigger or state check, this value is used to test if the rule should be executed. In an action, it is the new value for the property of the device.

Operator The operator describes the logic used for this component of the automation rule. Without the operator, a trigger with a specific thermostat as the device, “temperature” as the property, and a value of 25, still would not describe whether the automation rule should execute if the temperature rises above 25, drops below 25, or becomes equal to 25. A set of supported, well defined operators exists at Muzzley, some examples being greater-than-or-equal, less-than-or-equal or equals.
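To make the rule structure above concrete, the following Python sketch models a rule as one trigger, one or more actions and zero or more state checks, with each component holding a device, property, operator and value. The class names, field names and the `component_holds` helper are hypothetical and do not reflect the actual Muzzley schema. Only the three operators named in the text are included.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass(frozen=True)
class Component:
    device: str    # physical device identifier, or a "virtual device" such as time
    prop: str      # e.g. "temperature", "status"
    operator: str  # e.g. "greater-than-or-equal", "equals"
    value: Any     # must conform to the schema of the property


@dataclass
class Rule:
    trigger: Component                                      # the "if" part
    actions: List[Component]                                # the "do" part
    states: List[Component] = field(default_factory=list)  # the "but only if" part

    def __post_init__(self):
        if not self.actions:
            raise ValueError("a rule needs at least one action")


# Hypothetical evaluation table for the operators named in the text
OPERATORS = {
    "greater-than-or-equal": lambda current, target: current >= target,
    "less-than-or-equal": lambda current, target: current <= target,
    "equals": lambda current, target: current == target,
}


def component_holds(component, current_value):
    """Test a trigger or state check against the current property value."""
    return OPERATORS[component.operator](current_value, component.value)


trigger = Component("thermostat-1", "temperature", "greater-than-or-equal", 25)
rule = Rule(trigger, [Component("bulb-1", "status", "equals", True)])
print(component_holds(rule.trigger, 26))
```

The operator is what disambiguates the thermostat example: the same device, property and value describe different triggers depending on which entry of the table is selected.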

Dataset

This work makes heavy use of the Muzzley automation rules dataset, which is summarized in the present section. The dataset evolves as users create and delete automation rules on the Muzzley application, but to facilitate the evaluation of different models and parameters on the same underlying data, this work analyzes a snapshot of this dataset, frozen in time. The snapshot used throughout this work is composed of 2545 automation rules, which were created by 1322 distinct users. Most users have few rules: slightly more than half of the users considered have only one automation rule. However, while most users tend to create simpler rules, these numbers are not too extreme: 506 (20%) automation rules contain state components, and 1099 rules (43%) contain only one action, meaning that while the most common number of actions is 1, more than half the rules have more than 1 action.

Of the 2545 rules, only 2448 are actually used to learn recommendations, as 97 rules are discarded due to being “corrupted”. This happens when users remove from their accounts devices that were used in the rules.

Domain

It is worth laying out some characteristics of the automation rule domain and requirements for this work within Muzzley, before discussing the solution in the next section.

Arbitrary items The recommended item in our domain, the automation rule, is an object that can be composed arbitrarily by the user. Due to all the possible combinations of operators, devices and properties, and the fact that there is no limit to the number of actions that might exist in a rule, the set of possible items in our RS is effectively unbounded.

The identity problem The identity of an automation rule is not as well defined as that of most items discussed in the previous section. For example, it is possible for one user to define a rule such as “when I arrive within 100m of my home, open the garage door”, and for another to declare “when I arrive within 101m of my home, open the garage door”. While these are objectively different rules, if they are accounted as such, the item space becomes extremely large, and the data prohibitively sparse. Identifying them as the same item will allow us to exploit the information that two users created rules to the same effect. For this reason, it is necessary to define a higher level abstraction of what constitutes a recommendable rule, ideally matching as much as possible the human semantics.

Rating-based approaches A user may create an automation rule and later delete it because it stopped being useful, but this does not say much about how good he believes the rule to be. Because the use case for a user to keep an active automation is its utility, we argue that rating-based RSs are not the most suitable for our domain, since how much a user “likes” an automation rule is not as natural a question as, for example, how much a user “likes” a movie.

Recommendation frequency In opposition to some domains where recommendations must always be available, or generated in real-time, Muzzley automation suggestions can be shown to a user only when the system is confident in the prediction.

Recommendation typology The different types of RSs discussed differ not only with regards to the input data, implementation and computational properties, but also in the types of recommendations that are generated. For example, pure CF provides a given recommendation because “users that rated items similarly to the active user also like the recommended item”, while CB provides it because “the active user liked items similar to the recommended item”. Association rule mining approaches have a notion of items that work well together now, providing recommendations of the sort “this item is often bought together with the items you are buying”. Used in the automation rule domain, such an approach allows the recommendation of rules with the explanation “this rule is often used together with the ones you currently have”. Since automation rules affect the state of devices in the real world, it is desirable to perform recommendations of automations that may have synergies with the other rules that the active user already has. Thus, this work explores the application of association rule mining techniques to the recommendation of automation rules, and the developed RS is described in detail in the following section.

Solution

In light of the characteristics and trade-offs between the different RSs and the discussion about the automation rules domain in the previous section, the developed system exploits association rules learned from the automation dataset to provide recommendations. In such a RS, suggestions are generated on the basis of groups of rules “working well” together, which is a desirable characteristic of this approach.

The solution explored in this thesis is to generalize each rule into a recommendable item, which we call a template. A similarity function between templates is developed and used to decide whether different automation rules can be considered the same for the purposes of mining association rules. The learned association rules are then used to recommend automation rule templates, along with a conviction value. These strategies are discussed in more detail in the following sections.

Rules Generalization

While it is desirable to produce recommendations that users can accept with the least amount of work possible, it is useful that some customization is possible when the user is accepting a suggestion. As an example, the RS might ask “Do you want your lights to be turned off automatically when you leave home?”. For this recommendation to become a proper automation rule, it requires not only that the user accepts it, but also that he configures two things:

• The geographical coordinates of his home.

• Which of his light bulbs are to be used with this automation rule.

Because the user configures parts of the automation rule when accepting it, the recommendation is actually of an automation “template”, which is instantiated by the user, giving rise to an automation rule. This work explores an approach in which the rule instances in the dataset are first generalized into these templates, from which the model is then learned, so that the recommendations produced are also of templates. Thus, the rule generalization step described in this section strongly influences the type of output of the system.

In the example above, the system is recommending a template in which the action is to turn off “lights”. This is possible because the main step of generalizing a rule instance into a template is to replace the specific devices (identifiers for real world IoT devices) by the class of the devices, which are strings such as “lightbulb”, or “camera”. All specific IoT devices integrated in the Muzzley ecosystem already had classes, prior to this work.

The above example also shows that the value of the trigger (the geographical coordinates) is left for the user to fill, yet the value of the action (turning the devices “off” instead of “on”) is already instantiated, as opposed to asking the user to choose the status for the light bulbs. To accomplish such flexibility in the developed RS, the generalization function uses information from a configuration file that details whether the “value” part of each rule component should be discarded, based on the operator and property.
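The generalization step described above can be sketched as follows. This is an illustrative Python sketch only: the device identifiers, the `leaves-area` operator, the `DEVICE_CLASSES` lookup and the `KEEP_VALUE` table standing in for the configuration file are all hypothetical, and rules are assumed to be plain dictionaries.

```python
# Hypothetical lookup from device identifier to device class
DEVICE_CLASSES = {"bulb-42": "lightbulb", "phone-7": "location"}

# Hypothetical stand-in for the configuration file: for each
# (property, operator) pair, whether the value is kept in the template
KEEP_VALUE = {
    ("status", "equals"): True,          # "off" stays part of the template
    ("position", "leaves-area"): False,  # coordinates are left for the user
}


def generalize_component(component):
    """Replace the device by its class and drop the value when configured to."""
    template = {
        "class": DEVICE_CLASSES[component["device"]],
        "property": component["property"],
        "operator": component["operator"],
    }
    if KEEP_VALUE.get((component["property"], component["operator"]), False):
        template["value"] = component["value"]
    return template


def generalize_rule(rule):
    """Turn a rule instance into a template, component by component."""
    return {
        "trigger": generalize_component(rule["trigger"]),
        "actions": [generalize_component(a) for a in rule["actions"]],
        "states": [generalize_component(s) for s in rule.get("states", [])],
    }


rule = {
    "trigger": {"device": "phone-7", "property": "position",
                "operator": "leaves-area", "value": [38.73, -9.14]},
    "actions": [{"device": "bulb-42", "property": "status",
                 "operator": "equals", "value": "off"}],
}
print(generalize_rule(rule))
```

In this sketch, values are discarded by default when a (property, operator) pair is missing from the table, which is a design assumption rather than documented behaviour.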

Similarity

To arrive at a dataset that is suitable for the mining of association rules, the system must be able to identify which templates from different users should be considered equal. Strictly comparing the JSON objects that describe the templates would be a possible solution, but very similar rules that differ only slightly (for instance, two rules with the same triggers and states as well as many equal actions except for one) would be seen as different. Thus, a more robust equality operator was developed between templates, which returns a similarity score between 0 and 1 (0 being strictly different and 1 being strictly equal). Because it still is necessary to decide whether to consider two templates the same for the association rule mining purposes, a hyperparameter called similarity-threshold was introduced.

The developed equality operator takes two automation templates and compares their trigger components, obtaining a trigger similarity score. The action and state similarity scores are obtained in the same way and the 3 results are averaged. When comparing two components, the JSON objects are strictly compared, which works well because the rule generalization step already took care of removing device identifiers or values that should not be meaningful for such a comparison.
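A possible shape for this equality operator is sketched below in Python. The text does not specify how a list of components is scored when only some of them match, so the Jaccard overlap used here is an assumption, as are the function names and the dictionary layout of a template.

```python
import json


def _canonical(component):
    # Strict comparison of JSON objects, made independent of key order
    return json.dumps(component, sort_keys=True)


def _overlap(components_a, components_b):
    """Jaccard overlap of two component lists (assumed scoring scheme)."""
    set_a = {_canonical(c) for c in components_a}
    set_b = {_canonical(c) for c in components_b}
    if not set_a and not set_b:
        return 1.0  # e.g. neither template has state checks
    return len(set_a & set_b) / len(set_a | set_b)


def template_similarity(a, b):
    """Average of the trigger, action and state similarity scores, in [0, 1]."""
    scores = [
        _overlap([a["trigger"]], [b["trigger"]]),
        _overlap(a["actions"], b["actions"]),
        _overlap(a.get("states", []), b.get("states", [])),
    ]
    return sum(scores) / 3


garage = {"class": "garage-door", "property": "status", "operator": "equals", "value": "open"}
tv_on = {"class": "tv", "property": "status", "operator": "equals", "value": "on"}
heat = {"class": "thermostat", "property": "temperature", "operator": "equals", "value": 23}

t1 = {"trigger": garage, "actions": [tv_on, heat], "states": []}
t2 = {"trigger": garage, "actions": [tv_on], "states": []}
print(template_similarity(t1, t2))
```

With this scoring, two templates that share the trigger and states but differ in one of two actions still obtain a high score, which is the behaviour the text motivates.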

In order to find the users’ items in the RS, we go through every generalized rule from the users sequentially, applying the equality operator, in the following manner:

1. The first generalized rule/template is given the ID “1”.

2. The next template is compared against all the previously analyzed templates.

(a) If, given the similarity-threshold parameter, the template being analyzed is not considered equal to any of the previously analyzed templates, assign it a new ID.

(b) Otherwise, assign the template being analyzed the ID of the template with the highest similarity score.

3. Repeat step 2 until no more templates are left.
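The steps above can be sketched in Python as follows. The function name and signature are hypothetical, and treating a score greater than or equal to the threshold as "equal" is an assumption. The similarity function is passed in as a parameter, so any operator returning a score between 0 and 1 can be used.

```python
def assign_ids(templates, similarity, threshold):
    """Assign an item ID to each template, reusing the ID of the most similar earlier one."""
    seen = []    # (id, template) for every template analyzed so far
    next_id = 1  # step 1: the first template receives ID 1
    ids = []
    for template in templates:
        best_id, best_score = None, -1.0
        # Step 2: compare against all the previously analyzed templates
        for known_id, known in seen:
            score = similarity(template, known)
            if score >= threshold and score > best_score:
                best_id, best_score = known_id, score
        if best_id is None:
            # Step 2(a): nothing is similar enough, so a new ID is created
            best_id = next_id
            next_id += 1
        # Step 2(b) is the case where best_id was found in the loop above
        seen.append((best_id, template))
        ids.append(best_id)
    return ids


# Toy example: templates reduced to sets of tokens, compared by Jaccard overlap
def jaccard(a, b):
    return len(a & b) / len(a | b)


toy = [{"a", "b"}, {"a", "b", "c"}, {"x"}, {"a", "b"}]
print(assign_ids(toy, jaccard, threshold=0.6))
```

Lowering the threshold merges more templates into the same ID and yields a denser dataset, at the risk of conflating rules with different effects. The procedure is also order-dependent, since each template is only compared with those analyzed before it.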

This strategy allows us to arrive at a dataset that is suitable to learn association rules, since different users now have templates with the same IDs, meaning that these users have the same item for the purposes of our RS. The number of IDs generated was 534, which is quite high in comparison with the number of automation rules considered (2545), and keeps growing as more rules are analyzed.

This sparsity comes from the fact that users do create many different automation rules, and the generalization strategies described map each rule to a given template in a one to one relationship. But one realization is worth exploring: it is possible to increase the granularity of items by generalizing some rules to multiple, simpler templates, without loss of correctness. For example, let’s say user “1000569” has only one automation rule: “Whenever my garage door opens, turn on the TV and set the thermostat to 23 °C, but only if time is between 18:00 and 20:00.” Instead of generalizing this rule to a single template (which might occur rarely in other users), the system can equivalently consider that user “1000569” has 2 rules:

1. “Whenever my garage door opens, turn on the TV, but only if time is between 18:00 and 20:00.”

2. “Whenever my garage door opens, set the thermostat to 23 °C, but only if time is between 18:00 and 20:00.”

Together, these two rules have the same behaviour as the one actually created by the user. The formalism for accomplishing this division while maintaining the rule’s semantics is simply to create as many templates as there are actions in the original rule, each with one action, keeping the triggers and states equal in all of them. This approach has the advantage that the system has a more granular dataset to learn from, so that instead of trying to learn which items work well together from complex and often rare items, it does so from simpler ones. It is worth noting that the system can still perform recommendations of complex templates, because recommendations for the same user can be merged together using the inverse of the strategy described above: merging templates that have the same trigger and states into a single one, uniting all the actions.
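The division into single-action templates and the inverse merge can be sketched in Python as follows. The function names and the dictionary layout of a template are assumptions made for illustration.

```python
import json


def split_rule(template):
    """Create one single-action template per action, keeping trigger and states."""
    return [
        {"trigger": template["trigger"], "actions": [action],
         "states": template.get("states", [])}
        for action in template["actions"]
    ]


def merge_templates(templates):
    """Inverse step: unite the actions of templates sharing trigger and states."""
    merged = {}
    for t in templates:
        key = json.dumps([t["trigger"], t.get("states", [])], sort_keys=True)
        if key not in merged:
            merged[key] = {"trigger": t["trigger"], "actions": [],
                           "states": t.get("states", [])}
        for action in t["actions"]:
            if action not in merged[key]["actions"]:
                merged[key]["actions"].append(action)
    return list(merged.values())


rule = {
    "trigger": {"class": "garage-door", "property": "status", "value": "open"},
    "actions": [
        {"class": "tv", "property": "status", "value": "on"},
        {"class": "thermostat", "property": "temperature", "value": 23},
    ],
    "states": [{"class": "time", "operator": "between", "value": ["18:00", "20:00"]}],
}
parts = split_rule(rule)
print(len(parts))
print(merge_templates(parts) == [rule])
```

Splitting and then merging gives back the original template, which is what allows the system to learn from simple items while still recommending complex ones.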

Using the latter approach there are now 4384 items in our dataset, instead of 2545, due to the division of rules. The number of IDs generated is not only lower as a proportion of the amount of templates analyzed, it is actually lower in absolute terms (510 vs 534), which means that the dataset used to mine association rules is less sparse.

Association Rule Mining

From the steps described, the system arrives at a dataset that is suitable to learn association rules from. Two important parameters to association rule mining algorithms are the minimum support and minimum confidence. If these variables take on values that are too high, few or no association rules might be found, as the bar for what constitutes a relevant association is set too high. On the other hand, if these values are too low, many weak association rules might be found, which in the context of RSs may lower the precision of recommendations. These parameters end up heavily affecting the quality of our system’s recommendations, and are for that reason evaluated empirically in this work, where we test metrics such as precision and recall for various values of minimum support and confidence.

The association rule mining algorithm used in this work is Apriori. A Common Lisp (CL) version of Apriori was implemented in the context of this thesis and the source code was published openly as a CL library named cl-association-rules [12].

Apriori outputs association rules, as well as the support and confidence for each. A sample of 3 related association rules mined from the dataset is shown in Table 1. Support is 0.054517135 for the rules shown, which means that the 3 automations involved in each association rule are found together in around 5.5% of transactions/users. The confidence values represent how often a user has the automations in the body of the rule, given that they have the head, e.g., 92% of users that created automation 124 also have automations 1 and 2. The following section describes how this is used to perform recommendations at Muzzley.
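Since confidence is the support of the whole rule divided by the support of its head, the values in Table 1 also reveal how common each head automation is on its own. The sketch below, in Python, defines both measures over a list of transactions and then applies that relation to the Table 1 values; the function names and the toy transactions are illustrative assumptions.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= frozenset(t) for t in transactions) / len(transactions)


def confidence(head, body, transactions):
    """Support of head and body together, divided by the support of the head."""
    both = frozenset(head) | frozenset(body)
    return support(both, transactions) / support(head, transactions)


# Invented toy data: each set is the automations owned by one user.
toy = [{1, 2, 124}, {1, 2}, {2}, {124, 1, 2}]
toy_confidence = confidence({124}, {1, 2}, toy)

# Rows of Table 1 as (head, body, support, confidence).
TABLE_1 = [
    ((2,), (1, 124), 0.054517135, 0.28225806),
    ((1,), (2, 124), 0.054517135, 0.443038),
    ((124,), (2, 1), 0.054517135, 0.92105263),
]
# support(head) = support(rule) / confidence(rule)
head_support = {head: sup / conf for head, _, sup, conf in TABLE_1}
```

Under this relation, automation 2 is present in roughly 19.3% of users, automation 1 in roughly 12.3% and automation 124 in roughly 5.9%, which explains why the rule with the rarest head has the highest confidence.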

Recommending

Automation recommendations at Muzzley are post-filtered by an existing contextualizer component, which takes care of dispatching the suggestion to users at the appropriate times and frequencies. For that reason, the developed system need only output the recommendation itself and a conviction¹ value, used by the contextualizer to decide the priorities of inputs from different systems.

Recall that mining an association rule such as (73) => (43), with support of 5.8% and confidence of 62%, means that 5.8% of all users have both automations 73 and 43, and that 62% of users that have automation 73 also have 43. RSs typically operate by recommending the body of the association rule (43) to users that have the head (73) but not the body. For this reason, each association rule mined can produce many recommendations across different users.
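This matching step can be sketched in Python as follows. The rule tuple layout `(head, body, support, confidence)` is an assumption of this example, as is the choice to recommend only the missing part of a multi-item body when the user already has some of it; the source does not say how that case is handled.

```python
def recommend(user_automations, rules):
    """Recommend rule bodies to a user who has the head but not the full body."""
    owned = frozenset(user_automations)
    recommendations = []
    for head, body, support, confidence in rules:
        missing = frozenset(body) - owned
        # Assumption: only the part of the body the user lacks is suggested.
        if frozenset(head) <= owned and missing:
            recommendations.append((missing, support, confidence))
    return recommendations


rules = [((73,), (43,), 0.058, 0.62)]
for_new_match = recommend({73}, rules)
for_owner_of_both = recommend({73, 43}, rules)
for_unrelated_user = recommend({5}, rules)
```

Only the first user receives a recommendation: the second already has the body, and the third does not have the head.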

In this solution, every mined association rule is used to produce recommendations. Instead of filtering association rules that have low support and confidence values after they are learned, the Apriori parameters (minimum support and minimum confidence) are simply increased so that those associations are never mined at all. This improves the system's performance and simplifies analysis, since the Apriori parameters already exist and must be studied, and no more parameters are added to the overall model. This, however, makes these parameters very influential in the quality and quantity of recommendations.

In order to compute a conviction value for a recommendation, it is natural to take into account both the support and confidence of the association rule that originated it. Our approach consists of standardizing the support and confidence values before averaging them, by subtracting the mean of the sample of discovered association rules from each value and dividing by the standard deviation, as shown in equation 3.

$$Z_{\text{score}} = \frac{x - \mu}{s} \tag{3}$$

Thus, the conviction value sent to the contextualizer alongside recommendations is the average of the standardized confidence and support from the corresponding association rules.
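A minimal Python sketch of this computation is given below. It assumes rules are `(head, body, support, confidence)` tuples and uses the sample standard deviation from the standard library; returning zeros when all values are identical (standard deviation of zero) is a guard added for this example, not something described in the source.

```python
from statistics import mean, stdev


def standardize(values):
    """Z-score each value against the sample mean and standard deviation."""
    if len(values) < 2:
        return [0.0 for _ in values]
    mu, s = mean(values), stdev(values)
    if s == 0:
        # Assumption: identical values carry no information, so score them as 0.
        return [0.0 for _ in values]
    return [(x - mu) / s for x in values]


def conviction(rules):
    """Average of standardized support and standardized confidence per rule."""
    z_support = standardize([rule[2] for rule in rules])
    z_confidence = standardize([rule[3] for rule in rules])
    return [(zs + zc) / 2 for zs, zc in zip(z_support, z_confidence)]


# Invented values, only to exercise the function.
sample_rules = [
    ((1,), (2,), 0.1, 0.5),
    ((3,), (4,), 0.2, 0.5),
    ((5,), (6,), 0.3, 0.8),
]
scores = conviction(sample_rules)
```

Because both measures are standardized over the same set of mined rules, the resulting conviction values are relative: a rule scores above zero when it is stronger than the average rule discovered in that run.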

¹ The term conviction is used in this work instead of confidence (the actual name used within Muzzley) so as not to be confused with the confidence value of association rules.

Evaluation

Methodology

The applicability of the different metrics and strategies described is highly dependent on the type of recommender and the available data. With rating-based CF systems, it is common to compute accuracy, which is a measure of how well the model's rating predictions match the real ratings. There is no concept of ratings in our RS, but since the goal is to find automations that work well together, it is possible to validate the model using the transactions dataset.

One approach to evaluating our model against a given (unseen) user in the data is to select a random subset of their rules, perform recommendations, and compare against the rest. The more recommendations match the non-selected rules, the better the system is performing. Intuitively, this approach measures the degree to which the system would recommend the automation rules that were in fact created by a user, had they only created a subset of them.

The overall strategy is to perform 5-fold cross-validation, where users are randomly split into 5 groups. One group is the validation set used to test the model, and the remaining 4 groups are used to learn the association rules. The cross-validation process is then repeated 5 times (the folds), with each of the 5 subsamples used exactly once as the validation data. The results from the folds are then averaged to produce a single estimation.
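The splitting procedure can be sketched in Python as follows. The function names, the fixed random seed and the 50% hiding fraction are assumptions made for this example; the source does not state what fraction of a validation user's rules is selected.

```python
import random


def k_fold_users(user_ids, k=5, seed=0):
    """Shuffle the users and deal them into k roughly equal groups."""
    ids = list(user_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i::k] for i in range(k)]


def train_validation_splits(user_ids, k=5, seed=0):
    """Yield (training users, validation users), each fold validating once."""
    folds = k_fold_users(user_ids, k, seed)
    for i, validation in enumerate(folds):
        training = [u for j, fold in enumerate(folds) if j != i for u in fold]
        yield training, validation


def hide_subset(automations, fraction=0.5, rng=None):
    """Split a validation user's automations into (given, hidden) parts."""
    rng = rng or random.Random(0)
    items = sorted(automations)
    rng.shuffle(items)
    cut = max(1, int(len(items) * fraction))  # always keep at least one given item
    return set(items[:cut]), set(items[cut:])


splits = list(train_validation_splits(range(23)))
given, hidden = hide_subset({1, 2, 3, 4})
```

For each fold, association rules would be mined from the training users only, recommendations produced from each validation user's `given` automations, and the result compared against the `hidden` ones.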

Table 1: Sample of 3 related association rules mined from the dataset.

Association Rule    Support        Confidence
(2) => (1 124)      0.054517135    0.28225806
(1) => (2 124)      0.054517135    0.443038
(124) => (2 1)      0.054517135    0.92105263

Metrics

When comparing the system's recommendations for a user with the ground truth in the transactions dataset, a few metrics typically used in the field of Information Retrieval (IR) were gathered:

Precision: Precision is the fraction of recommended items that are relevant, i.e., that can be found in the user's non-selected automations. A high precision value means that most recommendations were found in the validation user, and were thus "good". However, a system that optimizes solely for precision might end up performing very few recommendations, so as not to degrade this value.

Recall: Recall is the fraction of relevant automations that were successfully recommended. A high recall value means that most automations that could have been found for a user were indeed recommended. While this is desirable, this value can be inflated as a consequence of the RS performing many recommendations, with no regard for precision.

F1-score: Because of the mentioned drawbacks of looking solely at each of the two previous metrics, and the fact that they tend to vary in opposite directions, the F1-score is often used to combine precision and recall. It is also referred to simply as F-measure, and it is the harmonic mean of precision and recall.

Coverage: Coverage is the fraction of all items that can be recommended by the RS that were in fact recommended. In this work, the set of possible recommendations is the set of templates generated.
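The four metrics can be written down in a few lines of Python. This is a sketch under the definitions given above; the function names are illustrative, and returning 0 for empty inputs is a convention chosen for this example.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(recommended, relevant):
    """Compare recommendations with a validation user's hidden automations."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall, f1_score(precision, recall)


def coverage(all_recommended, catalog_size):
    """Distinct items recommended across all users, over the number of templates."""
    return len(set(all_recommended)) / catalog_size


p, r, f = precision_recall_f1(recommended={1, 2, 3, 4}, relevant={2, 4, 6})
```

In the toy call, two of the four recommendations are relevant (precision 0.5) and two of the three hidden automations were found (recall of about 0.67), giving an F1-score of about 0.57.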

Parameter evaluation

Figure 2 shows the results for the 5-fold cross-validation, obtained by averaging each measure across the 5 folds, when fixing min-confidence at 20% and varying the min-support values. The charts in the present section display the variable of interest in decreasing order. This is by design, as we are looking at what happens as the number of recommendations increases, which occurs when min-support and min-confidence decrease.

These results show that as min-support decreases (making the system learn association rules more liberally), metrics generally behave in the expected directions:

• Precision decreases, since we are recommending automations with less conviction. This shows that recommendations based on association rules with high support values do have higher predictive value than those with lower support values.

• Recall increases, since more of the validation user's automations are recommended by the system.

• Coverage increases, showing more distinct automation rules being recommended.

As we move along the horizontal axis in Figure 2, F1-score improves except when min-support drops below 0.5%. Since recall keeps rising, this means that we start to lose too much precision with so many recommendations. Coverage strictly grows as more recommendations are made, as expected. At the lowest min-support value studied (0.25%) coverage rises as high as 13.2%, which amounts to a total of around 67 different automation rules recommended, given the existence of 510 templates/items.

The min-support chosen for our RS in production at Muzzley was 0.5%, but this choice is entirely dependent on the business logic. For certain recommendation use cases, for instance when recommendations must be made with a certain (high) frequency, it may be sensible to accept further losses in precision, in order for the system to obtain higher coverage and/or recall. More generally, the absolute values of the precision and recall metrics seem very promising. For instance, the fact that the model can reach precision values above 50% hints that the recommendation approach of mining association rules has merit in our domain.

Comparison with baseline strategy

Blindly recommending the most popular items to every user is the simplest approach to building an RS. While recommendations in such a rudimentary system are not personalized, previous work on RSs has shown that this simple strategy performs impressively well, sometimes outperforming complex models [14].

In order to evaluate the quality of our recommendations more decisively, an RS that blindly suggests the most popular automations to every user was developed, and the metrics previously analyzed in this section were determined for it, which constitutes an interesting baseline.

Figure 3 shows the results obtained. Predictably, the coverage metric grows exactly linearly as the number of recommendations increases, as it is simply the number of distinct recommendations made (1, 2, ..., 10 in this example) divided by the little more than 500 total items that the system can recommend. Precision and recall also vary in the expected manner.
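A popularity baseline of this kind can be sketched in a few lines of Python. The function name and the toy transactions are assumptions of this example, and the sketch recommends the same list to every user regardless of what they already own, which is one plausible reading of "blindly".

```python
from collections import Counter


def top_n_popular(training_transactions, n):
    """Return the n automations owned by the most training users."""
    counts = Counter(
        item for transaction in training_transactions for item in set(transaction)
    )
    return [item for item, _ in counts.most_common(n)]


# Invented training data: one set of automation IDs per user.
training = [{1, 2}, {2, 3}, {2, 1}, {4}]
top_2 = top_n_popular(training, 2)
```

Since every user receives the same n items, the number of distinct recommendations is exactly n, which is why coverage grows linearly with n.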

The best version of this system by the F1-score is the top-3 automations recommender which, of course, suggests the 3 most popular templates to every user. However, these results heavily underperform the association rules based system. Table 2 shows the best results from the top-n RS as well as the results from the model-based RS with the production parameters.

Figure 2: Precision, recall, F1-score and coverage with varying min-support. Min-confidence is fixed at 20%.

Table 2: Top-n items recommender vs association rule based recommender.

Recommender                                       Precision   Recall    F1 Score   Coverage
Top-n most popular (top-3 items recommender)      16.81%      20.8%     18.59%     0.58%
Model-based recommender
(min-support: 0.5%, min-confidence: 40%)          56.41%      43.12%    48.9%      6.25%

The model-based recommender outperforms the top-3 system in every metric. Coverage is much higher because the system recommends 32 distinct automation rules instead of just 3. It can be argued that this is also the reason why recall is better: the more recommendations that are made, the more likely the system is to "guess" the validation users' templates. However, while the higher number of recommendations should make precision lower, the model-based system has 56.41% in the precision metric, versus the 16.81% of the top-n RS. This is still more than twice the precision of even the top-1 recommender, which has even worse recall and coverage.

These results show that the hypothesis that groups of automation rules can often be found together due to synergies between them is likely true, since the approach that exploits this by recommending automations based on mining association rules from existing data showed strong results.

Conclusions and Future Work

Summary

The trend of appliances featuring internet connection is a growing one, as the diversity of IoT devices is increasing at a staggering pace. Superficially, such devices might look similar to their older counterparts, featuring only a remote control that can be a computer or mobile phone. However, the value in connected "things" seems to reside in their ability to be automated and work together to the benefit of the user. Many platforms, including Muzzley, are building systems that allow for the creation of such automations and interactions between IoT devices. This thesis explored one natural next step to bring the smart home closer to reality, which is the automatic and personalized recommendation of automation rules, easing the still cumbersome process of coming up with, and defining, automations.

After exploring the related work in RSs as thoroughly as possible, focusing on the advantages and disadvantages of each type of solution, as well as defining the domains where the different approaches are suitable, we defined the characteristics of our domain, as well as some general requirements for the system to be developed at Muzzley. This led us to the hypothesis that generating recommendations by learning association rules from the automations dataset would be the most suitable approach, given the likelihood that certain automation rules work well together, and that such groups can be found in the current Muzzley data.


Figure 3: Top-n popular items recommendation results.

We then described our solution for making association rule mining feasible on the automations dataset. This consists of transforming the specific rule instances into more generic items, which we named templates and which, when recommended, allow the user to input certain values when desirable. By using a comparison operator between templates to arrive at a new dataset where it is possible to identify the same automation across different users, it became viable to learn associations between items with an implementation of Apriori that we developed and open-sourced. Other techniques used to achieve better results were discussed, such as pre-processing the dataset by simplifying complex rules in an equivalent manner.

The developed system was then evaluated to test the hypothesis, showing promising results. A methodology to analyze the system offline was devised and metrics such as precision, recall and coverage were gathered with various combinations of the model's hyper-parameters, namely the min-support and min-confidence Apriori input variables. A baseline, non-personalized RS that recommends the most popular automations was also developed so that we could analyze it in a similar fashion and compare the results. These showed that our hypothesis, that certain groups of automations make sense together and can therefore drive recommendations, is likely correct, as our recommender beat the simpler one by decisive margins in all metrics studied. Generally, the fact that the system could achieve precision values above 50% even with more relaxed parameters, so that more recommendations were produced, seems to us to be a strong indicator that the approach taken has merit for this domain.

Further Work

Currently, a configuration file allows us to define which values should be recommended to the user, and which should be left for the user to select when accepting an automation. Instead of this being a binary option, it would be interesting for the system to recognize a third option, where it would compute a running mode or average for that value as it finds similar templates, allowing a recommendation to contain a specific value for continuous properties (e.g. brightness).
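As an illustration of what such a third option could look like, the Python sketch below maintains a running average that is updated each time a similar template is found, without storing past values. This is purely hypothetical: the class name and the brightness values are invented, and nothing of the sort exists in the system described in this work.

```python
class RunningMean:
    """Incrementally updated average of a continuous property."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        # Standard incremental mean: shift the mean by the scaled difference.
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


brightness = RunningMean()
for observed in [40, 60, 80]:  # invented brightness values from similar templates
    brightness.update(observed)
```

After the three updates the running average is 60, which is the value a recommendation could carry for that property.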

Another improvement to tackle the new-user problem of our system (no recommendations are performed to users without automation rules) could be to use it as part of a hybrid system, where a different RS would be used for such users. Even using a naive system such as the top-n recommender developed as a baseline might be better than providing no recommendations.

References

[1] G. Adomavicius and A. Tuzhilin. Towards the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. Knowledge and Data Engineering, IEEE Transactions on, 17(6):734–749, 2005.

[2] H. J. Ahn. A new similarity measure for collaborative filtering to alleviate the new user cold-starting problem. Information Sciences, 178(1):37–51, 2008.

[3] M. Y. H. Al-Shamri and K. K. Bharadwaj. Fuzzy-genetic approach to recommender systems based on a novel hybrid user model. Expert systems with applications, 35(3):1386–1399, 2008.

[4] M. Balabanovic and Y. Shoham. Fab: content-based, collaborative recommendation. Communications of the ACM, 40(3):66–72, 1997.

[5] D. Bandyopadhyay and J. Sen. Internet of things: Applications and challenges in technology and standardization. Wireless Personal Communications, 58(1):49–69, 2011.

[6] R. Bell, Y. Koren, and C. Volinsky. Modeling relationships at multiple scales to improve accuracy of large recommender systems. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 95–104. ACM, 2007.

[7] J. Bennett and S. Lanning. The netflix prize. In Proceedings of KDD cup and workshop, volume 2007, page 35, 2007.

[8] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence, pages 43–52. Morgan Kaufmann Publishers Inc., 1998.

[9] R. Burke. Hybrid recommender systems: Survey and experiments. User modeling and user-adapted interaction, 12(4):331–370, 2002.

[10] C. Desrosiers and G. Karypis. A comprehensive survey of neighborhood-based recommendation methods. In Recommender systems handbook, pages 107–144. Springer, 2011.

[11] D. Evans. The internet of things: How the next evolution of the internet is changing everything. CISCO white paper, 1:1–11, 2011.

[12] D. Franco. cl-association-rules. https://github.com/diogoalexandrefranco/cl-association-rules, 2017.

[13] J. L. Herlocker, J. A. Konstan, L. G. Terveen, and J. T. Riedl. Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems (TOIS), 22(1):5–53, 2004.

[14] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender systems: an introduction. Cambridge University Press, 2010.

[15] R. Jin, L. Si, and C. Zhai. Preference-based graphic models for collaborative filtering. In Proceedings of the Nineteenth conference on Uncertainty in Artificial Intelligence, pages 329–336. Morgan Kaufmann Publishers Inc., 2002.

[16] G. Linden, B. Smith, and J. York. Amazon.com recommendations: Item-to-item collaborative filtering. Internet Computing, IEEE, 7(1):76–80, 2003.

[17] F. Ricci, L. Rokach, and B. Shapira. Introduction to recommender systems handbook. Springer, 2011.

[18] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Analysis of recommendation algorithms for e-commerce. In Proceedings of the 2nd ACM conference on Electronic commerce, pages 158–167. ACM, 2000.

[19] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Application of dimensionality reduction in recommender system-a case study. Technical report, DTIC Document, 2000.

[20] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th international conference on World Wide Web, pages 285–295. ACM, 2001.

[21] U. Shardanand and P. Maes. Social information filtering: algorithms for automating word of mouth. In Proceedings of the SIGCHI conference on Human factors in computing systems, pages 210–217. ACM Press/Addison-Wesley Publishing Co., 1995.

[22] Y. Song, S. Dixon, and M. Pearce. A survey of music recommendation systems and future perspectives. In 9th International Symposium on Computer Music Modeling and Retrieval, 2012.

[23] X. Su and T. M. Khoshgoftaar. A survey of collaborative filtering techniques. Advances in artificial intelligence, 2009:4, 2009.
