
Probabilistic Formal Analysis of App Usage to Inform Redesign

Oana Andrei, Muffy Calder, Matthew Chalmers, Alistair Morrison, and Mattias Rost

School of Computing Science, University of Glasgow, UK

Abstract. Evaluation and redesign of user-intensive mobile applications is challenging because users are often heterogeneous, adopting different patterns of activity, at different times. We set out a process of integrating statistical, longitudinal analysis of actual logged behaviours, formal, probabilistic discrete state models of activity patterns, and hypotheses over those models expressed as probabilistic temporal logic properties to inform redesign. We employ formal methods not to the design of the mobile application, but to characterise the different probabilistic patterns of actual use over various time cuts within a population of users. We define the whole process from identifying questions that give us insight into application usage, to event logging, data abstraction from logs, model inference, temporal logic property formulation, visualisation of results, and interpretation in the context of redesign. We illustrate the process through a real-life case study, which results in a new and principled way for selecting content for an extension to the mobile application.

1 Introduction

Evaluation and redesign of deployed user-intensive mobile applications (apps) is challenging because users are often heterogeneous and adopt different patterns of activity, at different times. Good redesign must support users' different styles of use, and should not be based solely on static attributes of users, but on those styles, which may be dynamic. This raises many questions, including: what characterises the usage of a user, how should we identify the different styles of use, how does that characterisation evolve, e.g. over an individual user trace, and/or over days and months, and how do properties of usage inform evaluation and redesign? This paper attempts to answer these questions, setting out a novel process of integrating statistical analysis of logged behaviours, probabilistic formal methods, and probabilistic temporal logic with rewards.

Our approach is based on integrating three powerful ingredients: (1) inference of admixture probabilistic Markov models (called activity patterns) from automatically logged data on user sessions, (2) characterisation of the activity patterns by probabilistic temporal logic properties using model checking techniques, and (3) longitudinal analysis of usage data drawn from different time cuts (e.g. the first day, first month, second month, etc.).


Our contribution is defining the whole process from identifying questions that give us insight into app usage, to event and attribute logging, data pre-processing and abstraction from logs, model inference, temporal logic property formulation using the probabilistic temporal logic PCTL with rewards [1], visualisation of results, and interpretation in the context of redesign. Our work provides new insights into app usage and affords new redesign ideas that are solidly grounded in observed activity patterns. We apply scientific and formal methods in a novel way to the observed use of artefacts that have been engineered. We illustrate throughout with a case study of AppTracker [2], a freely available mobile, personal productivity app that allows users to collect quantitative statistics about the usage of all apps installed on their iOS devices, i.e., iPhones, iPads, or iPods.

Initially, we instrument the app of interest to log usage behaviours and process them into sets of user traces expressed in terms of higher level actions. These actions are carefully selected, jointly by analysts and developers, to relate to the intended analysis: they determine the scope of properties and the dimensions of the state space underlying the model. We segment the sets of traces into different time cuts so that we can determine how activity patterns evolve over time.

For each time cut of user traces we infer admixture bigram models of activity patterns, where activity patterns are discrete-time Markov chains. We use admixture models because we are not classifying users into a single prototypical behavioural trait (or usage style), but have complex behavioural traits where individuals move between patterns during an observed user trace. Bigrams, which provide the conditional probability of an action given the preceding action, are one of the most successful models for language analysis (i.e. streams of symbols) and are good representations for populations of dynamic, heterogeneous users [3]. We characterise each user trace as an admixture of K activity patterns shared within the population of users. K is an important exploratory tool, and rather than assuming or finding an optimal value for K, we use it to explore the variety of usage styles that are meaningful to redesign. We typically start with low K values, but the choice may depend on factors intrinsic to the app. For a given K value, the parameters of the inferred model are the probabilities of a given action (from a preceding action), for each activity pattern, as well as the probabilities of transitioning between activity patterns. It is important to note that we are not inferring the underlying system topology, which is determined by the functionality of the app. We are investigating an artefact that has been engineered, but there may be differing generating processes of use. We employ a standard local non-linear optimisation algorithm for parameter estimation – the Expectation-Maximisation (EM) algorithm [4]. We use EM, as opposed to say MCMC, because it is fast and computationally efficient for our kind of data. EM converges provably to a local optimum of the criterion, in this case the likelihood function, and as such validation is not an issue. To the best of our knowledge, inferring such temporal structures has not been described outside our group.

We then hypothesise temporal probabilistic properties, expressed in PCTL extended with rewards [1, 5], to explore the activity patterns, considering various admixture models, values for K, and time cuts.


We compare the distribution of patterns in the population of users longitudinally and structurally, drawing on all the formal analysis to provide new, grounded insights into possible redesign. In our case study, our analysis militates against a simple partitioning of the app into different versions, each specific to one activity pattern, and instead offers a new and principled way of selecting glanceable information as an extension of the app.

Three AppTracker designers were involved in this paper, guiding the integration of formal analysis with hypotheses about user behaviours. This is our second application of formal analysis to models of inferred user behaviour: in [6] we defined activity patterns for an individual user as a user metamodel with respect to a population of users, and analysed a mobile game app. This work differs substantially in that here our goal is redesign in the context of a different app, we use the parameter K as an exploratory tool, we employ completely different temporal properties (e.g. using rewards) and longitudinal analysis, and we analyse the whole population of users, comparing distributions of activity patterns across the user population both longitudinally for a fixed K and across different values of K.

2 Technical Background

We assume familiarity with Markov models, PCTL, PRISM probabilistic model checking, bigram models, and Expectation-Maximisation algorithms.

A discrete-time Markov chain (DTMC) is a tuple D = (S, s, P, l) where: S is a set of states; s ∈ S is the initial state; P : S × S → [0, 1] is the transition probability function (or matrix) such that for all states s ∈ S we have ∑_{s′∈S} P(s, s′) = 1; and l : S → 2^A is a labelling function associating to each state s in S a set of valid atomic propositions from a set A. A path (or execution) of a DTMC is a non-empty sequence s0 s1 s2 . . . where si ∈ S and P(si, si+1) > 0 for all i ≥ 0. A transition is also called a time-step.

Probabilistic Computation Tree Logic (PCTL) [1] allows expression of a probability measure of the satisfaction of a temporal property. The syntax is:

State formulae   φ ::= true | a | ¬φ | φ ∧ φ | P⋈p [ψ]
Path formulae    ψ ::= X φ | φ U≤n φ | F≤n φ

where a ranges over a set of atomic propositions A, ⋈ ∈ {≤, <, ≥, >}, p ∈ [0, 1], and n ∈ N ∪ {∞}. State formulae are also called temporal properties. The usual semantics apply, with U and F standing for until and eventually, respectively.

PRISM [7] computes a satisfaction probability, e.g. P=? [ψ], allowing also for experimentation when the range and step size of the variable(s) are specified. PRISM supports a reward-based extension of PCTL, called rPCTL, that assigns non-negative real values to states and/or transitions. R{x}=? [C≤N] computes the reward named x accumulated along all paths within N time-steps, and R{x}=? [F φ] computes the reward named x accumulated along all paths until φ is satisfied. Filtered probabilities check for properties that hold from sets of states satisfying given propositions. Here we use state as the filter operator: e.g., filter(state, φ, condition), where φ is a state formula and condition a Boolean proposition uniquely identifying a state in the DTMC.


Inference of admixture bigram models. Given a vocabulary V of size n, a trace over V is a finite non-empty sequence of symbols from V. Let S = {0, 1, . . . , n} be the set of states and consider a bijective mapping V → S. Let x be a data sample of M traces over V, x = {x1, . . . , xM}. Each trace xm can be represented as a DTMC: the set of states is S = {0, 1, . . . , n}, the initial state is 0, and the transition probability matrix is derived from the n × n transition-occurrence matrix whose entry (i, j) gives the number of times the pair (i, j) occurs in the trace xm. Consider K n × n transition matrices denoted Φk over the states in S, for k = 1, . . . , K, such that Φkij denotes the probability of moving from state i to state j. Also consider an M × K matrix Θ such that at any point in time Φk is used by the trace xm with probability Θmk. Let λ = {Φk, Θmk | k = 1, . . . , K; m = 1, . . . , M} be the parameters of the statistical model. We use the EM algorithm [4] to find maximum likelihood parameters λ for observing each trace xm, using random restarts because the log-likelihood may have multiple local maxima. The result is an admixture bigram model: a Θ-weighted mixture of the K DTMCs Φk. The model is a bigram model because only dependencies between adjacent symbols in the trace are considered.
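To make the estimation step concrete, the following is a minimal NumPy sketch of one possible EM loop for this admixture bigram model (the implementation described later in the paper is in Java; the function and variable names `em_admixture_bigram`, `counts`, `Phi`, `Theta` are illustrative, and random restarts are left to the caller):

```python
import numpy as np

def em_admixture_bigram(counts, K, iters=100, eps=1e-12, seed=0):
    """EM for an admixture of K bigram (Markov chain) patterns.

    counts: array of shape (M, n, n); counts[m, i, j] is how many times
    trace m moves from state i to state j.  Returns (Phi, Theta), where
    Phi has shape (K, n, n) and Theta has shape (M, K)."""
    rng = np.random.default_rng(seed)
    M, n, _ = counts.shape
    Phi = rng.random((K, n, n)) + eps
    Phi /= Phi.sum(axis=2, keepdims=True)        # row-stochastic activity patterns
    Theta = rng.random((M, K)) + eps
    Theta /= Theta.sum(axis=1, keepdims=True)    # per-trace pattern weights
    for _ in range(iters):
        # E-step: responsibility of pattern k for each observed transition (m, i, j)
        resp = Theta[:, :, None, None] * Phi[None, :, :, :]      # (M, K, n, n)
        resp /= resp.sum(axis=1, keepdims=True) + eps
        expected = counts[:, None, :, :] * resp                  # expected counts per pattern
        # M-step: re-normalise expected counts into new Phi and Theta
        Phi = expected.sum(axis=0) + eps
        Phi /= Phi.sum(axis=2, keepdims=True)
        Theta = expected.sum(axis=(2, 3)) + eps
        Theta /= Theta.sum(axis=1, keepdims=True)
    return Phi, Theta
```

Each E-step distributes every observed transition count across the K patterns in proportion to Θmk Φkij; the M-step renormalises the resulting expected counts into row-stochastic matrices Φk and per-trace weights Θ.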

3 Case Study: AppTracker

AppTracker is an iOS application that provides a user with information on the usage of their device. It operates on iPhones, iPads, and iPods, running in the background and monitoring the opening and closing of apps as well as the locking and unlocking of the device. It was released in August 2013 and has been downloaded over 35,000 times. The interface displays a series of charts and statistics to give insight into how long one is spending on their device, the most used apps, how these stats fluctuate over time, etc. Figure 1 shows three views from the app. The main menu screen offers four main options (Fig. 1(a)). The first menu item, Overall Usage, contains quick summaries of all the data recorded since AppTracker was installed, opening the views TopApps and Stats (Fig. 1(b)). The second menu item, Last 7 Days, displays a chart limited to the activity recorded over the last 7 days. The third menu item, Select by Period, shows statistics for a selected period of time. For example, one could investigate which apps one used the most last Saturday, see how the time one spent on Facebook varied each day across last month, or examine patterns of use over a particular day (Fig. 1(c)). The final menu option, Settings, allows a user to start and stop the tracker, or to reset their recorded data. A Terms and Conditions screen is shown to a user on first launch; it describes all the data that will be recorded during use and provides contact details to allow the user to opt out at any time.

Preparing raw logged data. Data is collected within the SGLog framework [8]. Each log, stored in a MySQL database, contains information about the user, the device, and the event that took place. For our analysis, we are interested in the events resulting in a switch between views within the app. The raw data is extracted and processed using JavaScript to obtain user traces of views (one per user). A special view denotes when the user leaves the app (UseStop).


Fig. 1. Screenshots from AppTracker: (a) Main menu, (b) Overall stats, (c) Device usage for one day.

Fig. 2. AppTracker state diagram (nodes 0–14 correspond to the views listed below).

We define a session as the event sequence delimited by two UseStop states, except the initial session, which starts from TermsAndConditions. This results in a total of 15 unique views, with transitions, illustrated in Fig. 2, with the following meaning: (0) TermsAndConditions is the terms and conditions page; (1) Main is the main menu screen; (2) TopApps shows the summary of all recorded data; (3) Last7Days shows the top 5 apps used over the last 7 days; (4) SelectPeriod shows app usage stats for a selected time period; (5) AppsInPeriod shows apps used during a selected period; (6) Settings shows the settings options; (7) UseStop stands for closing AppTracker or sending it to the background; (8) Stats shows statistics of app usage; (9) UsageBarChartTopApps shows app usage when picked from TopApps; (10) UsageBarChartStats shows app usage when picked from Stats; (11) Feedback shows a screen for giving feedback; (12) UsageBarChartApps shows app usage when picked from AppsInPeriod; (13) Info shows information about the app; (14) Task shows a feedback question chosen from the Feedback view. The 15 views relate directly to the underlying atomic propositions used later in the DTMCs, and we map user traces to 15 × 15 transition-occurrence matrices.
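As an illustration of this abstraction step (the actual processing was done in JavaScript over the SGLog logs; the function names below are hypothetical), the following sketch splits a user's chronological sequence of view names into sessions delimited by UseStop and builds the 15 × 15 transition-occurrence matrix, using the state numbering of Fig. 2:

```python
import numpy as np

# State numbering taken from Fig. 2
VIEWS = ["TermsAndConditions", "Main", "TopApps", "Last7Days", "SelectPeriod",
         "AppsInPeriod", "Settings", "UseStop", "Stats", "UsageBarChartTopApps",
         "UsageBarChartStats", "Feedback", "UsageBarChartApps", "Info", "Task"]
INDEX = {v: i for i, v in enumerate(VIEWS)}

def sessions(view_trace):
    """Split one user's chronological list of view names into sessions,
    each ending with UseStop (the first session starts at TermsAndConditions)."""
    current, out = [], []
    for view in view_trace:
        current.append(view)
        if view == "UseStop":
            out.append(current)
            current = []
    if current:                       # trailing events with no closing UseStop
        out.append(current)
    return out

def occurrence_matrix(view_trace):
    """15 x 15 matrix counting transitions between consecutive views of a user trace."""
    counts = np.zeros((len(VIEWS), len(VIEWS)), dtype=int)
    for a, b in zip(view_trace, view_trace[1:]):
        counts[INDEX[a], INDEX[b]] += 1
    return counts
```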


Fig. 3. The DTMCs of the activity patterns for K = 4 and first month of usage.

Data for this study. All data was gathered between August 2013 and May 2014, from 489 users. The average time spent within the app per user is 626s (median 293s), the average number of times going into the app is 10.7 (median 7), and the average user trace length is 73.6 view transitions (median 46). We segment the user traces into time cuts of the interval form [d1, d2), which includes the user traces from the d1-th up to the d2-th day of usage.
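A possible way of forming such a time cut, assuming each logged event carries a timestamp and the user's install time is known (both field names are hypothetical), is sketched below:

```python
from datetime import timedelta

def time_cut(events, install_time, d1, d2):
    """Keep the logged events falling between the d1-th (inclusive) and the
    d2-th (exclusive) day of usage, measured from the user's install time."""
    lo = install_time + timedelta(days=d1)
    hi = install_time + timedelta(days=d2)
    return [e for e in events if lo <= e["timestamp"] < hi]
```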

4 Inferring Admixture Bigram Models

For each chosen value of K and time cut of the logged data we obtain K DTMCs with 15 × 15 transition matrices, called activity patterns, and an M × K matrix Θ, where M is the number of user traces and each row is a distribution over the K activity patterns. For each activity pattern APk, for k = 1, . . . , K, we automatically generate a PRISM model with one variable x for the views of the app, with values ranging from 0 to 14. For each state value of x we have a PRISM command defining all possible 15 probabilistic transitions, where Φkij is the transition probability from state x = i to the updated state x′ = j in activity pattern APk, for all i, j = 0, . . . , 14. With each state value we associate the label corresponding to a higher level state in AppTracker (see the mapping in Fig. 2), as well as a reward structure which assigns a reward of 1 to that state. The PRISM file for each activity pattern also includes a reward structure assigning a reward of 1 to each transition (or time step) in the DTMC. All our PRISM models have at most 15 states and at most 51 transitions.
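The generated PRISM file could look roughly as sketched below; this follows the structure just described (one variable x, one command per state value, per-view labels, and per-state and per-step reward structures), but the exact rendering is our assumption rather than the authors' generator:

```python
def prism_model(Phi_k, views):
    """Render one activity pattern (a row-stochastic matrix over the views)
    as a PRISM dtmc: one variable x, one command per state value, a label per
    view, a reward of 1 per time step ("r_Steps"), and a reward of 1 per visit
    to each view-state ("r_<View>").  PRISM checks that each command's
    probabilities sum to 1, so probabilities are emitted at full precision."""
    n = len(views)
    lines = ["dtmc", "", "module activity_pattern", f"  x : [0..{n - 1}] init 0;"]
    for i in range(n):
        updates = " + ".join(f"{Phi_k[i][j]} : (x'={j})"
                             for j in range(n) if Phi_k[i][j] > 0)
        if not updates:                      # no outgoing transitions observed: self-loop
            updates = f"1.0 : (x'={i})"
        lines.append(f"  [] x={i} -> {updates};")
    lines += ["endmodule", ""]
    for i, v in enumerate(views):
        lines.append(f'label "{v}" = x={i};')
    lines += ['rewards "r_Steps"', "  [] true : 1;", "endrewards"]
    for i, v in enumerate(views):
        lines += [f'rewards "r_{v}"', f"  x={i} : 1;", "endrewards"]
    return "\n".join(lines)
```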

We implemented the EM algorithm in Java, applying it to the data sets with a maximum of 100 iterations and 200 restarts. Running the EM algorithm takes about 119s for K = 2, 162s for K = 3, and 206s for K = 4 on a 2.8GHz Intel Xeon. Timings are obtained by running the algorithm 90 times. The algorithm is single threaded and runs on one core.

As an example, Figure 3 illustrates the state-transition diagrams of all K = 4 activity patterns, with the thickness of a transition corresponding to ranges of probability: the thicker the line, the higher the probability of that transition. Note this illustration does not include the distribution over the activity patterns.


Table 1. rPCTL properties Prop1–Prop5

ID      Formula and informal description

Prop1   P=? [ !ℓ U≤N ℓ ] : Probability of visiting an ℓ-labelled state for the first time from the initial state within N time steps

Prop2   R{"r_ℓ"}=? [ C≤N ] : Expected number of visits to an ℓ-labelled state from the initial state within N time steps

Prop3   R{"r_Steps"}=? [ F ℓ ] : Expected number of time steps to reach an ℓ-labelled state from the initial state

Prop4   filter(state, R{"r_Steps"}=? [ F ℓ1 ], ℓ2) : Expected number of time steps to reach an ℓ1-labelled state from an ℓ2-labelled state

Prop5   filter(state, P=? [ (!ℓ1 & !"UseStop") U≤N ℓ1 ], ℓ2) : Probability of reaching for the first time an ℓ1-labelled state from an ℓ2-labelled state during a session

5 Analysing rPCTL Properties

We have found that most patterns for logic properties (e.g. probabilistic response, probabilistic precedence, etc.) relate to the design of reactive systems and are not generally helpful for evaluation of user-intensive apps. However, a study of which patterns would be useful is beyond the scope of this paper. Table 1 lists the rPCTL properties we used, with state labels ℓ, ℓ1, ℓ2. Prop1, Prop2, and Prop3 are the properties we investigated initially; Prop4 and Prop5 were identified later, prompted by designers' hypotheses and inconclusive initial results. Prop4 generalises Prop3 by analysing traces starting with a chosen state, not necessarily the initial one.
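For concreteness, Prop1–Prop3 instantiated for the label TopApps can be written in a PRISM properties file and checked from the command line; the sketch below (file names, and the assumption that the prism binary is on the PATH, are ours) also shows how a range of N values is passed as a PRISM experiment:

```python
import subprocess

# Prop1-Prop3 from Table 1, instantiated for the label "TopApps"; the reward
# names "r_TopApps" and "r_Steps" follow the generated model's reward structures.
PROPS = """const int N;
// Prop1: probability of reaching TopApps for the first time within N steps
P=? [ !"TopApps" U<=N "TopApps" ]
// Prop2: expected number of visits to TopApps within N steps
R{"r_TopApps"}=? [ C<=N ]
// Prop3: expected number of time steps to reach TopApps
R{"r_Steps"}=? [ F "TopApps" ]
"""

def check_pattern(model_file, props_file="topapps.props", n_range="10:10:150"):
    """Write the properties file and run PRISM as an experiment over N."""
    with open(props_file, "w") as f:
        f.write(PROPS)
    result = subprocess.run(["prism", model_file, props_file, "-const", f"N={n_range}"],
                            capture_output=True, text=True)
    return result.stdout
```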

We inferred models for K ∈ {2, 3, 4} for various time cuts and performed analysis of the rPCTL properties Prop1 – Prop5 on all activity patterns. For brevity, here we show only properties concerning the states TopApps, Stats, SelectPeriod, Last7Days, and UseStop. These five states showed significant results and differences across time cuts and temporal properties, and the designers showed particular interest in them when formulating hypotheses about the actual app usage.

We adopt the following interpretations of model checking results for Prop1, Prop2, and Prop3 in our case study, for the same value of N. We say that a pair of state and activity pattern (ℓ, APi) scores a better (resp. worse) result than (ℓ′, APj), for all 1 ≤ i, j ≤ K such that either ℓ ≠ ℓ′ or i ≠ j, if: Prop1 returns a higher (resp. lower) value, Prop2 a higher (resp. lower) value, and Prop3 a positive lower (resp. higher) value for (ℓ, APi) than for (ℓ′, APj).

Analysing rPCTL properties for K = 2. We verify Prop1, Prop2, and Prop3 on the two activity patterns AP1 and AP2 for six time cuts: the first day [0, 1), the first week minus the first day [1, 7), the first month minus the first week [7, 30), the first month [0, 30), the second month [30, 60) and the third month [60, 90), and for N ranging from 10 to 150 with step-size 10. Here we only show the results for N = 50, in Table 2.


Table 2. Prop1 (the probability of reaching a given state for the first time within N steps), Prop2 (the expected number of visits to a given state within N steps), and Prop3 (the expected number of time steps to reach a given state) checked for different states and time cuts, and for N = 50 steps. The best results with respect to the property checked across the two patterns are in bold font.

Property  Time cut   TopApps        Stats          SelectPeriod   Last7Days      UseStop
                     AP1    AP2     AP1    AP2     AP1    AP2     AP1    AP2     AP1    AP2
Prop1     [0, 1)     0.99   0.99    0.99   0.83    0.47   0.79    0.49   0.96    0.99   0.99
          [1, 7)     0.99   0.99    0.98   0.80    0      0.93    0      0.98    0.99   0.99
          [7, 30)    0.99   0.99    0.99   0.64    0.01   0.94    0.84   0.96    0.99   0.99
          [0, 30)    0.99   0.99    0.99   0.75    0.21   0.92    0.44   0.98    0.99   0.99
          [30, 60)   0.99   0.99    0      0.90    0.73   0.83    0.56   0.98    1      0.99
          [60, 90)   1      0.95    0.96   0.72    0      0.94    0      0.97    1      0.99
Prop2     [0, 1)     13.94  7.44    7.63   2.15    0.79   1.82    0.70   3.13    11.41  6.17
          [1, 7)     17.22  5.77    4.00   2.31    0      3.97    0      4.03    12.91  6.30
          [7, 30)    14.93  7.15    5.43   1.47    0.01   4.61    1.78   3.41    12.86  5.74
          [0, 30)    14.67  6.48    5.08   1.90    0.24   3.58    0.58   3.99    11.00  6.51
          [30, 60)   13.40  6.83    0      3.76    4.41   2.04    0.85   4.54    12.46  5.61
          [60, 90)   17.30  5.83    2.94   2.60    0      3.26    0      4.43    13.96  5.63
Prop3     [0, 1)     3.31   8.41    8.18   28.67   79.32  32.46   74.87  15.56   4.86   7.88
          [1, 7)     2.05   10.70   12.44  31.90   ∞      19.12   ∞      12.38   3.85   7.55
          [7, 30)    2.52   9.68    9.70   48.61   ∞      17.78   26.61  14.58   3.88   8.44
          [0, 30)    3.05   9.73    11.01  36.03   209.68 19.94   87.54  12.19   4.67   7.43
          [30, 60)   4.04   10.34   ∞      22.33   38.21  28.28   61.74  11.08   1      8.82
          [60, 90)   2.02   15.28   16.53  39.68   ∞      17.41   ∞      11.56   3.57   8.90

The best results with respect to the property checked across the two patterns are in bold font: we can easily see that the two patterns correspond to different behaviours with respect to which of the five states are more likely, more often, and more quickly reached. Note that, looking at the analysis results for UseStop, on average we see twice as many sessions under AP1 as under AP2, and the average session length in time steps under AP2 is double the average session length under AP1.

If we overlook (for now) the results for the time cut [30, 60), we conclude that there are two distinct activity patterns:

AP1: Overall Viewing pattern, in which TopApps and Stats are more likely, more often, and more quickly reached, thus more high-level stats visualisations and shorter sessions.

AP2: In-depth Viewing pattern, in which Last7Days and SelectPeriod are more likely, more often, and more quickly reached, thus more in-depth stats visualisations and longer sessions.

Now considering the time cut [30, 60), in Table 2 we note slightly different results for this time cut compared to the more consistent results for the other five time cuts: a high number of visits to, and a relatively low number of time steps to reach, Stats no longer belongs to AP1, but to AP2; Prop1 and Prop2 for SelectPeriod no longer discriminate clearly between AP1 and AP2 due to very close results. As a consequence we analyse the additional rPCTL properties Prop4 and Prop5 (see Table 3) for the time cut [30, 60).


Table 3. Properties Prop4 (expected number of time steps to reach a row state from a column state, for each pattern) and Prop5 (probability of reaching for the first time a row state from a column state during a session, for each pattern) verified for K = 2, time cut [30, 60). Blue-coloured, bold font results are characteristic of the Overall Viewing pattern, red-coloured results of the In-depth Viewing pattern; default text colour means inconclusive.

Prop4           Pattern   TopApps  Last7Days  SelectPeriod  Main
TopApps         AP1       –        3.62       11.09         2.22
                AP2       –        10.56      14.89         9.347
Last7Days       AP1       61.83    –          68.68         59.56
                AP2       14.01    –          15.61         10.08
SelectPeriod    AP1       38.30    36.69      –             36.03
                AP2       31.21    28.77      –             27.28
UseStop         AP1       1.49     3.61       9.23          3.33
                AP2       11.74    6.24       11.51         7.82

Prop5           Pattern   TopApps  Last7Days  SelectPeriod
TopApps         AP1       –        0.66       0.26
                AP2       –        0.26       0.30
Last7Days       AP1       0.006    –          0.02
                AP2       0.49     –          0.38
SelectPeriod    AP1       0.008    0.08       –
                AP2       0.23     0.15       –
We colour-code the results: blue-coloured, bold font corresponds to the Overall Viewing pattern and red-coloured font to the In-depth Viewing pattern. Except for two inconclusive pairs of results (default text colour), Table 3 tells us that the two activity patterns learned from the time cut [30, 60) are respectively similar to the two activity patterns identified for the other time cuts analysed previously. The difference in the behaviour around Stats could be explained by a new usage behaviour of AppTracker around the 30th day of usage, when approximately a full month's worth of new statistics becomes available, leading to a spurt of more exploratory usage of AppTracker. We note that for the time cut [60, 90) the results listed in Table 2 again make a clear distinction between the two activity patterns with respect to the states SelectPeriod and Last7Days. We might say that in the third month the exploratory usage of AppTracker settles down and users know exactly what to look for and where. A finer-grained longitudinal analysis based on one-week time cuts could reveal additional insight into the behaviour involving Stats around the 30th day of usage.

Our conclusion concerning the two types of activity patterns matches the developers' hypothesis about two distinct usages of the app. However, they also expected to see one pattern revolving around TopApps and Stats, one around SelectPeriod, and another around Last7Days. Since we analysed the admixture model for K = 2, we only obtained two distinct patterns, the last two patterns conjectured by the developers being aggregated into a single one. As a consequence, we investigate higher values for K.

Analysing rPCTL properties for K = 3. We analyse Prop1, Prop2, and Prop3 on the admixture model inferred for K = 3, time cut [0, 30) and N = 50. For brevity, we omit the details, and based on the results we characterise the three patterns as follows:


– AP1 is an Overall Viewing pattern because TopApps and Stats have the best results for all three properties. SelectPeriod and Last7Days are absent. The sessions are half as long and twice as frequent as for the In-depth Viewing pattern.

– AP2 is a 'weaker' Overall Viewing pattern than AP1 because TopApps has worse results than in AP1, though still better results than Stats and Last7Days; SelectPeriod is absent.

– AP3 is an In-depth Viewing pattern because SelectPeriod has the best results, followed closely by TopApps and Last7Days.

Analysing rPCTL properties for K = 4. We analyse Prop1, Prop2, and Prop3 on the admixture model inferred for K = 4, time cut [0, 30) and for N = 50. Again, details are omitted and we characterise the patterns as follows:

– AP1 is mainly a TopApps Viewing activity pattern because it has the best results for TopApps, compared to Stats, SelectPeriod, and Last7Days, which score very low results.

– AP2 is a Stats – TopApps Viewing activity pattern, with very low results for Last7Days; SelectPeriod is absent.

– AP3 is an In-depth Viewing pattern with dominant Last7Days, followed closely by TopApps and SelectPeriod; Stats is absent.

– AP4 is mainly a TopApps Viewing pattern because TopApps has the best results, while all other states need on average an infinite number of time steps to be reached. The fact that it takes on average an infinite number of time steps to reach the end of a session (i.e., the state UseStop) motivated us to analyse this pattern with other temporal properties and for other states. As a consequence we saw that UsageBarChartTopApps has similar properties to TopApps, meaning that this pattern corresponds to repeatedly switching between TopApps and UsageBarChartTopApps.

Based on the results obtained for UseStop we observe: twice as many sessions for AP1 as for AP2 and AP3, only a couple of sessions on average for AP4, and fewer views per session (i.e., shorter sessions) for AP1 than for AP2 and AP3.

Longitudinal Θ-based comparison. In addition to analysing rPCTL properties, we also compare how the distribution Θ of the two activity patterns over the entire population of users changes in time. For each time cut considered for the rPCTL analysis above, and for activity pattern AP2, we sort the second column of Θ in non-decreasing order and re-scale its index to the interval [0, 1] to form the horizontal axis, while the ordered Θ values are plotted on the vertical axis. Figure 4 shows the Θ values for AP1 and AP2 for the population of users across the first three months of usage. We conclude that during the first day of usage, up to 40% of users exhibit exclusive In-depth Viewing behaviour (probability close to 1 on the y-axis), corresponding to an initial exploration of the app with a significant number of visits to TopApps, Stats, SelectPeriod, and Last7Days. Also, at most 10% of the users exhibit exclusive Overall Viewing behaviour, maybe because they feel less adventurous in exploring the app, preferring mostly the first menu option of looking at TopApps and subsequently at Stats.
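This longitudinal comparison can be reproduced from the inferred Θ matrices with a short plotting sketch such as the following (matplotlib assumed; the dictionary thetas_by_cut is hypothetical):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_theta_column(Theta, k, label):
    """Plot the sorted weights of activity pattern k over the user population,
    with the user index re-scaled to [0, 1] on the horizontal axis."""
    weights = np.sort(Theta[:, k])               # non-decreasing pattern weights
    x = np.linspace(0, 1, len(weights))          # fraction of the user population
    plt.plot(x, weights, label=label)

# Overlaying one pattern across time cuts, e.g. the In-depth Viewing pattern:
# for cut, Theta in thetas_by_cut.items():       # hypothetical dict of inferred Thetas
#     plot_theta_column(Theta, k=1, label=cut)
# plt.xlabel("fraction of users"); plt.ylabel("pattern weight"); plt.legend(); plt.show()
```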


Fig. 4. Longitudinal comparison of the activity pattern distributions Θ over the population of users for K = 2 and time cuts [0, 1), [1, 7), [7, 30), [30, 60), [60, 90): (a) Overall Viewing AP, (b) In-depth Viewing AP.

Fig. 5. Pattern distributions for (a) K = 2 (Overall Viewing, In-depth Viewing), (b) K = 3 (Overall Viewing, weak Overall Viewing, In-depth Viewing), and (c) K = 4 (mainly TopApps Viewing, equally Stats and TopApps Viewing, In-depth Viewing with no Stats, exclusive TopApps and UsageBarChartTopApps), time cut [0, 30).

We note that the distributions of the two activity patterns in the population of users are similar for the time cuts [0, 1) and [30, 60) – probably because more users exhibit a more exploratory behaviour during these times (new types of usage statistics become available after one month of usage). At the same time, the plots for the time cuts [1, 7), [7, 30), and [60, 90) are also similar, and we think that they correspond to a settled (or routine) usage behaviour.

Structural Θ-based comparison. In Fig. 5 we plot the weightings of all users for each activity pattern, for K ∈ {2, 3, 4} and the time cut [0, 30). Figure 5(a) tells us that for K = 2 the In-depth Viewing pattern has higher weightings across the user population, with almost 25% of the users using the app exclusively in this way, hence either exploring the app or genuinely interested in in-depth usage statistics. Figure 5(b) tells us that almost 10% of the users are exclusively interested in TopApps, Stats and Last7Days but not SelectPeriod; this behaviour is the most popular among users. From Fig. 5(c) we see that almost 50% of the users do not behave according to AP4 – switching repeatedly between TopApps and UsageBarChartTopApps. Note that for K = 3 and K = 4 no pattern stands out as very different from the others.


6 Formal Analysis Informs Redesign

We now consider how our results provide insights for redesign, in the context of the case study. Our initial analysis uncovered two activity patterns, characterised by the type of usage stats the user is examining: Overall Viewing – more high-level usage statistics for the entire recorded period, or In-depth Viewing – more in-depth usage statistics for specific periods of interest. Neither is significantly dominant over the other: for the majority of users, usage is fairly evenly distributed between the two patterns. This suggests that a revised version of AppTracker should continue to support both patterns.

We note the two patterns identified for K = 2 correspond closely to options presented on AppTracker's main menu (see Fig. 1(a)), as follows. Overall Viewing indicates a greater likelihood of using TopApps and Stats, which are interface screens accessed through the Overall Usage menu item. In-depth Viewing indicates a greater likelihood of reaching SelectPeriod and Last7Days, which are accessed through Select by Period and Last 7 Days, but also some usage of TopApps and Stats. Our results indicate that sessions corresponding to Overall Viewing are generally shorter, meaning that users perform fewer actions between launching AppTracker and exiting back to the device's home screen. These two different patterns suggest that, in a future version of AppTracker, if developers want to keep the two major styles of usage separated between different screens, they could design explicitly for the glancing-like short interactions in Overall Usage and for longer interactions in a new Select by Period screen along with the initial Last 7 Days screen. More filtering and querying tools could also be added to Select by Period.

We wondered if users are simply following the paths suggested by the main menu (Fig. 1(a)). We therefore probed further, considering K ∈ {3, 4, 5} (details are omitted for K = 5). For K = 3, if the analysis were merely mirroring the menu structure, we might expect to see one pattern centred around each of the first three main menu items. Although we see the pattern AP2 centred around TopApps, Stats, and Last7Days but no SelectPeriod, we do not see a pattern centred around SelectPeriod and not including Last7Days. For K = 4 we find Last7Days and SelectPeriod together in a pattern, with the former view slightly more popular than the latter; this combination also occurred for K = 2 and K = 5. For K = 4 we also see a distinct new pattern showing users repeatedly switching between TopApps and UsageBarChartTopApps. TopApps is an ordered list of the user's most used apps; selecting an item from this list opens UsageBarChartTopApps, a bar chart showing daily minutes of use of that app. This persistent switching suggests a more investigatory behaviour, which is more likely to be associated with In-depth Viewing. Yet this behaviour is occurring under the Overall Usage menu item, which we hypothesised and then identified as being associated with more glancing-like behaviour. This suggests that our results provide more nuanced findings than a simple uncovering of the existing menu structure. We therefore suggest that if developers want to separate the two types of usage between different menu items even more, they could move the TopApps – UsageBarChartTopApps loop from Overall Usage to Select by Period.


Glancing activity patterns. Discovering a glancing activity pattern provides significant benefits for app redesign. Since the release of the iOS 8 SDK in 2014, Apple has allowed the development of 'Today widgets' – extensions to apps comprising small visual displays and limited functionality appearing in the Notification Centre. Beyond the advice in Apple's Human Interface Guidelines¹, developers struggle to decide which pieces of their app's content would best suit inclusion in a Today widget: few conventions have been established since the release. Developers have to rely on their own judgement to select appropriate content from their app to populate this view. In our analysis, we have uncovered explicitly the specific screens that people look at when they are undertaking short sessions of glancing-type behaviour, i.e. the typical glancing patterns for AppTracker – the Overall Viewing pattern and the TopApps-centred patterns. In identifying such activity patterns, our approach provides a more principled method of selecting content appropriate for an app extension such as a Today widget.

7 Discussion

Related Work. Logging software is frequently used to understand program behaviours, and typically to aid program comprehension – building an understanding of how the program executes [9]. There are various techniques that use logs of running software, such as visualising logs (e.g. [10]) and capture and replay (e.g. [11]), with the aims of failure analysis, performance evaluation, and better understanding of system behaviour (as it executes). In contrast, we are interested in the ways users interact with software, and we do so by analysing logs captured during actual use. The difference is important. In the case of program comprehension, log analysis is used to better understand what is going on within the code and how the artefact is engineered (in order to be better prepared for improvements and maintenance). In our case, log analysis provides insights about distinct styles of use and informs improvements of the high-level design. For example, in [12] the authors infer FSMs for modelling a system's behaviour, while we infer DTMCs of different actual usage behaviours, and admixtures thereof, to model populations of users. There is complementarity with the approach of [13], which employs usage logs and applies temporal logic analysis, but a key difference is that their models are based on static user attributes (e.g. the user's city) rather than on inferred behaviours. Their approach assumes within-class use to be homogeneous, whereas our research demonstrates within-class variation.

Methodological issues. Our approach is a collaboration between developers familiar with app development and instrumentation, and analysts familiar with statistics and formal modelling. We note some methodological issues.

First, our approach gains from having significant volumes of log data to work on, for reliable application of statistical methods.

¹ https://developer.apple.com/library/ios/documentation/UserExperience/Conceptual/MobileHIG/AppExtensions.html


However, neither the volume of log data nor the number of users is relevant for the probabilistic model checking; only the number of higher level states selected from the raw data for analysis determines the complexity. Therefore our approach scales because it does not depend on the number of activity patterns or the number of users/data size, but on the state space of the abstract model of the app. The AppTracker case study involved a vocabulary of 15 views/states, which was comfortably manageable, but our approach would have difficulty with very large vocabularies.

Second, the issue of what to log is not trivial. The collection of log entries can include anything from a button press to a change in the device's WiFi signal, and so on. In the case of AppTracker, we decided to use states corresponding to the individual screens that can be reached within the app. This highlights the rather simple nature of AppTracker – it is essentially a browser of information. In contrast, a game such as Angry Birds allows the user to perform a much more complex set of actions. Even after pruning the logs to include only user actions (rather than lower level device events), one still needs to decide how to model these actions as a state space. The chosen state space will ultimately influence which activity patterns become prominent. We therefore suggest that discussion and preliminary analysis be done early in the development process, so that the decisions about what to log and what the state space should be are made by developers and analysts jointly in a well-informed way.

Third, the activity patterns and their number (i.e. K) are key to the analysis. The patterns are inferred by standard statistical methods based on non-linear optimisation. We do not model for predictability: there is no true model of the generating process, only one that is posited based on known characteristics such as the sequential nature of app events. We study the time-series behaviour that has been logged from a probabilistic perspective. The admixture model is important because we are defining a complex behavioural trait in which the individual moves between patterns during an observed user trace. The number of patterns K is an important exploratory tool; there is no optimal value for K.

8 Conclusions and Future Work

We have outlined an approach to exploring and gaining insight into usage patterns that informs redesign based on probabilistic formal analysis of actual app usage. Our approach is a combination of bottom-up statistical inference from user traces and top-down probabilistic temporal logic analysis of the inferred models. We have illustrated this via the mobile app AppTracker, and discussed how the results of this analysis inform redesign that is grounded in existing patterns of usage. A notable conclusion of our work is that, while our analysis of AppTracker's use identifies several clearly distinct activity patterns, it also reveals the distribution of activity patterns over the population of users and over time. For AppTracker, this militates against a simple partitioning of the app into two different versions, each specific to one activity pattern. In addition, our analysis offers a more principled way of selecting glanceable information.

AppTracker developers are currently implementing a redesign based on our analysis, and we eagerly await new data sets of logged behaviours for further analysis.


Future work will involve that analysis, as well as patterns for logic properties and generalisation of our approach to a principled way of providing software redesign guidelines as part of a user-centred design process.

Acknowledgements. This research is supported by the EPSRC Programme Grant A Population Approach to Ubicomp System Design (EP/J007617/1).

References

1. Baier, C., Katoen, J.P.: Principles of Model Checking. The MIT Press (2008)
2. Bell, M., Chalmers, M., Fontaine, L., Higgs, M., Morrison, A., Rooksby, J., Rost, M., Sherwood, S.: Experiences in Logging Everyday App Use. In: Proc. of Digital Economy'13, ACM (2013)
3. Girolami, M., Kaban, A.: Simplicial Mixtures of Markov Chains: Distributed Modelling of Dynamic User Profiles. In Thrun, S., Saul, L.K., Scholkopf, B., eds.: Advances in Neural Information Processing Systems 16 (NIPS'03), MIT Press (2004) 9–16
4. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society. Series B (Methodological) 39(1) (1977) 1–38
5. Kwiatkowska, M.Z., Norman, G., Parker, D.: Stochastic Model Checking. In Bernardo, M., Hillston, J., eds.: SFM. Volume 4486 of LNCS, Springer (2007) 220–270
6. Andrei, O., Calder, M., Higgs, M., Girolami, M.: Probabilistic Model Checking of DTMC Models of User Activity Patterns. In: Proc. of QEST'14. Volume 8657 of LNCS, Springer (2014) 138–153
7. Kwiatkowska, M.Z., Norman, G., Parker, D.: PRISM 4.0: Verification of Probabilistic Real-Time Systems. In Gopalakrishnan, G., Qadeer, S., eds.: Proc. of CAV'11. Volume 6806 of LNCS, Springer (2011) 585–591
8. Hall, M., Bell, M., Morrison, A., Reeves, S., Sherwood, S., Chalmers, M.: Adapting ubicomp software and its evaluation. In Graham, T.C.N., Calvary, G., Gray, P.D., eds.: Proc. of EICS'09, ACM (2009) 143–148
9. von Mayrhauser, A., Vans, A.M.: Program comprehension during software maintenance and evolution. IEEE Computer 28(8) (1995) 44–55
10. Fittkau, F., Waller, J., Wulf, C., Hasselbring, W.: Live trace visualization for comprehending large software landscapes: The ExplorViz approach. In Telea, A., Kerren, A., Marcus, A., eds.: Proc. of VISSOFT'13 (2013) 1–4
11. Gomez, L., Neamtiu, I., Azim, T., Millstein, T.D.: RERAN: Timing- and Touch-sensitive Record and Replay for Android. In Notkin, D., Cheng, B.H.C., Pohl, K., eds.: Proc. of ICSE'13, IEEE/ACM (2013) 72–81
12. Beschastnikh, I., Brun, Y., Ernst, M.D., Krishnamurthy, A.: Inferring models of concurrent systems from logs of their behavior with CSight. In Jalote, P., Briand, L.C., van der Hoek, A., eds.: Proc. of ICSE'14, Hyderabad, India, ACM (2014) 468–479
13. Ghezzi, C., Pezze, M., Sama, M., Tamburrelli, G.: Mining Behavior Models from User-Intensive Web Applications. In: Proc. of ICSE'14, Hyderabad, India, ACM (2014) 277–287