
Interactive Rule Refinement for Fraud Detection

Tova Milo 1 Slava Novgorodov 1 Wang-Chiew Tan 2

1 Tel-Aviv University   2 UC Santa Cruz
1 {milo, slavanov}@post.tau.ac.il   2 [email protected]

ABSTRACT

Credit card frauds are unauthorized transactions that are made or attempted by a person or an organization that is not authorized by the card holder. Fraud with general-purpose cards (credit, debit cards, etc.) is a billion-dollar industry, and companies are therefore investing significant effort in identifying and preventing it.

In addition to machine-learning-based techniques, credit card companies often employ domain experts to manually specify rules that exploit general or domain knowledge for improving the detection process. Over time, however, as new (fraudulent and legitimate) transactions arrive, these rules need to be updated and refined to capture the evolving (fraudulent and legitimate) activity patterns. The goal of the RUDOLF system described in this paper is to guide and assist domain experts in this challenging task. RUDOLF automatically determines the "best" adaptation to existing rules to capture all fraudulent transactions and, respectively, omit all legitimate transactions. The proposed modifications can then be further refined by users, and the process can be repeated until they are satisfied with the resulting rules. We show that the problem of identifying the best candidate adaptation is NP-hard in general and present PTIME heuristic algorithms for determining the set of rules to adapt. We have implemented our algorithms in RUDOLF and show, through experiments on real-life datasets, the effectiveness and efficiency of our solution. In our experiments, we also demonstrate the generality of RUDOLF by applying it to other fraud domains, such as refining rules for preventing network attacks.

1. INTRODUCTION

A credit card fraud is an unauthorized transaction (or third-party fraud): a transaction that is made or attempted by an individual or organization that is not authorized by the card holder to use the credit card to perform the electronic payment. Fraud with general-purpose cards (credit, debit cards, etc.) is a billion-dollar industry. In fact, several independent news articles and studies (e.g., [5, 20]) corroborate that there is a consistent, fast-growing upward trend in total global payment-card fraud. Detecting and deterring credit card fraud is therefore of extreme importance to credit card companies.

In this paper, we present RUDOLF, a system that assists users in defining and refining rules for fraud detection. The system is motivated by a real problem faced by a credit card company whose name we cannot disclose. Every day, the company receives new transactions made through credit cards that it issued. A core part of the company's operations is to detect fraudulent transactions among the new transactions. Naturally, the credit card company desires an automated way to identify all fraudulent transactions in the database.

To this end, models based on machine-learning techniques have been developed to score each transaction, and transactions whose scores are above a threshold are classified as fraudulent. However, the models and scoring system are never entirely precise. There are always some fraudulent transactions that are missed by the models and, likewise, some legitimate transactions that are wrongly identified as fraudulent. For this reason, credit card companies do not rely entirely on automatically inferred models to determine fraudulent transactions. In addition, they also rely on rules that are manually written and curated by domain experts.

Intuitively, a rule captures a set of transactions in the database, and the goal is to arrive at a set of rules that, together with the automatically derived scores, captures precisely the fraudulent transactions. The use of rules written by users has the advantage that it allows employing general or domain knowledge to handle rare special cases. Furthermore, users can also leverage domain knowledge to detect frauds even before they are fully manifested in the transactions. For example, the credit card company may be informed of potential credit card frauds from a certain IP in Latvia in the coming week. In this case, users can proactively write rules ahead of time to capture those transactions from Latvia. Similarly, a user may see evidence of fraudulent purchases in a certain store and, knowing that such frauds often propagate to neighboring stores, write rules ahead of time to protect surrounding businesses.

Writing rules to capture precisely the fraudulent transactions is a challenging task that is exacerbated over time as the types of fraudulent transactions evolve or as new knowledge is learned. Typically, a set of rules curated by users already exists, and the rules work well for capturing fraudulent transactions up to a certain day. However, these rules need to be adapted over time to capture new types of frauds. For example, there may be newly reported fraudulent transactions coming from a certain type of store at a certain time that did not occur before and hence are not caught by existing rules. Analogously, there may be transactions that were identified by the existing rules as fraudulent but later verified by the card holders to be legitimate. Hence, the rules have to be adapted or augmented over time to capture all (but only) fraudulent transactions. The goal of RUDOLF, the system we describe in this paper, is to assist users in achieving this challenging objective.



Note that our goal resembles in part that of previous work on query/rule refinement, which attempts to automatically identify minimal changes to a query/rule in order to insert or remove certain items from the query result (e.g., [7]). However, a key difference here is that such minimal modifications often do not capture the actual "ground truth", namely the nature of an ongoing attack, which may not yet be fully reflected in the data. By interacting with users to fine-tune rules, important domain knowledge can be effectively imported into the rules to detect fraud patterns, often even before they are manifested in the transactions themselves.

As we shall describe, finding an optimal (to be formally defined) set of changes to the rules that captures fraudulent transactions and omits legitimate transactions is NP-hard in general. In light of this, RUDOLF employs greedy heuristics to assist a user in refining rules, proceeding in two phases: it first makes the "best" suggestions on how the rules should be refined to capture fraudulent transactions, and then makes further "best" suggestions on how the rules should be refined to avoid capturing legitimate transactions. Instead of starting from scratch, the interactive and incremental rule-refinement framework of RUDOLF allows a user to understand and systematically follow up on the suggestions given by RUDOLF.

While most of our exposition on the features of RUDOLF is based on credit card frauds, we emphasize that RUDOLF is a general-purpose system that can be used to interactively refine rules, for example for detecting and preventing network attacks, for spam detection, or for intrusion detection [21]. We will demonstrate in our experimental evaluation how RUDOLF can be applied to refine rules that capture network attacks evolving over time.

An overview example. The top of Figure 1 shows a simplified set of existing rules currently used to capture fraudulent transactions up to yesterday. Intuitively, the first two rules capture a suspicion of two attacks on an online store taking place in the first and last few minutes of the 6pm hour, charging amounts over $110. The last rule captures a fraud pattern at Gas Station A, where false charges of amounts over $40 are made soon after the closing time of 9pm. In practice, each rule also includes some threshold condition (not shown) on the score of each transaction, i.e., the degree of confidence that the transaction is fraudulent, as well as additional conditions on the user, settings, etc. We omit the scores and the additional conditions for simplicity and focus on these rules.

The bottom of Figure 1 shows an example of a relation that contains a number of transactions made today. The transaction tuples are ordered by the time of the transaction. In the figure, some transactions that were reported as fraudulent are labeled "FRAUD". Similarly, transactions that are reported to be legitimate may be correspondingly labeled "LEGITIMATE" (not shown in the figure). Transactions may also be unlabeled. The current set of rules captures only the shaded tuples shown in the transaction relation. Clearly, none of the new fraudulent transactions are captured by the existing rules, whereas some unlabeled transactions are captured. Note that by slightly adjusting the time and amount intervals and having the third rule also cover Gas Station B, we could obtain a set of rules that captures the fraudulent transactions and avoids the legitimate ones. However, such minimal changes often do not correspond to the best or correct changes. By interacting with RUDOLF, domain experts can view/accept/reject/modify the suggestions provided by RUDOLF, arriving for instance at the following rules.

1) Time ∈ [17:55, 18:05] ∧ Amt ≥ 100 ∧ Type = Online, no CCV
2) Time ∈ [18:55, 19:15] ∧ Amt ≥ 100 ∧ Type = Online, no CCV
3) Time ∈ [20:45, 21:15] ∧ Amt ≥ 40 ∧ Location ≤ Gas Station ∧ Type ≤ Online

Existing fraud rules Φ on day 𝑛:
1) Time ∈ [18:00, 18:05] ∧ Amt ≥ 110
2) Time ∈ [18:55, 19:00] ∧ Amt ≥ 110
3) Time ∈ [21:00, 21:15] ∧ Amt ≥ 40 ∧ Location ≤ 'Gas Station A'

Time    Amount  Transaction Type      Location       Label
18:02   107     Online, no CCV        Online Store   FRAUD
18:03   106     Online, no CCV        Online Store   FRAUD
18:04   112     Online, with CCV      Online Store
19:08   114     Online, no CCV        Online Store   FRAUD
19:10   117     Online, with CCV      Online Store
20:53   46      Offline, without PIN  Gas Station B  FRAUD
20:54   48      Offline, without PIN  Gas Station B  FRAUD
20:55   44      Offline, without PIN  Gas Station B  FRAUD
20:58   47      Offline, with PIN     Supermarket
21:01   49      Offline, with PIN     Gas Station A
  ⋮       ⋮            ⋮                   ⋮

Figure 1: Rules from day 𝑛 and transaction relation on day 𝑛 + 1.

Intuitively, the updated rules convey the conclusion that the previous first two rules in fact reflect a fraud attempt consisting only of online transactions without CCV, charging amounts over $100 in a slightly larger time interval. Similarly, the third rule reflects the conclusion that the gas station frauds that started at a single station expand further to gas stations in general, via offline transactions around closing time of amounts over $40. (The condition "Location ≤ Gas Station" reflects the semantic subsumption relation on locations. In particular, Gas Station A and B are locations that are contained in Gas Station. Similarly, "Type ≤ Online" reflects the subsumption relation on transaction types.)

We explain in the paper how such rules that incorporate the users' knowledge are derived, and the role of RUDOLF in their efficient derivation.

Contributions. This paper makes the following contributions.

1. We formulate and present a novel interactive framework for determining the "best" way to adapt and augment rules so that fraudulent transactions are captured and, at the same time, legitimate transactions are avoided.

2. We establish the complexity of the general problem by first studying two important special cases: (1) determine the best way to adapt/augment existing rules to capture new fraudulent transactions when there are no new legitimate transactions, and (2) determine the best way to adapt/augment existing rules to avoid capturing new legitimate transactions when there are no new fraudulent transactions. We show that these problems are NP-hard even in these special settings.

3. In light of these hardness results, we develop heuristic algorithms for each of the special cases described above, then generalize them to form a solution to the general problem. For the first special case, our algorithm greedily identifies "minimal" changes to the rules that would capture new, not-yet-captured fraudulent transactions (even if not all of them). The user can further fine-tune the proposed changes to make them correspond to the best changes that the user has in mind, or seek other suggestions for changes. A similar scenario follows for the other special case. We show how the two algorithms can be combined to solve the general problem.

4. We have implemented our solution in the RUDOLF prototype system and applied it to real use cases, demonstrating the effectiveness and efficiency of our approach. We performed experimental evaluations on a real-life dataset of credit card transactions and a publicly available dataset of network traffic logs.



[Figure 2 depicts the partial order as a tree: "All" at the top; its children "Online" and "Offline"; their children "Online, with CCV", "Online, without CCV", "Offline, with PIN", and "Offline, without PIN"; and "None" at the bottom.]

Figure 2: Partial order for type values. ⊤type is denoted by "All" and ⊥type is denoted by "None".

We show that by interacting with users (even ones with only little knowledge specific to the domain of the datasets), our algorithms consistently outperform alternative baseline algorithms, yielding more effective rules in shorter time.

The paper is organized as follows. The next two sections define the model (Section 2) and the problem statement (Section 3) behind RUDOLF. The algorithms and subroutines behind RUDOLF are described in Section 4. We then present our experimental results (Section 5) and related work (Section 6), before presenting our conclusions (Section 7).

2. PRELIMINARIES

Transaction relation. Our model consists of a relation, called the transaction relation. A transaction relation is a set of tuples (or transactions). The transaction relation is appended with more transactions over time. We assume that the domain of every attribute 𝐴 has a partial order, which is reflexive, antisymmetric, and transitive, with a greatest element ⊤𝐴 and least element ⊥𝐴. W.l.o.g. we also assume that ⊥𝐴 does not appear in any of the tuples.¹ For brevity, when the attribute name is clear from the context, we omit it and simply use the notations ⊤ and ⊥. Attributes that are associated with a partial order but not a total order are called categorical attributes. The elements of such a partial order are sometimes referred to as concepts.

A transaction may be marked fraudulent, which means that the transaction was carried out illegally. Conversely, a transaction may be verified to be legitimate and marked accordingly. Transactions that are neither marked fraudulent nor legitimate are called unlabeled transactions. The labeling is assumed to correspond to the (known part of the) ground truth. In addition, each transaction has a score between 0 and 1 that is computed automatically using machine-learning techniques and depicts the estimated probability that the transaction is fraudulent. The score may or may not agree with the ground truth, and this discrepancy is precisely the reason why rules are employed to refine the fraud detection.

EXAMPLE 2.1. Continuing with our example from the Introduction, Figure 1 shows part of a transaction relation 𝐼 with schema T(time, amount, type, location, ...). Each tuple records, among others, the time, amount, type of transaction, and location where the purchase was made through some credit card. The scores of the transactions, as computed by the machine-learning module, are omitted in the figure. The last column annotates the transactions that are fraudulent, legitimate, or unlabeled. The part of instance 𝐼 that is shown contains only fraudulent and unlabeled transactions.

The type time is discretized to minutes and is associated with a date (not shown). It thus has a partial order (in fact, a total order), given by the ≤ relation. The type amount also has a total order, with least element 0 and greatest element ∞. The attribute type is a categorical attribute and its partial order is given by the hierarchy shown in Figure 2. Some examples of concepts in the hierarchy are "Online" and "Offline, without PIN". The type location is also a categorical attribute, and its partial order (not shown) is given by, say, the geographical containment relationship. In particular, "Gas Station A" and "Gas Station B" are both concepts that are children of the concept "Gas Station".

¹If this is not the case, add a new special element to the domain and set it smaller, in the partial order, than all other elements.
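Such a categorical partial order can be sketched as a small child-to-parent map with a subsumption test. The concept names follow Figure 2; the `PARENT` map and the `leq` helper below are our own illustrative names, not part of RUDOLF:

```python
# Sketch of the "type" hierarchy from Figure 2 as a child -> parent map.
PARENT = {
    "Online, with CCV": "Online",
    "Online, without CCV": "Online",
    "Offline, with PIN": "Offline",
    "Offline, without PIN": "Offline",
    "Online": "All",
    "Offline": "All",
}

def leq(a, b):
    """Return True iff concept a is subsumed by concept b (a <= b)."""
    while a is not None:
        if a == b:
            return True
        a = PARENT.get(a)  # climb toward the top element "All"
    return False

print(leq("Offline, without PIN", "Offline"))  # True
print(leq("Offline, without PIN", "Online"))   # False
```

A condition such as "Type ≤ Offline" would then hold for any transaction whose type concept satisfies `leq(type, "Offline")`.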

Rules. Credit card experts specify a set of rules that can be applied to a transaction relation to discover fraudulent transactions. For simplicity and efficiency of execution, the rules are typically written over a single relation, a universal transaction relation that includes all the attributes necessary for fraud detection (possibly aggregated or derived from many other database relations). Hence, it is not necessary to consider explicit joins over different relations in the rule language. Similarly, for simplicity, our rule language allows only one condition over each attribute. Multiple disjunctive conditions over the same attribute can be expressed as multiple rules. Other extensions to the rule language are possible but will not be considered here. Note that the rule language we consider, albeit simple, forms the core of common rule languages used by actual systems.

A rule is a conjunction of one or more conditions over the attributes of the transaction relation. More precisely, a rule is of the form 𝛼1 ∧ ... ∧ 𝛼𝑛, where 𝑛 is the arity of the transaction relation, 𝛼𝑖 is a condition of the form '𝐴𝑖 op 𝑠' or '𝐴𝑖 ∈ [𝑠, 𝑒]', 𝐴𝑖 is the 𝑖th attribute of the transaction relation, op ∈ {=, <, >, ≤, ≥}, and 𝑠 and 𝑒 are constants.

More formally, if 𝜙 is a rule specified over a transaction relation 𝐼, then 𝜙(𝐼) denotes the set of all tuples in 𝐼 that satisfy 𝜙. We say that 𝜙(𝐼) are the transactions in 𝐼 that are captured by 𝜙. If Φ denotes a set of rules over 𝐼, then Φ(𝐼) = ⋃𝜙∈Φ 𝜙(𝐼). In other words, Φ(𝐼) denotes the union of the results of evaluating every rule in Φ over 𝐼. Observe that Φ(𝐼) ⊆ 𝐼, since every rule selects a subset of transactions from 𝐼. For readability, in our examples we show only the non-trivial conditions on attributes, i.e., we omit conditions of the form 𝐴𝑖 ≤ ⊤.

EXAMPLE 2.2. The top of Figure 1 illustrates a set Φ of three simplified rules currently used by the example credit card company to detect fraudulent transactions. The first rule captures all transactions made between 6pm and 6:05pm where the amount involved is at least $110. As previously mentioned, in practice each rule also includes some threshold conditions (not shown here) on the score of each transaction, as well as additional conditions on the user, settings, etc. For simplicity, we omit the score thresholds and the additional conditions and focus in the sequel on the simplified rules in this example.

For the new transaction relation shown at the bottom of Figure 1, this rule captures the third tuple of the transaction relation (which is an unlabeled transaction). The second rule captures none of the tuples, and the third rule captures the 10th, unlabeled, tuple of the transaction relation. Hence, the existing rules Φ do not capture any of the fraudulent transactions of day 𝑛 + 1.

Cost and benefit of modifications. A modification to a rule is a change in a condition on an attribute of the rule. One may also copy an existing rule before modifying the copy, add a new rule, or remove an existing rule. As we elaborate in subsequent sections, our cost model assumes there is a cost associated with every operation/modification made to a condition of a rule.

To compare possible modifications to rules and determine which modifications are better, we need to know not only



the cost of each modification but also the "benefit" it entails. Intuitively, the gain from the modifications can be measured in three ways: (1) the increase in the number of fraudulent transactions captured by the modified rules, (2) the decrease in the number of legitimate transactions captured by the modified rules, and (3) the decrease in the number of captured unlabeled transactions. The assumption underlying (3) is that unlabeled transactions are typically assumed to be correct until explicitly reported/tagged otherwise, and the more specific the rules are to fraudulent (and only fraudulent) transactions, the more precise they are in embodying possible domain-specific knowledge about the fraudulent transactions.

Observe that if our modifications are ideal, then after the modifications, more fraudulent transactions are captured, and fewer legitimate and unlabeled transactions are caught. Subsequently, the overall update cost is defined as the cost of the modifications minus their benefit. We give the formal definition in the next section.

3. PROBLEM STATEMENT

As described in the Introduction, there is typically a set of existing rules that works well for the transactions seen up to a certain point in time. Over time, as new transactions are made, new fraudulent and/or legitimate transactions are identified. As a result, existing rules for fraud detection may no longer be adequate for capturing all fraudulent transactions and omitting all legitimate transactions. Credit card companies depend on domain experts to refine them using domain knowledge and to fine-tune the set of rules over time as new transactions arrive.

The goal of RUDOLF is to identify minimal modifications to existing rules that would ideally capture all fraudulent transactions, omit all legitimate transactions, and, at the same time, minimize the inclusion of unlabeled transactions. The modifications suggested by RUDOLF serve as a starting point for the domain expert, who can either accept the proposed modifications or interactively refine them with the help of RUDOLF.

DEFINITION 3.1 (GENERAL RULE MODIFICATION PROBLEM). Let Φ be a set of rules on an existing transaction relation 𝐼. Let 𝐼′ denote a new set of transactions over the same schema. Let 𝐹, 𝐿 ⊆ 𝐼 (resp. 𝐹′, 𝐿′ ⊆ 𝐼′) be two disjoint sets of fraudulent and legitimate transactions in 𝐼 (resp. 𝐼′). Let 𝑅 = 𝐼 − (𝐹 ∪ 𝐿) (resp. 𝑅′ = 𝐼′ − (𝐹′ ∪ 𝐿′)) be the remaining unlabeled transactions in 𝐼 (resp. 𝐼′).

The GENERAL RULE MODIFICATION PROBLEM is to compute a set 𝑀 of modifications to Φ to obtain Φ′ so that cost(𝑀) − (𝛼 * ∆F + 𝛽 * ∆L + 𝛾 * ∆R) is minimized, where 𝛼, 𝛽, 𝛾 ≥ 0, and

∙ ∆F = |(𝐹 ∪ 𝐹′) ∩ Φ′(𝐼 ∪ 𝐼′)| − |(𝐹 ∪ 𝐹′) ∩ Φ(𝐼 ∪ 𝐼′)|,

∙ ∆L = |(𝐿 ∪ 𝐿′) ∩ Φ(𝐼 ∪ 𝐼′)| − |(𝐿 ∪ 𝐿′) ∩ Φ′(𝐼 ∪ 𝐼′)|, and

∙ ∆R = |(𝑅 ∪ 𝑅′) ∩ Φ(𝐼 ∪ 𝐼′)| − |(𝑅 ∪ 𝑅′) ∩ Φ′(𝐼 ∪ 𝐼′)|.

The term (𝛼 * ∆F + 𝛽 * ∆L + 𝛾 * ∆R) represents the benefit of applying the given modifications to the rules. In our rule language, if a fraudulent transaction is not captured by a set of rules, then at least some condition of a rule needs to be generalized to select that fraudulent transaction. At the same time, making a condition more general may also capture legitimate or unlabeled transactions. Conversely, if a legitimate transaction is captured by a set of rules, then at least some condition of a rule needs to be made more restrictive so that the legitimate transaction is excluded. Such modifications carry the risk of omitting some fraudulent transactions that should be captured. The coefficients 𝛼, 𝛽, and 𝛾 are non-negative and are used to tune the relative importance of each category (resp. correctly capturing the fraudulent transactions, avoiding misclassifying legitimate transactions, and excluding unlabeled transactions) in the calculation of the benefit. The overall goal is to identify the modifications having the best cost-benefit balance. Unless otherwise stated, we assume 𝛼 = 𝛽 = 𝛾 = 1; we discuss how these values can be adjusted in Section 7.
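Once the captured sets are known, the objective of Definition 3.1 can be computed directly. The sketch below uses toy interval rules over a single amount attribute; the `capture` helper and the example numbers are our own assumptions for illustration:

```python
# Sketch of the objective from Definition 3.1: cost(M) minus the weighted
# benefit. Here a rule is a closed interval (lo, hi) on a single amount
# attribute, and capture(rules, T) returns the values of T selected.
def capture(rules, T):
    return [t for t in T if any(lo <= t <= hi for lo, hi in rules)]

def objective(cost_M, old_rules, new_rules, F, L, R,
              alpha=1.0, beta=1.0, gamma=1.0):
    dF = len(capture(new_rules, F)) - len(capture(old_rules, F))  # frauds gained
    dL = len(capture(old_rules, L)) - len(capture(new_rules, L))  # legit dropped
    dR = len(capture(old_rules, R)) - len(capture(new_rules, R))  # unlabeled dropped
    return cost_M - (alpha * dF + beta * dL + gamma * dR)

old, new = [(110, 120)], [(100, 120)]      # widen Amt >= 110 to Amt >= 100
F, L, R = [107, 112], [117], [105]         # fraud / legitimate / unlabeled amounts
print(objective(1.0, old, new, F, L, R))   # 1.0 - (1 + 0 - 1) = 1.0
```

In the example, widening the interval gains one fraudulent transaction (ΔF = 1) but also captures one unlabeled transaction (ΔR = −1), so the benefit cancels out and only the modification cost remains.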

Observe that if 𝛼, 𝛽, and 𝛾 are set to large numbers (e.g., greater than the maximal update cost), then the most beneficial modifications are those leading to a "perfect" set of rules, namely one that will (1) capture all fraudulent transactions, (2) exclude all legitimate transactions, and, at the same time, (3) not capture any unlabeled transactions. Also note that, given an instance of the general rule modification problem, there always exists a set of modifications that can be applied to Φ to obtain such a perfect Φ′.

FACT. For any given instance of the GENERAL RULE MODIFICATION PROBLEM, there always exists a set 𝑀 of modifications to Φ to obtain Φ′ so that 𝐹 ∪ 𝐹′ ⊆ Φ′(𝐼 ∪ 𝐼′), (𝐿 ∪ 𝐿′) ∩ Φ′(𝐼 ∪ 𝐼′) = ∅, and no unlabeled transactions are captured by Φ′.

The modified rules Φ′ are obtained by first removing every rule in Φ and then adding rules to Φ′ as follows. For each transaction (𝑣1, ..., 𝑣𝑛) in 𝐹 ∪ 𝐹′, we create a rule 𝐴1 = 𝑣1 ∧ ... ∧ 𝐴𝑛 = 𝑣𝑛 and add it to Φ′. It is easy to see that the resulting set Φ′ of rules has the property that (𝐹 ∪ 𝐹′) ⊆ Φ′(𝐼 ∪ 𝐼′). Furthermore, (𝐿 ∪ 𝐿′) ∩ Φ′(𝐼 ∪ 𝐼′) = ∅ and none of the unlabeled transactions are captured by Φ′(𝐼 ∪ 𝐼′).
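This construction, one all-equalities rule per fraudulent tuple, can be written down directly. The encoding below (a rule as an attribute-to-value dict) is an illustrative assumption of ours, not RUDOLF's representation:

```python
# Build the "perfect" rule set from the Fact's proof sketch: one rule per
# fraudulent tuple, with an equality condition on every attribute.
def perfect_rules(fraud_tuples, attrs):
    return [{a: v for a, v in zip(attrs, t)} for t in fraud_tuples]

def captures(rule, t, attrs):
    return all(t[i] == rule[a] for i, a in enumerate(attrs))

attrs = ("time", "amount")
frauds = [("18:02", 107), ("19:08", 114)]
rules = perfect_rules(frauds, attrs)

# Every fraudulent tuple is captured by its own rule...
assert all(any(captures(r, t, attrs) for r in rules) for t in frauds)
# ...while a distinct legitimate tuple matches no rule.
assert not any(captures(r, ("18:04", 112), attrs) for r in rules)
print(len(rules))  # one rule per fraudulent transaction: 2
```

As the surrounding text notes, such rules are "perfect" but maximally specific: they memorize the fraudulent tuples rather than generalizing to a fraud pattern.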

While it is always possible to derive such a set of perfect rules, the cost of such modifications may be relatively high, as the generated rules can be very different from the existing rules. Furthermore, these new, very specific rules are not always the most informative, as they provide little knowledge of the properties associated with fraudulent transactions. The existing rules embody extensive domain knowledge, as they are manually curated by domain experts over time. It is for this reason that RUDOLF (and domain experts) prefer to adapt existing rules rather than create completely new and specific rules to capture new fraudulent transactions (and exclude legitimate transactions).

Special cases We study two special cases next. The first special case depicts the scenario where there are new fraudulent transactions but no legitimate transactions in the new set of transactions, and the primary goal is to adapt the rules to capture the fraudulent transactions. The second special case is the converse: there are no fraudulent transactions but only legitimate transactions in the new set of transactions, and the primary goal is to adapt the rules to avoid capturing legitimate transactions.

As we shall show in the next section, finding an optimal set of changes to the rules even in these special cases is computationally expensive. For this reason, instead of computing an optimal set of modifications to the rules to solve the general rule modification problem, we develop heuristic algorithms to identify the best update candidates in these two special cases and use them to ultimately arrive at a solution for the general problem. The proposed modifications are presented to the domain expert, who may further adapt the rules (for example, round certain range values instead of using specific values) or ask for an alternative set of proposed modifications from RUDOLF.

4. ALGORITHMS

4.1 Capture Fraudulent Transactions

The first special case of the general rule modification problem that we consider is when the set 𝐼′ of new transactions contains only fraudulent transactions and the goal is to adapt the existing set of rules Φ to capture all fraudulent transactions. We call this special case the CAPTURE FRAUDULENT TRANSACTIONS PROBLEM.

THEOREM 4.1. The CAPTURE FRAUDULENT TRANSACTIONS PROBLEM is NP-hard even if Φ is perfect for 𝐼 (namely, 𝐹 ⊆ Φ(𝐼) and 𝐿 ∩ Φ(𝐼) = 𝑅 ∩ Φ(𝐼) = ∅). The problem is NP-hard even when 𝐼 contains only unlabeled transactions and 𝐼′ consists of only one fraudulent transaction.

PROOF SKETCH. The two statements are proven simultaneously by reduction from the Minimum Hitting Set problem, which is known to be NP-hard. The transaction relation has as many columns as the number of elements in the universe and, for each set in the hitting set problem, there is an unlabeled characteristic tuple in the relation that encodes the members that are present in that set. The new transaction relation consists of a single fraudulent tuple (1,1,...,1). With an initially empty set Φ of rules, one can show that a solution for the Minimum Hitting Set problem implies a solution for the CAPTURE FRAUDULENT TRANSACTIONS PROBLEM and vice versa.

The reduction of the above proof shows that the NP-hardness result may arise because we allow the size of the schema of the transaction relation to vary. We show next that, even if we fix the size of the schema, the NP-hardness result continues to hold. The full proofs of Theorems 4.1 and 4.2 are found in Appendix A.

THEOREM 4.2. The CAPTURE FRAUDULENT TRANSACTIONS PROBLEM is NP-hard even if Φ is perfect for 𝐼 and the size of the schema of the transaction relation is fixed.

The Algorithm In view of the hardness result, we develop a heuristic algorithm (Algorithm 1) for determining a best set of modifications that would capture a given set of fraudulent transactions.

In the algorithm, we use 𝐼 to denote both old and new transactions. Observe that one reason for the hardness of the capture fraudulent transactions problem comes from the desire to identify a set of modifications that captures all fraudulent transactions and is globally optimal. Instead, our heuristic works in a greedy manner, gradually covering more and more uncaptured transactions. Rather than treating each transaction individually, we split the fraudulent transactions into smaller groups (clusters) of transactions that are similar to each other, based on a distance function, and treat each cluster individually. We denote the set of clusters by 𝒞. Each cluster in 𝒞 is represented by a representative tuple. Intuitively, a representative tuple of a cluster is a tuple that "contains" every tuple in that cluster. Hence, if a rule is generalized to capture the representative tuple, then it must also capture every tuple in the associated cluster. The algorithm then identifies, for each representative tuple, the (top-k) best rule modifications to capture it. The proposed modifications are verified with the domain expert, who may interactively further adapt them or ask for additional suggestions. Note that the modifications made to the rules by the algorithm may result in capturing some legitimate tuples. We will see how this too may be avoided in the following sections.

We next describe our algorithm more formally. We first define each of the components, then provide a comprehensive example that illustrates them all.

Representative tuple of a cluster The representative tuple 𝑓 of a cluster 𝐶 is a tuple with the same schema as the tuples in 𝐶 such that for every attribute 𝐴, 𝑓.𝐴 contains 𝑡.𝐴 for every 𝑡 ∈ 𝐶. If 𝐴 is an attribute with a total order, then 𝑓.𝐴 is an interval that contains 𝑡.𝐴. If 𝐴 is a categorical attribute, then 𝑓.𝐴 is a concept that contains 𝑡.𝐴 for every 𝑡 ∈ 𝐶. Furthermore, 𝑓.𝐴 is the smallest interval (resp. concept) that has the above property.2 In other words, 𝑓 is the "smallest" such tuple that contains every tuple in 𝐶.

Algorithm 1: Adapt rules to capture new fraudulent tuples
Input: A set Φ of rules for a transaction relation 𝐼 (containing old and new transactions), with 𝐹 ⊆ 𝐼 and 𝐹′ ⊆ 𝐼. 𝐹 and 𝐹′ are the old and new fraudulent transactions, respectively.
Output: A new set Φ′ of rules that captures 𝐹 ∪ 𝐹′.
1: Let 𝒞 denote the result of clustering the tuples in 𝐹 ∪ 𝐹′.
2: foreach 𝐶 ∈ 𝒞 do
3:   Let 𝑓(𝐶) be the representative tuple of 𝐶.
4:   Let Top-𝑘(𝑓(𝐶)) denote the top-𝑘 rules for 𝑓(𝐶) based on Equation 2.
5: foreach 𝐶 ∈ 𝒞 do
6:   while there does not exist a rule 𝑟 such that 𝑓(𝐶) ∈ 𝑟(𝐼) do
7:     if Top-𝑘(𝑓(𝐶)) is non-empty then
8:       Remove the next top rule 𝑟 from Top-𝑘(𝑓(𝐶)).
9:       Construct the smallest generalization 𝑟′ of 𝑟 so that 𝑓(𝐶) ∈ 𝑟′(𝐼).
10:      Ask whether the rule 𝑟′ is correct.
11:      if the domain expert agrees with the modified 𝑟′ then
12:        Replace 𝑟 with 𝑟′ in Φ.
13:      else
14:        Ask the domain expert which modifications in 𝑟′ are undesired.
15:        Revert the modifications to the original conditions of 𝑟 as indicated by the domain expert.
16:        Allow the domain expert to make further generalizations to the proposed rule.
17:    else
18:      Create a rule that selects exactly 𝑓(𝐶) and add it to Φ.
19: return Φ as Φ′.
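As a sketch, for totally ordered attributes the representative tuple is simply the componentwise smallest enclosing interval (categorical attributes would instead need the least common ancestor in the concept hierarchy; the dictionary representation below is an assumption of this illustration):

```python
def representative(cluster, attributes):
    """Smallest tuple of intervals containing every tuple in the cluster.

    cluster: list of dicts mapping attribute names to comparable values.
    Returns a dict mapping each attribute to its enclosing (min, max) interval.
    """
    rep = {}
    for a in attributes:
        values = [t[a] for t in cluster]
        rep[a] = (min(values), max(values))  # smallest enclosing interval
    return rep

# Times encoded as minutes since midnight: 18:02 -> 1082, 18:03 -> 1083.
cluster = [{"amount": 106, "time": 1082}, {"amount": 107, "time": 1083}]
assert representative(cluster, ["amount", "time"]) == {
    "amount": (106, 107), "time": (1082, 1083)}
```

Generalizing a rule to contain this representative therefore covers every tuple of the cluster at once.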

Distance of a rule from a representative tuple The notation |𝑓 − 𝑟| denotes the distance of a rule 𝑟 from a representative tuple 𝑓. Intuitively, it reflects how much the conditions in the rule need to be generalized for the rule to capture the representative tuple. It is formally defined as:

Σⁿᵢ₌₁ |𝑓.𝐴𝑖 − 𝑟.𝐴𝑖|,   (1)

where 𝑟.𝐴𝑖 denotes the interval or concept associated with the condition on attribute 𝐴𝑖 in the rule 𝑟, and 𝑛 is the arity of the transaction relation.

The distance between two attribute intervals is defined as follows. If 𝑓.𝐴 is the interval [𝑠1, 𝑒1] and 𝑟.𝐴 is the interval [𝑠2, 𝑒2], then |[𝑠1, 𝑒1] − [𝑠2, 𝑒2]| is the sum of the sizes of the smallest interval(s) needed to extend [𝑠2, 𝑒2] so that it contains [𝑠1, 𝑒1]. For example, the distance |[1, 5] − [5, 100]| is 4, while the distance |[1, 100] − [1, 5]| is 95. The distance |[5, 10] − [1, 100]| is 0, since [1,100] already covers [5,10]. If an attribute 𝐴 is categorical, then |𝑓.𝐴 − 𝑟.𝐴| is the length of the smallest "ontological distance" that needs to be added to 𝑟.𝐴 so that it contains 𝑓.𝐴. For example, |Offline with PIN − Online with CCV| is 1, and |Offline without PIN − Online with CCV| is 2.

2If there are multiple such concepts, e.g. in non-tree-shaped concept hierarchies, we pick one.
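The interval case of this distance can be sketched as follows (numeric intervals only; the ontological distance for categorical attributes would require traversing the concept hierarchy):

```python
def interval_distance(f, r):
    """Size of the smallest extension of interval r needed to contain f.

    f and r are (start, end) pairs; returns 0 if r already covers f.
    """
    fs, fe = f
    rs, re = r
    # Extend r downward to reach f's start, and upward to reach f's end.
    return max(0, rs - fs) + max(0, fe - re)

assert interval_distance((1, 5), (5, 100)) == 4    # extend [5,100] down to 1
assert interval_distance((1, 100), (1, 5)) == 95   # extend [1,5] up to 100
assert interval_distance((5, 10), (1, 100)) == 0   # already covered
```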

The overall cost function The overall cost of modifying a rule 𝑟 to capture a representative tuple 𝑓 then reflects the amount of modification that needs to be carried out (Equation 1) minus the benefit derived from those modifications:

Σⁿᵢ₌₁ |𝑓.𝐴𝑖 − 𝑟.𝐴𝑖| − (𝛼 * ∆F + 𝛽 * ∆L + 𝛾 * ∆R)   (2)
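As an illustration, Equation 2 amounts to the following, where the per-attribute distances and the ∆F/∆L/∆R counts are assumed to have been computed elsewhere:

```python
def rule_cost(distances, dF, dL, dR, alpha=1, beta=1, gamma=1):
    """Cost of modifying rule r to capture representative tuple f:
    total attribute distance (Equation 1) minus the weighted benefit."""
    return sum(distances) - (alpha * dF + beta * dL + gamma * dR)

# First rule vs. first representative tuple in Example 4.3:
# distances (0+4+0+0), benefit (2+0+0) -> cost 2.
assert rule_cost([0, 4, 0, 0], dF=2, dL=0, dR=0) == 2
```

A negative ∆L or ∆R (additional legitimate or unlabeled tuples captured) correctly increases the cost.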

Putting things together We now have all the ingredients for describing our algorithm. The algorithm proceeds by clustering the transactions into groups and then computing a representative tuple for each cluster.3 For every cluster 𝐶, we compute its representative tuple 𝑓(𝐶), as well as the cost (according to Equation 2) of modifying each of the rules to capture it, and select the 𝑘 rules with minimal cost. We refer to them as the top-𝑘 rules (see Line 4 of Algorithm 1). In Line 8, we pick the top rule in Top-𝑘(𝑓(𝐶)). If the rule 𝑟(𝐼) does not already contain 𝑓(𝐶), we attempt to make the smallest generalization of 𝑟 to 𝑟′ so that 𝑟′(𝐼) contains 𝑓(𝐶). Whenever the interval or concept of an attribute 𝑟.𝐴 does not contain 𝑓.𝐴, we attempt to modify 𝑟.𝐴 by computing the smallest extension needed on 𝑟.𝐴 based on its distance from 𝑓.𝐴. We perform this extension on every attribute 𝐴 of 𝑟 where 𝑟.𝐴 does not contain 𝑓.𝐴. This is what we mean by "generalize 𝑟 minimally to 𝑟′ so that 𝑓(𝐶) ∈ 𝑟′(𝐼)" in Line 9.

Next, we verify the new rule 𝑟′, with our potential modifications, with the domain expert. If the domain expert agrees with the proposed modifications (Lines 11-12), we replace 𝑟 with the new rule 𝑟′ in Φ. Otherwise, we refine our question and probe the domain expert further on whether there are parts of the modifications that are desired even though the entire rule 𝑟′ is not what the domain expert wants (Lines 14-15). We then apply only the desired modifications, if any. The next step allows the domain expert to make further generalizations to the rules. After this, the algorithm proceeds to pick the next closest rule to 𝑓(𝐶) and attempts to capture 𝑓(𝐶) with it. Line 18 captures the case when we have run out of rules to modify. If this happens, we construct a new rule to cover 𝑓 by stating the exact conditions that are required.
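The minimal generalization of Line 9 can be sketched, for interval conditions, as widening each condition just enough to cover the representative tuple (a simplification: concept-valued conditions would instead be generalized along the hierarchy):

```python
def generalize(rule, rep):
    """Smallest generalization of `rule` whose interval on every
    attribute contains the representative tuple `rep`.

    Both arguments map attribute names to (low, high) intervals.
    """
    new_rule = {}
    for a, (rs, re) in rule.items():
        fs, fe = rep[a]
        new_rule[a] = (min(rs, fs), max(re, fe))  # widen only as needed
    return new_rule

# "Time in [18:00,18:05] and Amt >= 110" (times in minutes; 10**6 is a
# stand-in for an unbounded upper end) vs. the first representative tuple:
rule = {"time": (1080, 1085), "amount": (110, 10**6)}
rep = {"time": (1082, 1083), "amount": (106, 107)}
assert generalize(rule, rep) == {"time": (1080, 1085), "amount": (106, 10**6)}
```

This reproduces the paper's proposed change of "Amt ≥ 110" to "Amt ≥ 106" while leaving the already-covering time condition untouched.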

We conclude with a remark regarding the computational complexity of the algorithm. All components of the algorithm (i.e., clustering, computation of representative tuples for each cluster, and the top-𝑘 rules) execute in PTIME in the size of the input. Hence each iteration executes in PTIME in the size of the input. The number of iterations per cluster depends on the amount of refinement that the expert makes to the suggested rule modifications (shown in our experiments to be fairly small).

EXAMPLE 4.3. The relation below depicts the representative tuples of the clusters formed from the six fraudulent transactions of Figure 1. The first tuple is the representative tuple of the cluster that consists of the first two tuples in Figure 1. The second (resp. third) tuple below is the representative of the cluster that consists of only the 4th tuple in Figure 1 (resp., the 6th, 7th, and 8th tuples).

Representatives of fraudulent transactions in Figure 1:

Time            Amount      Transaction Type       Location
[18:02,18:03]   [106,107]   Online, no CCV         Online Store
[19:08,19:08]   [114,114]   Online, no CCV         Online Store
[20:53,20:58]   [44,48]     Offline, without PIN   GAS Station B

Consider a domain expert, Elena, who is working with the system. The first rule in Figure 1 is the closest to the first representative tuple above. This is because Equation 2 evaluates to (0+4+0+0) − (2+0+0) = 2 for the first rule and the first representative tuple, whereas the second and third rules of Figure 1 have scores (43+4+0+0) − (2+0−1) = 46 and (178+0+0+1) − (6+0−3) = 168 with respect to the first representative tuple, and are thus ranked lower than the first rule. The number '1' in the last calculation denotes the ontological distance between "Gas Station A" and "Gas Station B". Since they are both contained under "Gas Station", the distance is 1.

3In our implementation, we use the clustering algorithms of [25], but other clustering algorithms can similarly be used.

Algorithm 1 will thus propose to modify the condition of the first rule from "Amt ≥ 110" to "Amt ≥ 106" to capture the representative tuple. It then proceeds to verify the modification with the domain expert (i.e., Elena). Suppose Elena accepts the proposed modification but further generalizes the condition, rounding it down to "Amt ≥ 100" instead. So the new rule 1 is

1) Time ∈ [18:00,18:05] ∧ Amt ≥ 100.

Besides the fact that rounded values may be preferred by domain experts over more specific values, such rounding may embody domain-specific knowledge that can lead to the discovery of more fraudulent transactions, particularly among transactions that are unlabeled, or the discovery of transactions that should not have been labeled as legitimate.

For the second cluster, the second rule in Figure 1 is the closest to the representative tuple of the cluster (i.e., the second tuple above). We omit the distance computation details. Algorithm 1 will propose to modify the time condition of the second rule to "Time ∈ [18:55,19:08]". Suppose Elena accepts the proposed modification and again further generalizes the condition, rounding it to "Time ∈ [18:55,19:15]". So the new rule 2 is

2) Time ∈ [18:55,19:15] ∧ Amt ≥ 110.

For the third cluster, the third rule in Figure 1 is the closest to the representative tuple of the cluster (i.e., the third tuple above). The algorithm will propose to modify both the condition on time and the condition on location, to "Time ∈ [20:53,21:15]" and "Location ≤ 'Gas station'". Elena again fine-tunes the rule by generalizing the condition on time, so that the last rule becomes

3) Time ∈ [20:45,21:15] ∧ Amt ≥ 40 ∧ Location ≤ 'Gas station'.

To conclude, observe that Algorithm 1 allows Elena to make further generalizations to the rules. Elena rounded the value down from 106 to 100 because her experience tells her that if frauds occur with amounts greater than $106, then they are likely to occur a few dollars below $106 as well. Hence, she generalized (i.e., rounded down) the value to $100. In making such generalizations, the fraudulent transactions will continue to be captured. However, the modified rules may now capture (more) non-fraudulent transactions. Nonetheless, we still allow such generalizations since they are deliberate changes made by Elena, the domain expert. More typically, however, such "rounding generalizations" tend to be meaningful generalizations that may lead to the discovery of more fraudulent transactions (i.e., unlabeled transactions that should be classified as fraudulent, or transactions that were mistakenly labeled as legitimate)4. As we shall show in Example 4.6, Elena can also leverage her experience or domain knowledge to pinpoint the right conditions for avoiding legitimate transactions.

We describe next how over-generalization may be treated.

4.2 Exclude Legitimate Transactions

In the previous subsection, we have seen how one generalizes rules to capture fraudulent transactions. We now discuss the opposite case, where we wish to specialize rules instead, in order to exclude legitimate transactions when there are no new fraudulent or unlabeled transactions but there are new legitimate transactions. We call this special case the EXCLUDE LEGITIMATE TRANSACTIONS PROBLEM. Here again we can show hardness results analogous to Theorems 4.1 and 4.2.

4This is from our conversations with domain experts on credit card fraud detection.

THEOREM 4.4. The EXCLUDE LEGITIMATE TRANSACTIONS PROBLEM is NP-hard even if Φ is perfect for 𝐼. The problem is NP-hard even when 𝐼 contains only unlabeled transactions and 𝐼′ consists of only one legitimate transaction.

THEOREM 4.5. The EXCLUDE LEGITIMATE TRANSACTIONS PROBLEM is NP-hard even if Φ is perfect for 𝐼 and the size of the schema of the transaction relation is fixed.

The proof of Theorem 4.4 is presented in Appendix A, and the reduction is based on the Hitting Set problem. In essence, in the construction of the proof, finding the best modifications to a rule is equivalent to computing a minimum hitting set. The modification that is applied to a rule is a restriction of the rule to avoid a legitimate tuple by excluding the value of an attribute of the legitimate tuple. The construction of the proof reveals that, in general, a rule may need to be "split" to capture all original values except the value that is to be excluded. This is the underlying intuition behind the heuristic algorithm that we present next. The proof of Theorem 4.5, also in the appendix, resembles that of Theorem 4.2.

The Algorithm The hardness proofs reveal that it is in general computationally expensive to find a globally optimal set of modifications to avoid capturing new legitimate transactions. For this reason, we develop a heuristic algorithm (Algorithm 2) that greedily determines the best attribute to "split" upon to avoid the capture of each legitimate tuple.

In Algorithm 2, we use 𝐼 to denote both old and new transactions. For each legitimate transaction 𝑙 in (𝐿 ∪ 𝐿′), we determine the set Ω𝑙 of rules that capture 𝑙 and modify every rule in Ω𝑙 to ensure that the modified rules no longer capture 𝑙. For a rule 𝑟 in Ω𝑙, the algorithm proceeds to pick an attribute 𝐴 where we can split the condition on the attribute to exclude the value 𝑙.𝐴. The attribute 𝐴 that we pick is the one that maximizes the benefit 𝛼 * ∆F + 𝛽 * ∆L + 𝛾 * ∆R, assuming a fixed cost of modification where we copy the rule and split on the attribute. If there are multiple attributes with the same maximum benefit, we randomly pick one of them. Observe that this heuristic of greedily selecting an attribute that maximizes benefit may not result in a globally optimal set of modifications in the end. In the proof of Theorem 4.4, this greedy heuristic is analogous to the strategy of repeatedly picking the attribute (and splitting on it) that "hits" the most sets until all sets are hit.

Splitting on attributes Once an attribute is selected, the rule is duplicated into 𝑟1 and 𝑟2, and the condition on 𝐴 in both rules is modified to exclude 𝑙.𝐴. Observe that since 𝑟(𝐼) captures 𝑙, we must have that 𝑙.𝐴 satisfies rule 𝑟's condition on 𝐴. In the split, 𝑟1's condition on 𝐴 accepts values from 𝑏 to the predecessor of 𝑙.𝐴, where 𝑏 denotes the smallest value accepted by the existing condition of 𝑟 on 𝐴. The rule 𝑟2 selects only elements from the successor of 𝑙.𝐴 to the largest value (i.e., 𝑒) accepted by the existing condition of 𝑟 on 𝐴.
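For a totally ordered attribute, the split can be sketched as follows, assuming for simplicity an integer-valued domain so that prev and succ are just −1 and +1:

```python
def split_rule(rule, attr, bad_value):
    """Split `rule` on `attr` so that neither copy accepts `bad_value`.

    `rule` maps attributes to (low, high) intervals; returns the copies
    whose interval on `attr` is still non-empty.
    """
    b, e = rule[attr]
    assert b <= bad_value <= e, "the rule must currently accept the value"
    r1 = dict(rule); r1[attr] = (b, bad_value - 1)   # [b, prev(l.A)]
    r2 = dict(rule); r2[attr] = (bad_value + 1, e)   # [succ(l.A), e]
    return [r for r in (r1, r2) if r[attr][0] <= r[attr][1]]

rule = {"time": (1080, 1085)}           # Time in [18:00,18:05], in minutes
parts = split_rule(rule, "time", 1084)  # exclude 18:04
assert parts == [{"time": (1080, 1083)}, {"time": (1085, 1085)}]
```

This mirrors the split of rule 1 into 𝑟11 (Time ∈ [18:00,18:03]) and 𝑟12 (Time ∈ [18:05,18:05]) in Example 4.6.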

For domains that are discrete and have a total order, the above procedure is well-defined: prev(𝑙.𝐴) and succ(𝑙.𝐴) denote the predecessor and successor of 𝑙.𝐴, respectively. However, when a domain is categorical and has only a partial order, the rules are split according to the partial order. Let 𝑂 denote the set of all concepts (excluding 𝑙.𝐴) that are parents of ⊥ in the partial order (i.e., the leaf nodes of the partial order excluding ⊥). To exclude 𝑙.𝐴, the algorithm considers how to select a minimum set of concepts that "covers" all concepts in 𝑂 while excluding 𝑙.𝐴. It can be shown that the problem of computing such a minimum set is analogous to computing a minimum set cover for 𝑂. Our procedure adopts the greedy heuristic: we repeatedly pick a concept in the partial hierarchy that covers the largest number of uncovered concepts in 𝑂 until all nodes in 𝑂 are covered. For categorical attributes, it may therefore be necessary to duplicate 𝑟 more than twice, with one rule selecting each concept in the cover. For example, referring to Figure 2, to exclude "Online, with CCV", we may pick "Offline" and "Online, without CCV" to cover the remaining concepts that are parents of "None".

Algorithm 2: Adapt rules to exclude legitimate tuples
Input: A set Φ of rules for a transaction relation 𝐼 (containing old and new transactions), with 𝐿 ⊆ 𝐼 and 𝐿′ ⊆ 𝐼.
Output: A new set Φ′ of rules that excludes 𝐿 ∪ 𝐿′.
1: foreach 𝑙 ∈ (𝐿 ∪ 𝐿′) do
2:   Let Ω𝑙 = {𝑟 ∈ Φ | 𝑙 ∈ 𝑟(𝐼)}.
3:   foreach 𝑟 ∈ Ω𝑙 do
4:     repeat
5:       Let 𝐴 be an attribute that has not been considered before and where splitting on 𝐴 to exclude 𝑙.𝐴 minimizes the cost associated with the split.
6:       Suppose the existing condition on 𝐴 is 𝐴 ∈ [𝑏, 𝑒].
7:       Split 𝑟 into 𝑟1 and 𝑟2 on 𝐴 as follows:
8:       Let 𝑟1 be a copy of 𝑟 except that the condition on 𝐴 is 𝐴 ∈ [𝑏, prev(𝑙.𝐴)].
9:       Let 𝑟2 be a copy of 𝑟 except that the condition on 𝐴 is 𝐴 ∈ [succ(𝑙.𝐴), 𝑒].
10:      Ask the domain expert whether the split into 𝑟1 and 𝑟2 is correct.
11:      if the domain expert agrees with the modification then
12:        Add 𝑟1 and 𝑟2 to Φ.
13:        Allow the domain expert to make further modifications to the proposed rules.
14:        Break out of the repeat loop.
15:    until all attributes have been considered;
16:    Remove 𝑟 from Φ.
17: return Φ as Φ′.
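The greedy cover step is classic greedy set cover over the leaf concepts; a sketch follows (the concept names and the coverage map are illustrative, mirroring the Figure 2 hierarchy):

```python
def greedy_cover(leaves, covers):
    """Greedy set cover: `covers` maps each candidate concept to the set
    of leaf concepts below it; returns concepts covering all `leaves`."""
    uncovered, chosen = set(leaves), []
    while uncovered:
        # Pick the concept covering the most still-uncovered leaves.
        best = max(covers, key=lambda c: len(covers[c] & uncovered))
        if not covers[best] & uncovered:
            raise ValueError("leaves cannot be covered")
        chosen.append(best)
        uncovered -= covers[best]
    return chosen

# Exclude "Online, with CCV": cover the remaining leaf concepts.
covers = {
    "Offline": {"Offline, with PIN", "Offline, without PIN"},
    "Online, without CCV": {"Online, without CCV"},
}
leaves = {"Offline, with PIN", "Offline, without PIN", "Online, without CCV"}
assert set(greedy_cover(leaves, covers)) == {"Offline", "Online, without CCV"}
```

Each chosen concept then yields one copy of the rule, which is how a categorical split can produce more than two rules.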

After this, we ask whether the domain expert agrees with the split. If the domain expert agrees, we add both rules 𝑟1 and 𝑟2 to Φ. The domain expert can also make further modifications to the rules (Line 13), such as excluding more values than what is suggested by the algorithm, and we break out of the repeat loop. Otherwise, we repeat the loop to attempt to split 𝑟 on another attribute to avoid capturing 𝑙. Note that since 𝑙 has to be excluded, one of the splits must eventually be deemed correct by the domain expert. After this, we remove 𝑟 from Φ and repeat the same procedure to modify Φ to exclude the selection of the other legitimate transactions.

EXAMPLE 4.6. We will now illustrate Algorithm 2 with an example. The relation below depicts the legitimate transactions from Figure 1 that are captured by the modified rules of Example 4.3. The modified rules are also stated below for convenience.

     Time    Amount   Transaction Type    Location
𝑙1   18:04   112      Online, with CCV    Online Store
𝑙2   19:10   117      Online, with CCV    Online Store
𝑙3   21:01   49       Offline, with PIN   GAS Station A


Modified rules Φ from Example 4.3:
1) Time ∈ [18:00,18:05] ∧ Amt ≥ 100.
2) Time ∈ [18:55,19:15] ∧ Amt ≥ 110.
3) Time ∈ [20:45,21:30] ∧ Amt ≥ 40 ∧ Location ≤ 'Gas station'.

Suppose all the transactions above have been verified to be legitimate transactions; it is easy to see that the modified rules currently capture all of them. Hence, we would like to modify the rules to exclude these legitimate transactions (while continuing to capture the fraudulent transactions). For this example, assume that 𝛼 = 𝛽 = 𝛾 = 1.

The algorithm considers every legitimate transaction. Since the first legitimate transaction 𝑙1 is caught by the first rule above, Algorithm 2 proceeds to determine which attribute of the rule to split in order to exclude 𝑙1. Splitting on time or amount or type results in the same maximum benefit: (1*0 (zero unlabeled transactions on either day) + 1*0 (the number of fraudulent transactions that are caught remains unchanged) + 1*1 (one less legitimate transaction that is caught)). Splitting on the attribute location, however, would cause additional fraudulent transactions (i.e., the first two transactions in Figure 1) to be missed and hence has a lower benefit than the rest of the attributes.

Continuing with our example, suppose the algorithm proposes to split on time (an arbitrary choice among time, amount, and type). This results in two rules that capture all the fraudulent transactions that were previously caught by the rule and exclude 𝑙1 at the same time.

𝑟11: Time ∈ [18:00,18:03] ∧ Amt ≥ 100.
𝑟12: Time ∈ [18:05,18:05] ∧ Amt ≥ 100.

At this point, Elena can accept this proposal or ask for alternatives. For the purpose of illustrating our algorithm, suppose Elena asks for an alternative proposed modification. Our algorithm may now propose to split on type instead. Since there is currently no condition on type in the first rule, the condition is implicitly "type ≤ ⊤". And because the concepts "Offline" and "Online, without CCV" cover all possible type values (i.e., values immediately above the "None" node in Figure 2) except "Online, with CCV", which we wish to exclude, we have the following two rules:
𝑟11: Time ∈ [18:00,18:05] ∧ Amt ≥ 100 ∧ Type ≤ Offline.
𝑟12: Time ∈ [18:00,18:05] ∧ Amt ≥ 100 ∧ Type ≤ Onl., no CCV.

Using domain knowledge that only online purchases, especially those without CCVs, are of concern, Elena eliminates the rule 𝑟11.

After this, our algorithm proceeds in a manner similar to what was described before to split the second rule of Φ to omit the second legitimate tuple. We omit the details here, but suppose Elena accepts the algorithm's proposal to split on type and again removes the rule that captures "Offline" transactions (since they are not of concern). We have the following:

𝑟22: Time ∈ [18:55,19:15] ∧ Amt ≥ 100 ∧ Type ≤ Onl., no CCV.

For the last legitimate transaction 𝑙3, which is captured by the third rule of Φ, Elena may also accept our algorithm's proposal to split on type. We obtain two rules, since "Online" and "No code" cover all the values immediately above "None" except "Offline, with PIN":

𝑟31: Time ∈ [20:45,21:15] ∧ Amt ≥ 40 ∧ Location ≤ Gas Station ∧ Type ≤ Online.
𝑟32: Time ∈ [20:45,21:15] ∧ Amt ≥ 40 ∧ Location ≤ Gas Station ∧ Type ≤ No code.

Observe that whenever a condition is generalized (as in the previous algorithm), more legitimate and/or unlabeled tuples may be inadvertently captured by the rule. Conversely, if a condition is specialized, some fraudulent tuples may be inadvertently omitted. In both cases, further refinements may be needed to tune the rules to a desired state. As we shall describe next, the rules are interactively refined based on the input of a domain expert such as Elena. In particular, there may be several rounds of refinements, through generalizations and specializations, before a desired set of rules is obtained.

Algorithm 3: The General Rule Modification Algorithm
Input: A set Φ of rules for a transaction relation 𝐼, where 𝐹 ⊆ 𝐼 and 𝐿 ⊆ 𝐼. A new transaction relation 𝐼′ where 𝐹′ ⊆ 𝐼′ and 𝐿′ ⊆ 𝐼′. 𝐹, 𝐿, 𝐹′, 𝐿′ denote the existing fraudulent, existing legitimate, new fraudulent and new legitimate transactions, respectively.
1: repeat
2:   Phase 1:
3:   Apply Algorithm 1 with Φ, 𝐼 ∪ 𝐼′, 𝐹, and 𝐹′ to obtain Φ′.
4:   Allow the expert to exit this phase when satisfied.
5:   Phase 2:
6:   Apply Algorithm 2 with Φ′, 𝐼 ∪ 𝐼′, 𝐿, and 𝐿′ to obtain Φ′′.
7:   Allow the expert to exit this phase when satisfied.
8: until the domain expert says done;
9: if the domain expert desires to capture the remaining fraudulent transactions then
10:   foreach 𝑓 ∈ ((𝐹 ∪ 𝐹′) − Φ′′(𝐼 ∪ 𝐼′)) do
11:     Create a rule that selects exactly 𝑓 and add it to Φ′′.
12: return Φ′′.

4.3 The General Rule Modification Algorithm

Algorithms 1 and 2 described in the previous sections are part of the general algorithm (see Algorithm 3) that provides a solution to the general rule modification problem. Intuitively, Algorithm 1 is first applied to interactively refine the rules to capture the fraudulent transactions. The expert is allowed to break in the middle, when she is satisfied with the rules and believes that the omission of the remaining fraudulent transactions is tolerable. The resulting set of rules may capture some (existing) legitimate tuples. Hence, Algorithm 2 is applied to interactively refine the rules to avoid the legitimate transactions. Here again the user may break when she believes that the inclusion of the remaining legitimate transactions is acceptable. However, the rules that result after Algorithm 2 is applied may no longer capture some fraudulent transactions that were previously captured. The domain expert can either repeat the process described above to remedy this, or choose to end the rule refinement process at this point. In the latter case, the domain expert has a choice to leave the result as-is or allow the algorithm to create transaction-specific rules to capture each of the remaining transactions.

Observe that it is essential for Algorithm 1 to be applied before Algorithm 2, as one can always add rules to capture specific fraudulent transactions without accidentally capturing legitimate transactions. On the other hand, one cannot always add or modify rules to avoid specific legitimate transactions without accidentally excluding fraudulent transactions with our current rule language.

EXAMPLE 4.7. Assume that we executed lines 2 and 3 of Algorithm 3 (see Examples 4.3 and 4.6, respectively). Suppose the set of rules after Line 3 of Algorithm 3 is:

𝑟12: Time ∈ [18:03,18:05] ∧ Amt ≥ 100 ∧ Type ≤ Onl., no CCV.
𝑟22: Time ∈ [18:55,19:15] ∧ Amt ≥ 100 ∧ Type ≤ Onl., no CCV.
𝑟31: Time ∈ [20:45,21:15] ∧ Amt ≥ 40 ∧ Location ≤ Gas Station ∧ Type ≤ Online.
𝑟32: Time ∈ [20:45,21:30] ∧ Amt ≥ 40 ∧ Location ≤ Gas Station ∧ Type ≤ No code.

This set of rules is almost identical to the outcome described in Example 4.6 except that, for the purpose of illustration, we assume the condition on time in the first rule has been modified to "Time ∈ [18:03,18:05]". This set of rules will omit the first fraudulent transaction in Figure 1. If the domain expert chooses to end the algorithm in Line 4 and capture the remaining fraudulent transactions with specific rules, the following rule, which captures exactly the first fraudulent transaction in Figure 1, will be added:

Time = 18:02 ∧ Amt = 107 ∧ Type = Onl.,no CCV ∧ Location = Onl. Store.

5. IMPLEMENTATION AND EXPERIMENTS

RUDOLF is implemented in PHP (back-end) and JavaScript (front-end) and uses MySQL as the database engine, which stores the universal transaction relation. RUDOLF uses information from a Machine Learning (ML) module that scores each transaction and also provides a threshold over which transactions are classified as fraudulent. These scores may then be used by the rules. In fact, in the ideal scenario, a single rule of the form "score > threshold" would be sufficient to capture all and only the fraudulent transactions. In practice, however, further conditions need to be added to the rules, which is facilitated by RUDOLF. For the credit card dataset, the transactions that we obtained are already scored. For the network dataset, we implemented a state-of-the-art Naive Bayes-based classifier [30] to score the transactions (but other algorithms can also be employed) for our ML module.

Within RUDOLF, there are three core modules: Manager, Rules Engine and Fraud Reporter. The Manager module is the main module; it executes the rule refinement algorithm and interacts with all other parts of the system. It interacts with the Rules Engine module, which manipulates (i.e., modifies/adds/deletes) the rules. It also interacts with the Fraud Reporter module, which receives external requests to label transactions as fraudulent or legitimate (e.g., when credit card holders notify the credit card company of fraudulent transactions or of legitimate transactions that were wrongly classified as fraudulent and thus rejected). Upon receiving such reports, the Fraud Reporter module passes the transactions that are misclassified by the existing rules to the Manager module for further inspection by the experts. The expert obtains the list of misclassified transactions and may attempt to modify the existing rules to remedy this. The Manager also interacts with the Rules Engine and executes our algorithms to obtain modifications to the existing rules that will help capture missed fraudulent transactions and avoid misclassifying legitimate transactions. The rules and modifications are displayed via the User Interface. The expert can interactively accept, decline, or further refine the proposed changes.

Datasets Our experiments are based on a real-world credit card dataset and a publicly available network traffic log dataset.

Credit Card Transactions: We have access to a real-world dataset from company XYZ5. Due to the sensitivity of credit card-related information, we used an anonymized version of the dataset in which all personal information, such as names, phone numbers, and IPs, was discarded. In addition, each address is replaced by the city in which it is located. The dataset consists of transaction sets of various sizes and activity types from 15 financial institutes (FIs) for the first quarter of 2016.
The data includes payment and authentication activities performed by the clients of the FIs. Each transaction set varies from 100K to 10M transactions, and most of them consist of about 500K transactions. The percentage of fraudulent transactions varies from 0.5% to 2.5% across the different FIs. The number of misclassified transactions (i.e., fraudulent transactions that are marked as legitimate and vice-versa) varies between 35% and 50%. The transactions contain both numerical (time, amount, number of previous actions, etc.) and categorical (location, client type, etc.) data. For categorical attributes we used a public ontology based on DBPedia.

5 Name omitted per company request.

Along with the transactions, we obtained 15 rule sets, one for each of the 15 FIs for the same time period, from company XYZ. We also obtained the change history and versions of those rules. A small FI typically has about 10 rules while a big FI typically has about 130 rules; most FIs have about 55 rules on average. Depending on the size and type of the FI, the rules are usually modified on a weekly to bi-weekly basis. Each time the rules are modified, they undergo about 10 rounds of modifications on average. The transactions in the data sets are annotated as fraudulent/legitimate, and we take these annotations as the ground truth. Each transaction also has a risk score, a value between 0 and 1000 generated by the company’s machine learning algorithm to estimate the chance that the transaction is fraudulent. One option is to capture the fraudulent transactions with the set of rules given by the company, allowing the users to refine the rules over time. Another option is to apply a single rule that classifies all transactions with a risk score above a certain threshold as fraudulent.

Network Traffic Log: Our second data set consists of a publicly available network traffic log [19] that includes different types of network attacks. A transaction here corresponds to a network packet, and the data set contains 10K transactions annotated as fraudulent (namely, belonging to some attack) or legitimate. The data source also includes a set of 50 fraud detection rules used by security and IT experts to define Access Lists (ACLs) for routers and firewalls. We note that those rules can be expressed in our language, and we use them as our initial set of rules (together with the score-threshold rule generated by the machine learning module). We asked our experts to distinguish “company-specific” rules (e.g., blocking specific web sites, such as social networks, due to company policy) from general security rules that may fit any company (e.g., blocking known bot-nets, closing vulnerable ports based on known exploits, etc.), and examined how our system assists the experts in refining the different types of rules in response to different attack types.

Experiment scenarios The different sizes of the transaction sets from the different FIs allowed us to vary our experiments with different dataset sizes (from 100K to 10M, with the average being 500K).

To simulate the work of a domain expert, we split each dataset into two parts of approximately the same size. The first part consists of transactions with timestamps prior to a certain point in time and plays the role of previously seen transactions; we use this set as the training data for the ML module. The second part consists of transactions after that time point and plays the role of the set of new transactions. We advanced in time from this point and examined, at different points in time, how the expert adapts the rules in response to the transactions arriving up to that point.
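The temporal split described above can be sketched as follows (a small illustrative script; the attribute name `timestamp` and the cut-off are assumptions, not the paper's actual schema):

```python
# Sketch of the temporal split used to simulate the expert's workflow:
# transactions before the cut-off act as previously seen (ML training) data,
# later ones arrive as "new" transactions.

from datetime import datetime

def split_by_time(transactions, cutoff):
    """Split into (seen, new) lists by the 'timestamp' attribute."""
    seen = [t for t in transactions if t["timestamp"] < cutoff]
    new = [t for t in transactions if t["timestamp"] >= cutoff]
    return seen, new

txns = [
    {"id": 1, "timestamp": datetime(2016, 1, 10)},
    {"id": 2, "timestamp": datetime(2016, 2, 20)},
    {"id": 3, "timestamp": datetime(2016, 3, 5)},
]
seen, new = split_by_time(txns, datetime(2016, 2, 1))
print(len(seen), len(new))  # 1 2
```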

We compared the performance of RUDOLF to three alternative solutions, detailed below. For each of the algorithms and each of the datasets, we varied the number of new transactions arriving between consecutive rounds of rule refinement. The number of new transactions varies from 10% to 20% of the dataset, with the default being 10%, which corresponds closely to what happens in real life between rounds of rule refinement. We ran each experiment with 5 users; as the variance was rather small (less than 2%), we present the average. For the network logs, our users are cyber security experts specializing in network attacks. For the credit cards dataset, our users are fraud detection experts from company XYZ. We also ran our experiments with student volunteers to determine whether the level of expertise affects the results.

Baseline algorithms We consider two extreme baselines: a fully-manual setting, where rules are manually refined by experts without the help of the system (the current setting used by company XYZ experts in their daily work), and a fully-automatic setting that uses the risk score produced by the ML algorithm and a single rule that selects fraudulent transactions based on their risk scores. In our credit card dataset, the risk scores are generated by the company’s state-of-the-art credit card classification algorithm. Observe that this algorithm essentially generates a single new rule of the form score greater than threshold (rather than refining an existing set of rules). We also compared to the baseline algorithm No Change, which denotes the given rules without any changes.

Observe that the fully-manual setting is arguably our “toughest” competitor, since the rules are modified by experts and the experts are not limited by any time constraint when refining the rules.

In addition to the above, we also consider a variant of RUDOLF, denoted RUDOLF−, that automatically refines the existing set of rules by accepting the modifications proposed by the system without consulting an expert. We also considered RUDOLF-s, a version of RUDOLF that does not refine categorical attributes of rules (and hence does not use ontologies). To the best of our knowledge, all existing systems refine only numerical attributes of rules; the performance of RUDOLF-s thus allows us to understand how RUDOLF compares with systems that are restricted to refining numerical attributes. We found that RUDOLF-s gives almost the same results as the fully-manual system and as RUDOLF−. Hence, we omit the results of RUDOLF-s.

Clustering As described, we cluster the transactions before applying the main logic of our algorithm, as clustering reduces the number of transactions that an expert needs to examine. We set the clustering thresholds (i.e., the maximal distance per attribute of the transactions) for both numerical and categorical attributes after consulting with the domain experts. The number of clusters varies from 20 to 200 (for 10K to 100K transactions).
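A minimal sketch of such threshold-based clustering over mixed attributes is shown below. The leader-style single-pass scheme, the thresholds, and the attribute names are all illustrative assumptions; the paper does not specify the clustering algorithm:

```python
# Illustrative leader-style clustering with per-attribute distance thresholds.
# Thresholds and attribute names are made up for the example.

NUM_THRESHOLDS = {"amount": 50.0}   # max allowed numeric difference per attribute
CAT_ATTRS = {"location"}            # categorical attributes must match exactly

def close(t1, t2):
    """Two transactions are close if every attribute is within its threshold."""
    for attr, max_diff in NUM_THRESHOLDS.items():
        if abs(t1[attr] - t2[attr]) > max_diff:
            return False
    return all(t1[a] == t2[a] for a in CAT_ATTRS)

def cluster(transactions):
    clusters = []  # each cluster is a list; its first member is the leader
    for t in transactions:
        for c in clusters:
            if close(c[0], t):
                c.append(t)
                break
        else:
            clusters.append([t])   # no close leader found: start a new cluster
    return clusters

txns = [{"amount": 100, "location": "Paris"},
        {"amount": 120, "location": "Paris"},
        {"amount": 500, "location": "Paris"}]
print(len(cluster(txns)))  # 2
```

The expert would then inspect one representative per cluster instead of every transaction.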

Measurements In our experiments, we measured the effectiveness of the derived rules and the amount of time that the domain experts saved by using our system. We also measured the running time required by RUDOLF to select the proposed modifications; for our datasets this was always at most one second, so we omit the exact measures here.

To measure the effectiveness of a set of rules derived by any of the methods, we consider its prediction quality, namely how correctly it identifies future frauds. For that, we examine the set of transactions from the point in time at which the rules were derived until the end of the dataset. For these future transactions we count, for each set of rules, the percentage of all fraudulent (resp. legitimate) transactions that it identifies (resp. wrongly classifies as fraudulent).
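This prediction-quality measure can be sketched directly (an illustrative script; the label encoding and the example rule are assumptions):

```python
# Sketch of the prediction-quality measure: the fraction of future fraudulent
# transactions a rule set captures, and the fraction of future legitimate
# transactions it wrongly flags.

def prediction_quality(rules, future_txns):
    """rules: predicate txn -> bool (True = flagged as fraudulent)."""
    fraud = [t for t in future_txns if t["label"] == "F"]
    legit = [t for t in future_txns if t["label"] == "L"]
    captured = sum(1 for t in fraud if rules(t)) / len(fraud)
    false_alarms = sum(1 for t in legit if rules(t)) / len(legit)
    return captured, false_alarms

rules = lambda t: t["amount"] > 200  # toy rule set
future = [{"label": "F", "amount": 300}, {"label": "F", "amount": 150},
          {"label": "L", "amount": 50}, {"label": "L", "amount": 250}]
print(prediction_quality(rules, future))  # (0.5, 0.5)
```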

To understand how many modifications each method entails, we also computed the cumulative number of rule updates that each method required. We measure this only for RUDOLF, RUDOLF−, and the fully-manual method, which directly change existing rules, counting the number of attribute changes, rule duplications, and rule splits performed.

Finally, we also measured the time the experts took to refine the rules, both with and without the help of RUDOLF.

Results for the Credit Cards dataset We first report on our experiments with the domain experts. Our first experiment examines the performance of the algorithms as time advances, with all parameters set to their default values. As explained above, at each point in time (i.e., after a certain percentage of the transactions has been observed), the algorithms are invoked to derive a correspondingly updated set of rules. Figure 3(a) shows the (cumulative) number of modifications that the RUDOLF, fully-manual, and RUDOLF− methods performed on the rules. (Recall that the fully-automatic ML algorithm does not refine the rules and hence does not change them.) We see that RUDOLF consistently performs fewer modifications than its competitors.

We can see this more clearly in Figure 3(b), which illustrates the prediction quality of the derived sets of rules in terms of the percentage of misclassified future transactions (a lower percentage of error implies better prediction quality). RUDOLF provides the best prediction. The fully-manual rule derivation provides less accurate predictions, though still better than the two automatic competitors. Among the latter two, RUDOLF−, which incrementally refines the rules, performs better than the threshold-based ML approach.

We note that the difference in performance between RUDOLF− and RUDOLF demonstrates the importance of incorporating experts and their domain knowledge in the loop.

For the experiments above, the rules were periodically refined in hops of 10% of the transactions. For other hop sizes the results are similar, except that convergence naturally arrives after proportionally fewer iterations for larger hops.

Our next experiment examines the performance of the algorithms on datasets of different sizes but with almost the same percentage of fraud. The size had no significant effect on the number of rule modifications performed by the algorithms, but the prediction quality slightly improved as more data became available. Figure 3(c) illustrates, for varying dataset sizes, the prediction quality of the rules after the first refinement round, in terms of the percentage of misclassified transactions. Here again, a lower percentage means better quality. As before, RUDOLF yields the best results. The error of all algorithms slightly decreases as the size of the dataset grows. The improvement is small because fraudulent transactions of the existing fraud patterns are distributed throughout the datasets, so the additional data reveals some, but not a huge amount of, new information. Similar results were obtained for the subsequent refinement rounds and we thus omit the graphs.

Next, we examine the performance of the algorithms for varying percentages of fraudulent transactions. We took 4 different customer databases of roughly the same size but different fraud percentages (0.5% to 2.5%). All other parameters are set to their default values. Figures 3(d) and 3(e) show, respectively, the number of rule updates and the percentage of error after the first refinement round. We can see that an increased number of fraudulent transactions entails more rule modifications to capture them. The classification error slightly increases with more fraudulent transactions, but here again RUDOLF achieves the lowest error.

Finally, we note that rule refinement with RUDOLF not only consistently yielded superior fraud prediction, but also reduced the time required from the experts by a factor of 4 to 5 (around 50 seconds per round with RUDOLF compared to 4-5 minutes without). We measured our experts’ performance, depicted in Figure 3(f): we asked them to fix up to 50 problematic transactions both manually and with RUDOLF. Interestingly, no expert finished all 50 fixes in the manual mode (a well-trained expert from company XYZ can usually fix 30-40 transactions per work day). The fact that RUDOLF leads in performance across all parameters is interesting, as the rules derived manually by experts are typically considered ground truth; yet RUDOLF is able to do better


[Figure 3 here. Panels: (a) # of rule changes vs. % of seen transactions; (b) error % vs. % of seen transactions; (c) error % vs. dataset size (K); (d) # of rule changes vs. % of fraud; (e) error % vs. % of fraud; (f) time (seconds) vs. # of changes. Series compared: Auto, No Change, Rudolf−, Manual, Rudolf.]

Figure 3: Experimental results

(i.e., with fewer changes and a lower percentage of error) with less data. This was consistent across all the experiments. Furthermore, all users reported that working with RUDOLF was convenient and effective, in the sense that the rules and modifications proposed by the system helped them identify and focus on the problematic rules and the needed treatment. Zooming into the types of rule modifications performed with RUDOLF, we observed that around 75% of the modifications were condition refinements, 20% rule splits, and 5% rule additions.

Results for the Network Traffic data set We repeated the above set of experiments for the Network Traffic data set and observed similar results. Hence, we omit the graphs but summarize our findings.

RUDOLF performs best, and manual refinement is better than both automatic solutions. However, because of the focused nature of network attacks, the manual refinement of rules takes the network experts less time than in the credit card setting (3-4 minutes per refinement as opposed to 4-5 minutes), but still a factor of 3 to 4 more than with RUDOLF.

6. RELATED WORK

The identification of fraudulent transactions is essentially a classification problem. Classification has been a fundamental problem in machine learning and data management [18, 22], and crowdsourcing has recently emerged as a major problem solving paradigm [8]. Many classification works have used crowdsourcing to obtain training data for learning [28, 2, 14, 23, 15]. This is complementary to our work: in RUDOLF, the crowd (of security experts) is employed to refine and maintain classification rules, which might have been initially learned through such training. Besides learning-based models, rules are also used for classification. Most of the previous research on rule-based classifiers focuses on how to learn rules from the training data. In contrast, the Chimera system [26] employs both learning and analyst experts who manually create classification rules using regular expressions. In [3], which describes LinkedIn’s job title classification system, domain experts and crowdsourcing are also heavily used. In both cases, however, the ongoing maintenance and refinement of rules in a changing environment, which is the focus of RUDOLF, is not considered.

In addition to machine learning-based methods, multiple other fraud-detection techniques have been considered. For example, [16] uses a decision tree, defined recursively over the nodes and edges of the tree, using the ratio of transactions that satisfy some condition to label them accordingly. Other methods, e.g., [4], are based on genetic programming, used to classify transactions into suspicious and non-suspicious ones. Another class of algorithms for fraud detection is based on clustering techniques; an example is [6], which clusters users based on common behavior and then considers as suspicious the transactions that take a user outside their cluster. Bayesian networks are used to detect fraud both in telecommunications (e.g., [9]) and in the credit card industry (e.g., [17]). Neural networks are also used for fraud detection; for instance, [13] presents an online fraud detection system based on a neural classifier, though the main constraint is that the types of “accounts” for which networks are built need to be predefined and fixed a priori. All these techniques are complementary to ours and can be used to derive the initial base set of rules.

Another class of work similar to RUDOLF is that on concept drift: changes in the distribution of the input that affect the learning system and thus the output. [29] deals with concept drift by using a sliding window that adaptively remembers more or fewer items from the training set (the recent past) depending on whether a concept drift has been recognized. Another system [10] is a classification system based on decision rules. Even though these systems can compute the nearest neighbors for the closest rules and generalize numerical values, they do not support generalization and specialization on categorical attributes, do not involve a human expert in the loop, and do not allow configuring weights for different kinds of errors (false positives and false negatives).

Finally, if we view our transaction relation as the source database and the set of fraudulent transactions as our target database, then the work on deriving queries or schema mappings from source-target databases (e.g., [11, 24, 27, 1]) is also relevant. Similarly, techniques for rule mining and, in particular, inductive logic programming (e.g., [12]) can also be used for fraud detection, where the fraudulent transactions can be seen as positive examples and the legitimate transactions as negative examples. However, the languages of schema mappings and inductive logic programming are different from our rule language and, more importantly, the rules derived cannot be interactively adapted. Furthermore, the frameworks and solutions proposed strictly prohibit the inclusion of unlabeled transactions, which is not our case.


7. CONCLUSION AND FUTURE WORK

We present RUDOLF, a novel system that assists domain experts in defining and adapting rules for fraud detection in dynamic environments. We show that the problem of identifying the best candidate adaptation is NP-hard and present PTIME heuristic algorithms for determining the set of rules to adapt, working interactively with the domain experts until they are satisfied with the resulting rules. Our experiments demonstrate that RUDOLF is a promising, effective, and efficient tool for rule refinement.

There are several challenging directions for future work. Specifically, we plan to investigate the impact of employing a more sophisticated cost computation. In our cost model, there is a cost associated with every modification made to a condition in a rule. In general, each attribute may also be associated with a weight to indicate the degree of preference towards changing the condition on that attribute. Such weights may be manually adjusted by an expert or derived automatically using learning techniques based on user feedback reporting satisfaction with the suggested modifications. Dynamically adapting the weights based on user interaction (e.g., refinement of the proposed modifications) is an interesting direction for future research. Similarly, the parameters 𝛼, 𝛽 and 𝛾 used in our cost formula to weight the importance of misclassifying fraudulent/legitimate/unlabeled transactions may be dynamically adapted based on such user input. Machine learning may also be used to anticipate what additional rule refinements experts may perform due to domain knowledge, and to proactively incorporate them in the modifications suggested to the experts.
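The objective underlying this cost model (also used in the appendix proofs) can be sketched as follows. The weight values and the unit edit cost are illustrative assumptions; only the shape of the expression, cost(M) − (α·∆F + β·∆L + γ·∆R), comes from the paper:

```python
# Sketch of the paper's objective: the total edit cost of a set of
# modifications minus weighted credit for newly captured fraudulent (dF),
# newly excluded legitimate (dL), and avoided unlabeled (dR) transactions.
# Weights alpha = beta = gamma = 2.0 and unit cost 1.0 are illustrative,
# mirroring the proofs' assumption alpha = beta = gamma > 1.

def objective(num_modifications, d_f, d_l, d_r,
              alpha=2.0, beta=2.0, gamma=2.0, unit_cost=1.0):
    cost_m = unit_cost * num_modifications
    return cost_m - (alpha * d_f + beta * d_l + gamma * d_r)

# Two modifications that capture one missed fraud and avoid one unlabeled hit:
print(objective(2, d_f=1, d_l=0, d_r=1))  # 2 - (2 + 0 + 2) = -2.0
```

Lower values are better, so a modification is worthwhile whenever the weighted classification improvement outweighs its edit cost.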

The use of a more expressive rule specification language is another intriguing direction for future research. In addition, we plan to consider richer interactions with multiple collaborating experts; techniques from crowdsourcing that exploit multiple users to verify, support, and complete each other’s input may be useful here.

8. REFERENCES

[1] B. Alexe, B. ten Cate, P. G. Kolaitis, and W. C. Tan. Designing and refining schema mappings via data examples. In SIGMOD, pages 133–144, 2011.
[2] V. Ambati, S. Vogel, and J. G. Carbonell. Active learning and crowd-sourcing for machine translation. In LREC, 2010.
[3] R. Bekkerman and M. Gavish. High-precision phrase-based document classification on a modern scale. In KDD, pages 231–239, 2011.
[4] P. J. Bentley, J. Kim, G.-H. Jung, and J.-U. Choi. Fuzzy Darwinian detection of credit card fraud.
[5] The US sees more money lost to credit card fraud than the rest of the world combined. http://www.businessinsider.com/the-us-accounts-for-over-half-of-global-payment-card-fraud-sai-2014-3.
[6] R. J. Bolton and D. J. Hand. Statistical fraud detection: A review. Statistical Science, pages 235–255, 2002.
[7] A. Chapman and H. V. Jagadish. Why not? In SIGMOD, pages 523–534, 2009.
[8] A. Doan, R. Ramakrishnan, and A. Y. Halevy. Crowdsourcing systems on the world-wide web. Commun. ACM, 54(4):86–96, 2011.
[9] K. J. Ezawa and S. W. Norton. Constructing bayesian networks to predict uncollectible telecommunications accounts. IEEE Expert: Intelligent Systems and Their Applications, 11(5):45–51, Oct. 1996.
[10] F. J. Ferrer-Troyano, J. S. Aguilar-Ruiz, and J. C. R. Santos. Data streams classification by incremental rule learning with parameterized generalization. In Proceedings of the 2006 ACM Symposium on Applied Computing (SAC), pages 657–661, 2006.
[11] G. H. L. Fletcher, M. Gyssens, J. Paredaens, and D. V. Gucht. On the expressive power of the relational algebra on finite sets of relation pairs. TKDE, 21(6):939–942, 2009.
[12] L. Galarraga, C. Teflioudi, K. Hose, and F. M. Suchanek. Fast rule mining in ontological knowledge bases with AMIE+. VLDB J., 24(6):707–730, 2015.
[13] J. R. Dorronsoro, F. Ginel, C. Sánchez, and C. S. Cruz. Neural fraud detection in credit card operations. IEEE Trans. on Neural Networks, 8(4):827–834, 1997.
[14] E. Kamar, S. Hacker, and E. Horvitz. Combining human and machine intelligence in large-scale crowdsourcing. In AAMAS, pages 467–474, 2012.
[15] D. R. Karger, S. Oh, and D. Shah. Iterative learning for reliable crowdsourcing systems. In NIPS, pages 1953–1961, 2011.
[16] A. I. Kokkinaki. On atypical database transactions: Identification of probable frauds using machine learning for user profiling. In IEEE Workshop on Knowledge and Data Exchange, page 107, 1997.
[17] S. Maes, K. Tuyls, B. Vanschoenwinkel, and B. Manderick. Credit card fraud detection using bayesian and neural networks. In Proc. of the 1st Inter. NAISO Congress on Neuro Fuzzy Technologies, 2002.
[18] T. M. Mitchell. Machine Learning. McGraw-Hill, 1997.
[19] CAIDA network dataset. http://www.caida.org/data/overview/.
[20] Card fraud worldwide. http://www.nilsonreport.com/publication chart and graphs archive.php?1=1&year=2015.
[21] C. Phua, V. C. S. Lee, K. Smith-Miles, and R. W. Gayler. A comprehensive survey of data mining-based fraud detection research. CoRR, abs/1009.6119, 2010.
[22] R. Ramakrishnan and J. Gehrke. Database Management Systems (3rd ed.). McGraw-Hill, 2003.
[23] V. C. Raykar, S. Yu, L. H. Zhao, G. H. Valadez, C. Florin, L. Bogoni, and L. Moy. Learning from crowds. Journal of Machine Learning Research, 11:1297–1322, 2010.
[24] A. D. Sarma, A. G. Parameswaran, H. Garcia-Molina, and J. Widom. Synthesizing view definitions from data. In ICDT, 2010.
[25] M. Shindler, A. Wong, and A. Meyerson. Fast and accurate k-means for large datasets. In NIPS, pages 2375–2383, 2011.
[26] C. Sun, N. Rampalli, F. Yang, and A. Doan. Chimera: Large-scale classification using machine learning, rules, and crowdsourcing. PVLDB, 7(13):1529–1540, 2014.
[27] Q. T. Tran, C. Y. Chan, and S. Parthasarathy. Query reverse engineering. VLDB J., 23(5):721–746, 2014.
[28] S. Vijayanarasimhan and K. Grauman. Large-scale live active learning: Training object detectors with crawled data and crowds. In CVPR, pages 1449–1456, 2011.
[29] G. Widmer and M. Kubat. Learning in the presence of concept drift and hidden contexts. Machine Learning, 23(1):69–101, 1996.
[30] M. Zareapoor and P. Shamsolmoali. Application of credit card fraud detection: Based on bagging ensemble classifier. Procedia Computer Science, 48:679–685, 2015. (ICCC 2015.)

APPENDIX
A. PROOFS

THEOREM 4.1. The CAPTURE FRAUDULENT TRANSACTIONS PROBLEM is NP-hard even if Φ is perfect for 𝐼 (namely, 𝐹 ⊆ Φ(𝐼) and 𝐿 ∩ Φ(𝐼) = 𝑅 ∩ Φ(𝐼) = ∅). The problem is NP-hard even when 𝐼 contains only unlabeled transactions and 𝐼′ consists of only one fraudulent transaction.

PROOF. We prove the two claims simultaneously by a reduction from the minimum hitting set problem, which is known to be NP-hard. We recall the Minimum Hitting Set Problem below.

DEFINITION A.1 (MINIMUM HITTING SET). Consider a pair (𝑈, 𝑆) where 𝑈 is a universe of elements and 𝑆 is a set of subsets of 𝑈. A set 𝐻 ⊆ 𝑈 is a hitting set of 𝑆 if 𝐻 hits every set in 𝑆; in other words, 𝐻 ∩ 𝑆′ ≠ ∅ for every 𝑆′ ∈ 𝑆. A minimum hitting set 𝐻 is a hitting set of minimum cardinality; in particular, for every 𝑒 ∈ 𝐻, the set 𝐻 ∖ {𝑒} is not a hitting set of 𝑆.
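To make the definition concrete, here is a brute-force search for a minimum hitting set on the running example used in this proof. This enumeration is purely illustrative; since the problem is NP-hard, it only scales to tiny instances:

```python
# Brute-force minimum hitting set: try candidate subsets of the universe in
# increasing size and return the first one that intersects every set in S.
# Exponential in |U|; for illustration on tiny instances only.

from itertools import combinations

def minimum_hitting_set(universe, sets):
    for size in range(len(universe) + 1):
        for candidate in combinations(sorted(universe), size):
            h = set(candidate)
            if all(h & s for s in sets):   # H ∩ S' ≠ ∅ for every S' in S
                return h
    return None  # unreachable: the full universe always hits every set

U = {"A1", "A2", "A3", "A4", "A5"}
S = [{"A1", "A2", "A3"}, {"A2", "A3", "A4", "A5"}, {"A4", "A5"}]
print(minimum_hitting_set(U, S))  # a minimum hitting set of size 2
```

No singleton hits all three sets here, so any size-2 answer (such as {A2, A4}) is minimum.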

We assume that each rule modification is associated with a unit cost. Given an instance of the hitting set problem, we construct an instance of the CAPTURE FRAUDULENT TRANSACTIONS PROBLEM where there are no fraudulent or legitimate transactions in 𝐼 and 𝐼′ consists of only one fraudulent transaction, as follows.


The transaction relation has |𝑈| columns, one for each element in 𝑈. The transaction relation 𝐼 has a characteristic (unlabeled) tuple for every set 𝑠 ∈ 𝑆. That is, for every set 𝑠 in 𝑆, we construct a characteristic tuple of 𝑠 in 𝐼 by placing a 0 in position 𝑖 if 𝑥𝑖 ∈ 𝑠 and 1 otherwise. Hence there are |𝑆| tuples in the transaction relation 𝐼, and these are unlabeled tuples that we would like to minimize capturing with the rules. There are no existing fraudulent or legitimate tuples in 𝐼. The set Φ is initially empty; hence, Φ(𝐼) does not capture any transaction (and thus, by definition, is “perfect” for 𝐼). The instance 𝐼′ consists of a single transaction (1,1,1,...,1), which is the new fraudulent transaction that we wish to capture.

As an example, consider the following hitting set instance where 𝑈 = {𝐴1, 𝐴2, 𝐴3, 𝐴4, 𝐴5} and 𝑆 = {𝑠1, 𝑠2, 𝑠3}, where 𝑠1 = {𝐴1, 𝐴2, 𝐴3}, 𝑠2 = {𝐴2, 𝐴3, 𝐴4, 𝐴5}, and 𝑠3 = {𝐴4, 𝐴5}. The transaction relation 𝐼 ∪ 𝐼′ (where 𝐼′ is highlighted in gray) is shown below. The last column annotates the type of tuple (e.g., the last tuple is a fraudulent transaction).

A1 A2 A3 A4 A5
0  0  0  1  1
1  0  0  0  0
1  1  1  0  0
1  1  1  1  1   F’
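As a sanity check of this construction, the characteristic tuples of 𝐼 can be generated mechanically from (𝑈, 𝑆). This small script is illustrative only, not part of the paper's system:

```python
# Build the characteristic tuples of the reduction: 0 in position i when the
# i-th universe element belongs to the set, 1 otherwise.

def characteristic_tuples(universe, sets):
    order = sorted(universe)  # fix a column order A1..A5
    return [tuple(0 if x in s else 1 for x in order) for s in sets]

U = {"A1", "A2", "A3", "A4", "A5"}
S = [{"A1", "A2", "A3"}, {"A2", "A3", "A4", "A5"}, {"A4", "A5"}]
print(characteristic_tuples(U, S))
# [(0, 0, 0, 1, 1), (1, 0, 0, 0, 0), (1, 1, 1, 0, 0)]
```

The output matches the first three (unlabeled) rows of the table above; the fraudulent tuple (1,1,1,1,1) of 𝐼′ is added separately.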

It is straightforward to verify that the construction of the instanceof the Rule Modification Problem can be achieved in polynomialtime in the size of the input of the Hitting Set Problem. We nowshow that a solution for the Minimum Hitting Set Problem impliesa solution for the Rule Modification Problem and vice versa.

Let 𝐻 be the minimum hitting set. For every element 𝑥𝑖 in 𝐻, we add a condition 𝑎𝑖 = 1 to the same rule. (The condition could also be 𝑎𝑖 ≥ 1 or 𝑎𝑖 > 0, but wlog we assume 𝑎𝑖 = 1 is used.) The cost of adding the condition 𝑎𝑖 = 1 is 1. Clearly, Φ′(𝐼 ∪ 𝐼′) contains the fraudulent transaction. Moreover, since 𝐻 hits every set 𝑠 ∈ 𝑆, for every 𝑠 ∈ 𝑆 there must be at least one attribute 𝑎𝑖 whose value in the characteristic tuple of 𝑠 is 0 while the new rule Φ′ requires 𝑎𝑖 = 1. Hence, it follows that Φ′(𝐼 ∪ 𝐼′) does not contain any tuple in 𝐼 ∪ 𝐼′ other than the fraudulent tuple.

We show next that the expression cost(𝑀) − (𝛼 * ∆F + 𝛽 * ∆L + 𝛾 * ∆R) is the minimum possible, assuming that 𝛼 = 𝛽 = 𝛾 > 1 (𝛽 is actually irrelevant here) and the cost of each modification is 1. Suppose there were another set of modifications whose cost is lower than the above. It must be that none of the unlabeled tuples is selected by the modified rules, since one can always avoid capturing an unlabeled tuple, and lower the cost even further, by adding an appropriate condition 𝑎𝑖 = 1. Furthermore, by the same reasoning, the fraudulent tuple must be selected by the modified rules. Thus, the expression cost(𝑀) − (𝛼 * ∆F + 𝛽 * ∆L + 𝛾 * ∆R) is the minimum possible when the number of 𝑎𝑖 = 1 conditions corresponds to the size of the minimum hitting set. Any smaller value would mean we have a hitting set of smaller cardinality, which contradicts our assumption that 𝐻 is a minimum hitting set.

For the converse, let 𝑀 be the set of modifications made to Φ such that cost(𝑀) − (𝛼 * ∆F + 𝛽 * ∆L + 𝛾 * ∆R) is minimum. Again, wlog, we may assume that the modifications are of the form 𝑎𝑖 = 1.

Let 𝐻 = {𝑥𝑖 | 𝑎𝑖 = 1 appears in the modified rule}. We show next that 𝐻 is a minimum hitting set. First, we show that 𝐻 is a hitting set. Suppose not; then there is a set 𝑠 ∈ 𝑆 such that 𝐻 ∩ 𝑠 = ∅. Let 𝑡 be the tuple that corresponds to 𝑠 in the transaction table. This means that Φ′(𝐼 ∪ 𝐼′) contains 𝑡, since 𝑡 contains the value 1 in every attribute 𝑎𝑖 for which 𝑎𝑖 = 1 appears in the modified rule. Pick an element, say 𝑥𝑗 ∈ 𝑠 (so 𝑥𝑗 ∉ 𝐻). Now if we add the modification 𝑎𝑗 = 1, the change in cost is +1 − 𝛾. Since 𝛾 > 1, the new total cost is lower than the original cost, which contradicts the assumption that 𝑀 is a set of modifications that gives the minimum total cost.

Next, we show that 𝐻 is a minimum hitting set. Suppose not; then there is another hitting set 𝐻′ where |𝐻′| < |𝐻|. With 𝐻′, it is straightforward to construct a set of modifications whose cost is lower than cost(𝑀) − (𝛼 * ∆F + 𝛽 * ∆L + 𝛾 * ∆R).

In our running example, a modified rule is
𝐴1 ≤ ⊤ ∧ 𝐴2 = 1 ∧ 𝐴3 ≤ ⊤ ∧ 𝐴4 = 1 ∧ 𝐴5 ≤ ⊤
since a minimum hitting set is {𝐴2, 𝐴4}.

THEOREM 4.2. The CAPTURE FRAUDULENT TRANSACTIONS PROBLEM is NP-hard even if Φ is perfect for 𝐼 and the size of the schema of the transaction relation is fixed.

PROOF SKETCH. The proof uses a reduction from the set cover problem; in the reduction, a single unary transaction relation is used. We build a taxonomy of the elements of a set cover instance according to which element belongs to which set. The relation is initially empty, and we assume Φ is initially empty as well. The cost of adding a rule with a condition is 1, and we assume that the cost of adding the condition 𝐴 ≤ ⊤ is very high (i.e., it is prohibited). The new transaction relation 𝐼′ consists of 𝑛 new fraudulent transactions, one for each element of the universe in the set cover instance. One can then establish that a set of rules of minimum cost can be derived from a minimum set cover, and vice versa. Intuitively, each rule has the form 𝐴 ≤ 𝑆𝑖, where each 𝑆𝑖 is part of the solution to the instance of the minimum set cover problem.

THEOREM 4.4. The EXCLUDE LEGITIMATE TRANSACTIONS PROBLEM is NP-hard even if Φ is perfect for 𝐼. The problem is NP-hard even when 𝐼 contains only unlabeled transactions and 𝐼′ consists of only one legitimate transaction.

PROOF. Given an instance of the hitting set problem, we construct an instance of the Rule Modification Problem for Excluding Legitimate Transactions as follows.

The transaction relation has |𝑈 | columns, one for each elementin 𝑈 . For every set 𝑠 in 𝑆, we construct a characteristic tuple of 𝑠by placing a 0 in position 𝑖 if 𝑥𝑖 ∈ 𝑠 and 1 otherwise. Hence thereare |𝑆| tuples in the transaction relation 𝐼 so far and the fraudulenttransactions 𝐹 = 𝐼 . The set Φ consists of a single rule

𝐴1 ≤ ⊤ ∧ ... ∧ 𝐴|𝑈| ≤ ⊤,

where ⊤ denotes the top element, and hence, Φ(𝐼) currently captures all fraudulent transactions 𝐹. The new transaction relation 𝐼′ consists of a single tuple (1,1,1,...,1). The set 𝐿′ of legitimate transactions is a singleton set consisting of only (1,1,1,...,1). That is, 𝐿′ = 𝐼′. This is the legitimate transaction that we wish to exclude.

With 𝐹′ = 𝐿 = ∅, our goal is to specialize the rule in Φ to capture exactly the fraudulent tuples 𝐹. As in the proof of Theorem 4.1, we assume that each modification is associated with a unit cost and 𝛼 = 𝛽 = 𝛾 > 1.

As an example, consider the same hitting set instance as in the proof of Theorem 4.1, where 𝑈 = {𝐴1, 𝐴2, 𝐴3, 𝐴4, 𝐴5} and 𝑆 = {𝑠1, 𝑠2, 𝑠3}, where 𝑠1 = {𝐴1, 𝐴2, 𝐴3}, 𝑠2 = {𝐴2, 𝐴3, 𝐴4, 𝐴5}, and 𝑠3 = {𝐴4, 𝐴5}. The transaction relation 𝐼 ∪ 𝐼′ is shown below (the tuple of 𝐼′ is the last row). The last column annotates the type of each tuple (i.e., F for tuples in 𝐹 and L′ for the tuple in 𝐿′).

A1 A2 A3 A4 A5
0  0  0  1  1   F
1  0  0  0  0   F
1  1  1  0  0   F
1  1  1  1  1   L′
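The reduction on this running example can be checked mechanically. Below is a small sketch under our own encoding (a rule is a dict mapping an attribute to its required value; an absent attribute stands for the unconstrained condition 𝐴𝑖 ≤ ⊤); none of these identifiers come from the paper.

```python
U = ["A1", "A2", "A3", "A4", "A5"]
S = {"s1": {"A1", "A2", "A3"},
     "s2": {"A2", "A3", "A4", "A5"},
     "s3": {"A4", "A5"}}

def char_tuple(s):
    """Characteristic tuple of set s: 0 where the element is in s, 1 otherwise."""
    return tuple(0 if a in S[s] else 1 for a in U)

F = [char_tuple(s) for s in S]   # the three fraudulent tuples
legit = (1, 1, 1, 1, 1)          # the single legitimate tuple in I'

def rule_matches(rule, t):
    """A rule matches a tuple iff every constrained attribute has the required value."""
    return all(t[U.index(a)] == v for a, v in rule.items())

# Phi' built from the minimum hitting set {A2, A4}: one rule per element.
phi = [{"A2": 0}, {"A4": 0}]
assert all(any(rule_matches(r, f) for r in phi) for f in F)   # F is captured
assert not any(rule_matches(r, legit) for r in phi)           # legit is excluded
```

Since every rule pins some attribute to 0, the all-ones tuple can never match, while the hitting-set property guarantees that each characteristic tuple has a 0 in some constrained position.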



It is straightforward to verify that the reduction to an instance of the Rule Modification Problem for Excluding Legitimate Transactions can be achieved in polynomial time in the size of the input of the Hitting Set Problem. We now show that a solution for the Minimum Hitting Set Problem implies a solution for the Rule Modification Problem for Excluding Legitimate Transactions and vice versa.

Let 𝐻 be a minimum hitting set. For every element 𝑥𝑖 in 𝐻, we duplicate the original rule (except if this is the last element in 𝐻) and modify the corresponding condition to 𝑎𝑖 = 0 in the copy of the rule. (The condition can also be 𝑎𝑖 ≤ 0 or 𝑎𝑖 < 1 but wlog we assume 𝑎𝑖 = 0 is used.) Recall that the cost of changing a condition to 𝑎𝑖 = 0 is 1, and the cost of duplicating a rule is 1. Clearly, we have that Φ′(𝐼 ∪ 𝐼′) contains 𝐹. Indeed, since 𝐻 hits every set 𝑠 ∈ 𝑆, for every 𝑠 there is an element of 𝑠 whose attribute value is 0 in the characteristic tuple of 𝑠 and a rule in Φ′ whose condition 𝑎𝑖 = 0 selects that tuple. Hence, it follows that Φ′(𝐼 ∪ 𝐼′) contains 𝐹, and since each rule in Φ′ contains a condition of the form 𝑎𝑖 = 0, the legitimate transaction (1,1,1,...,1) will not be among Φ′(𝐼 ∪ 𝐼′).

We show next that the expression cost(𝑀) − (𝛼 * ∆F + 𝛽 * ∆L + 𝛾 * ∆R) is the minimum possible. Suppose there is another set 𝑀′ of modifications yielding Φ′′ such that cost(𝑀′) − (𝛼 * ∆F + 𝛽 * ∆L + 𝛾 * ∆R) is less than the previous expression. Observe that every rule in Φ′′ must contain at least one condition that is specific to selecting a fraudulent transaction. That is, for every rule, 𝑎𝑖 = 0 for some 𝑖, since otherwise the rule is either redundant or the legitimate transaction will be selected. Also, we can assume that no other condition in a rule of Φ′′ selects 1s (e.g., is of the form 𝑎𝑖 = 1). If a rule 𝑟 contains a condition 𝑎𝑖 = 1, then we can omit this condition and assume it is 𝑎𝑖 ≤ ⊤ instead. The rule with 𝑎𝑖 ≤ ⊤ captures all tuples that are captured by 𝑟 (and possibly more) and hence, we continue to capture all fraudulent tuples and to exclude the legitimate tuple under this assumption. Similarly, if a rule 𝑟 contains multiple conditions of the form 𝑎𝑖 = 0, then we can keep one of them and assume the rest are 𝑎𝑖 ≤ ⊤. We can now construct a hitting set from 𝑀′ that is smaller than 𝐻, which is a contradiction.

For the converse, let 𝑀 be the set of modifications made to Φ such that 𝐹 ⊆ Φ′(𝐼 ∪ 𝐼′), the legitimate transaction 𝑙 satisfies 𝑙 ∉ Φ′(𝐼 ∪ 𝐼′), and cost(𝑀) − (𝛼 * ∆F + 𝛽 * ∆L + 𝛾 * ∆R) is minimum. As before, observe that each rule must contain at least one condition of the form 𝑎𝑖 = 0 for some 𝑖 so that 𝑙 is not selected. Furthermore, it is easy to see that each rule must contain exactly one such condition 𝑎𝑖 = 0, as additional conditions such as 𝑎𝑗 = 1 or 𝑎𝑗 = 0 are redundant and can only increase the cost.

Let 𝐻 = {𝑥𝑖 | 𝑎𝑖 = 0 appears in one of the modified rules}. We show next that 𝐻 is a minimum hitting set. First, we show that 𝐻 is a hitting set. Suppose not; then there is a set 𝑠 ∈ 𝑆 such that 𝐻 ∩ 𝑠 = ∅. In other words, for every element 𝑥𝑖 ∈ 𝑠, there does not exist a rule in Φ′ with the condition 𝑎𝑖 = 0. Let 𝑓 be the tuple that corresponds to 𝑠 in the transaction table. This means that 𝑓 ∉ Φ′(𝐼 ∪ 𝐼′), which contradicts our assumption that 𝐹 ⊆ Φ′(𝐼 ∪ 𝐼′).

Next, we show that 𝐻 is a minimum hitting set. Suppose not; then there is another hitting set 𝐻′ where |𝐻′| < |𝐻|. With 𝐻′, it is straightforward to construct a set of modifications whose cost is lower than 𝑀's cost.

In our running example, Φ′ contains two rules:

𝐴1 ≤ ⊤ ∧ 𝐴2 = 0 ∧ 𝐴3 ≤ ⊤ ∧ 𝐴4 ≤ ⊤ ∧ 𝐴5 ≤ ⊤
𝐴1 ≤ ⊤ ∧ 𝐴2 ≤ ⊤ ∧ 𝐴3 ≤ ⊤ ∧ 𝐴4 = 0 ∧ 𝐴5 ≤ ⊤

since a minimum hitting set is {𝐴2, 𝐴4}.
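That no single element hits all three sets, so two rules are necessary, can be confirmed by exhaustive search. The helper below is our own illustration, not part of RUDOLF.

```python
from itertools import combinations

U = ["A1", "A2", "A3", "A4", "A5"]
S = [{"A1", "A2", "A3"}, {"A2", "A3", "A4", "A5"}, {"A4", "A5"}]

def is_hitting_set(h):
    """h is a hitting set iff it intersects every set in S."""
    return all(h & s for s in S)

# Smallest k for which some k-subset of U hits every set in S.
best = next(
    set(c)
    for k in range(1, len(U) + 1)
    for c in combinations(U, k)
    if is_hitting_set(set(c))
)
assert len(best) == 2                # no single element hits all three sets
assert is_hitting_set({"A2", "A4"})  # the set used in the proof works
```

Note that minimum hitting sets need not be unique ({𝐴1, 𝐴4} also works here); any of them yields a rule set of the same minimum cost.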

In the proof above, observe that the original rule, which captures the legitimate tuple (1,1,...,1), is in effect specialized to avoid the legitimate tuple. The specialization is carried out by excluding the value of an attribute of the legitimate tuple, e.g., 𝐴2. In the above proof, the domain happens to consist of only two values, 0 and 1, and hence one can simply specialize the rule by modifying the attribute condition to select the value that does not occur in the legitimate tuple (i.e., 𝐴2 = 0 in the above proof). In general, however, since the size of the domain is larger than 2, we specialize a rule whose attribute 𝐴 has value 𝑣 in the legitimate tuple by splitting it into two: one rule captures all previous values up to but excluding 𝑣, and the other captures the remaining values excluding 𝑣. This is the underlying intuition behind the heuristic algorithm that we present in the paper.
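The splitting step described above can be sketched as follows; the representation (a rule as a per-attribute closed interval over an ordered integer domain) and the function name are our own simplification, not the paper's.

```python
def split_around(rule, attr, v):
    """Specialize `rule` on `attr` to exclude value v: replace the rule by
    up to two copies whose intervals cover the original range minus {v}."""
    lo, hi = rule[attr]
    assert lo <= v <= hi, "v must lie inside the rule's interval"
    out = []
    if lo <= v - 1:
        out.append({**rule, attr: (lo, v - 1)})   # values strictly below v
    if v + 1 <= hi:
        out.append({**rule, attr: (v + 1, hi)})   # values strictly above v
    return out

# Excluding value 1 on A2 from an unconstrained rule over a 0..9 domain
# yields two rules, A2 in [0,0] and A2 in [2,9]; neither selects value 1.
rule = {"A1": (0, 9), "A2": (0, 9)}
print(split_around(rule, "A2", 1))
```

When 𝑣 is an endpoint of the interval, only one rule is produced, so the split never increases the rule count unnecessarily.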

THEOREM 4.5. The EXCLUDE LEGITIMATE TRANSACTIONS PROBLEM is NP-hard even if Φ is perfect for 𝐼 and the size of the schema of the transaction relation is fixed.

PROOF SKETCH. The proof of the above result is similar to that of Theorem 4.2. It makes use of a reduction from the set cover problem and, in the reduction, a single unary transaction relation is used. We build a taxonomy of the elements of a set cover instance according to which element belongs to which set. The relation initially contains all elements of the universe of the set cover instance, and these transactions are all fraudulent. The set Φ consists of a single rule 𝐴 ≤ ⊤, which captures all fraudulent transactions. Adding a rule and modifying a condition cost 1 each. The new transaction relation 𝐼′ consists of a single legitimate tuple whose value does not occur among the existing values. One can then establish that a set of rules of minimum cost can be derived from a minimum set cover and vice versa. Intuitively, each rule has the form 𝐴 ≤ 𝑆𝑖, where each 𝑆𝑖 is part of the solution to the instance of the minimum set cover problem.
