
Transcript of Learning with Refutation

Page 1: Learning with Refutation

Journal of Computer and System Sciences 57, 356-365 (1998)

Learning with Refutation

Sanjay Jain*

School of Computing, National University of Singapore, Singapore 119260

Received June 18, 1997; revised April 22, 1998

In their pioneering work, Mukouchi and Arikawa modeled a learning situation in which the learner is expected to refute texts which are not representative of L, the class of languages being identified. Lange and Watson extended this model to consider justified refutation, in which the learner is expected to refute texts only if they contain a finite sample unrepresentative of the class L. Both the above studies were in the context of indexed families of recursive languages. We extend this study in two directions. First, we consider general classes of recursively enumerable languages. Second, we allow the machine to either identify or refute the unrepresentative texts (respectively, texts containing finite unrepresentative samples). We observe some surprising differences between our results and the results obtained for learning indexed families by Lange and Watson. © 1998 Academic Press

1. INTRODUCTION

Consider the identification of formal languages from positive data. A text for a language is a sequential presentation (in arbitrary order) of all and only the elements of the language. In a widely studied identification paradigm, called TxtEx-identification, a learning machine is fed texts for languages, and, as the machine is receiving the data, it outputs a (possibly infinite) sequence of hypotheses. A learning machine is said to TxtEx-identify a language L just in case, when presented with a text for L, the sequence of hypotheses output by the machine converges to a grammar for L (formal definitions of criteria of inference informally presented in this section are given in Sections 2 and 3). A learning machine TxtEx-identifies a class, L, of languages if it TxtEx-identifies each language in L. This model of identification was introduced by Gold [Gol67] and has since then been explored by several researchers.

For the following, let L denote a class of languages which we want to identify. The model of identification presented above puts no constraint on the behaviour of the machine on texts for languages not in L. However, we may want a machine to be able to detect that it cannot identify an input text, for at least two reasons. First, once a machine detects that it cannot identify an input text, we can use the machine for other useful purposes. Second, we may employ another machine to identify the input text, so as to further enhance the class of languages that can be identified. These are very useful considerations in the design of a practical learning system. Further, it is philosophically interesting to study machines which know their limitations.

In their pioneering work, Mukouchi and Arikawa [MA93] modeled such a scenario. They required that, in addition to identifying all languages in L, the machine should refute texts for languages not in L (i.e., texts which are "unrepresentative" of L). We refer to this identification criterion as TxtRef. Mukouchi and Arikawa showed that TxtRef constitutes a serious drawback on the learning capabilities of machines. For example, a machine working as above cannot identify any infinite language.¹ This led Lange and Watson [LW94] (see also [MA95]) to consider justified refutation, in which they require a machine to refute a text iff some initial segment of the text is enough to determine that the input text is not for a language in L, i.e., the input text contains a finite sample "unrepresentative" of L. We call this criterion of learning TxtJRef. Lange and Watson also considered a modification of the justified refutation model (called TxtJIRef, for immediate justified refutation) in which the machine is required to refute the input text as soon as the initial segment contains an unrepresentative sample (formal definitions are given in Section 3). For further motivation regarding learning with refutation and its relationship with Popper's logic for scientific inference, we refer the reader to [MA93, LW94].

[MA93, LW94] were mainly concerned with learning indexed families of recursive languages, where the hypothesis space is also an indexed family. In this paper, we extend the study in two directions. First, we consider general classes of r.e. languages and use the class of all computer programs (modeling accepting grammars) as the hypothesis space. Second, we allow a learning machine to either identify or refute unrepresentative texts (texts containing finite unrepresentative samples). Note that in the models of learning with refutation considered by [MA93,


* E-mail: sanjay@comp.nus.edu.sg.

¹ A machine working as above cannot refute a text for any subset of a language it identifies; this, along with a result due to Gold [Gol67] (which says that no machine can TxtEx-identify an infinite language and all of its finite subsets), shows that no machine can TxtRef-identify a class containing an infinite language.

Page 2: Learning with Refutation

LW94] described above, the machine has to refute all texts which contain samples unrepresentative of L. Thus, a machine which may identify some of these texts is disqualified.² For learning general classes of r.e. languages we feel that it is more reasonable to allow a machine to either identify or refute such texts (in most applications, identifying an unrepresentative text is not going to be a disadvantage). This motivation has led us to the models described in the present paper. We refer to these criteria by attaching an E (for extended) in front of the corresponding criteria considered by [MA93, LW94].

We now highlight some important differences in the structure of the results obtained by us and those in [LW94]. In the context of learning indexed families of recursive languages, Lange and Watson (in their model; see also [MA95]) showed that TxtJIRef = TxtJRef (i.e., requiring machines to refute as soon as the initial segment becomes unrepresentative of L is not a restriction). A similar result was also shown by them for learning from informants.³ We show that requiring immediate refutation is a restriction if we consider general classes of r.e. languages (in both our (extended) and Lange and Watson's models of justified refutation, and for learning from texts as well as informants). We also consider a variation of our model in which "unrepresentative" is with respect to what a machine identifies and not with respect to the class L. In this variation, for learning from texts, the (immediate) justified refutation model has the same power as TxtEx, a surprising result in the context of the results in [LW94] and other results in this paper. However, in the context of learning from informants, even this variation fails to capture the power of InfEx (which is a criterion of learning from informants; see Section 2).

We now proceed formally.

2. PRELIMINARIES

The recursion theoretic notions not explained below are from [Rog67]. N = {0, 1, 2, ...} is the set of all natural numbers, and this paper considers r.e. subsets L of N. All conventions regarding range of variables apply, with or without decorations,⁴ unless otherwise specified. We let c, e, i, j, k, l, m, n, p, s, t, u, v, w, x, y, z range over N. ∅, ∈, ⊆, ⊇, ⊂, ⊃ denote empty set, member of, subset, superset, proper subset, and proper superset, respectively. max(·), min(·), and card(·) denote the maximum, minimum, and cardinality of a set, respectively, where by convention max(∅) = 0 and min(∅) = ∞. ⟨·, ·⟩ stands for an arbitrary, one-to-one, computable encoding of all pairs of natural numbers onto N. The quantifiers ∀^∞, ∃^∞, and ∃! denote "for all but finitely many", "there exist infinitely many", and "there exists a unique", respectively.
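
Since the constructions below repeatedly use languages of the form {⟨i, n⟩ | n ∈ N}, it may help to see one concrete choice of pairing function. The following Python sketch uses the classic Cantor pairing; the paper only assumes some computable one-to-one encoding, not this particular one.

```python
def pair(x, y):
    # Cantor pairing: a computable one-to-one map from N x N onto N.
    return (x + y) * (x + y + 1) // 2 + y

def unpair(z):
    # Inverse of pair(), recovering (x, y) from z.
    w = int(((8 * z + 1) ** 0.5 - 1) // 2)  # largest w with w*(w+1)/2 <= z
    y = z - w * (w + 1) // 2
    return w - y, y

assert all(unpair(pair(x, y)) == (x, y) for x in range(100) for y in range(100))
```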

R denotes the set of total recursive functions from N to N. f and g range over total recursive functions. E denotes the set of all recursively enumerable (r.e.) sets. L ranges over E. L̄ denotes the complement of the set L (i.e., L̄ = N − L). χ_L denotes the characteristic function of the set L. L_1 Δ L_2 denotes the symmetric difference of L_1 and L_2, i.e., L_1 Δ L_2 = (L_1 − L_2) ∪ (L_2 − L_1). L ranges over subsets of E. φ denotes a standard acceptable programming system (acceptable numbering) [Rog67]; φ_i denotes the function computed by the i-th program in the programming system φ. We also call i a program or index for φ_i. For a (partial) function η, domain(η) and range(η) respectively denote the domain and range of the partial function η. We often write η(x)↓ (η(x)↑) to denote that η(x) is defined (undefined). W_i denotes the domain of φ_i. W_i is considered as the language enumerated by the i-th program in the φ system, and we say that i is a grammar or index for W_i. Φ denotes a standard Blum complexity measure [Blu67] for the programming system φ. W_{i,s} = {x < s | Φ_i(x) < s}.
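
The finite sets W_{i,s} give decidable approximations to W_i, with W_{i,s} ⊆ W_{i,s+1} and W_i = ∪_s W_{i,s}; the staged constructions in Section 4 rely on exactly this kind of approximation. A minimal sketch, where halts_within is a hypothetical decision procedure for "Φ_i(x) < s" (such a procedure exists for any Blum complexity measure):

```python
def W_approx(i, s, halts_within):
    # W_{i,s} = {x < s : Phi_i(x) < s}, a finite, decidable
    # approximation of the r.e. set W_i = domain(phi_i).
    # halts_within(i, x, s) is assumed to decide whether program i
    # halts on input x within s steps.
    return {x for x in range(s) if halts_within(i, x, s)}
```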

FIN denotes the class of finite languages, {L | card(L) < ∞}. INIT denotes the class of initial segments of N, that is, {{x | x < l} | l ∈ N}. L is called a single-valued total language iff (∀x)(∃!y)[⟨x, y⟩ ∈ L]; svt = {L | L is a single-valued total language}. If L ∈ svt, then we say that L represents the total function f such that L = {⟨x, f(x)⟩ | x ∈ N}. K denotes the set {x | φ_x(x)↓}. Note that K is r.e. but K̄ is not.

A text is a mapping from N to N ∪ {*}. We let T range over texts. content(T) is defined to be the set of natural numbers in the range of T (i.e., content(T) = range(T) − {*}). T is a text for L iff content(T) = L. That means a text for L is an infinite sequence whose range, except for a possible *, is just L.

An infinite information sequence or informant is a mapping from N to (N × {0, 1}) ∪ {*}. We let I range over informants; content(I) is defined to be the set of pairs in the range of I (i.e., content(I) = range(I) − {*}). By PosInfo(I) we denote the set {x | (x, 1) ∈ content(I)}. By NegInfo(I) we denote the set {x | (x, 0) ∈ content(I)}. For this paper, we only consider informants I such that PosInfo(I) and NegInfo(I) partition the set of natural numbers.

An informant for L is an informant I such that PosInfo(I) = L. It is useful to consider the canonical information sequence for L. I is a canonical information sequence for L iff I(x) = (x, χ_L(x)). We sometimes abuse notation and refer to the canonical information sequence for L by χ_L.

σ, τ, and γ range over finite initial segments of texts or informants, where the context determines which is meant. We denote the set of finite initial segments of texts by SEG and the set of finite initial segments of informants by SEQ.


² This property and the restriction to indexed families are crucially used in proving some of the results in [LW94].

³ An informant for a language L is a sequential presentation of the elements of the set {(x, 1) | x ∈ L} ∪ {(x, 0) | x ∉ L}; see the formal definition in Section 2.

⁴ Decorations are subscripts, superscripts, primes, and the like.

Page 3: Learning with Refutation

We define content(σ) = range(σ) − {*} and, for σ ∈ SEQ, PosInfo(σ) = {x | (x, 1) ∈ content(σ)} and NegInfo(σ) = {x | (x, 0) ∈ content(σ)}.

We use σ ⊆ T (respectively, σ ⊆ I, σ ⊆ τ) to denote that σ is an initial segment of T (respectively, I, τ); |σ| denotes the length of σ. T[n] denotes the initial segment of T of length n. Similarly, I[n] denotes the initial segment of I of length n; σ ⋄ τ (respectively, σ ⋄ T, σ ⋄ I) denotes the concatenation of σ and τ (respectively, concatenation of σ and T, concatenation of σ and I). We sometimes abuse notation and write σ ⋄ w to denote the concatenation of σ with the sequence of one element w.
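
Since everything below manipulates finite initial segments, a concrete representation may help. The following sketch fixes tuples over N ∪ {*} as segments; the helper names are ours, not the paper's.

```python
PAUSE = "*"  # the pause symbol; texts may interleave it with data

def content(sigma):
    # content(sigma): the natural numbers occurring in sigma.
    return {x for x in sigma if x != PAUSE}

def initial_segment(T, n):
    # T[n]: the initial segment of T of length n.
    return tuple(T[:n])

def concat(sigma, tau):
    # sigma <> tau: concatenation of two finite segments.
    return tuple(sigma) + tuple(tau)

# A finite portion of a text for {2, 4, 6}:
T = (2, PAUSE, 4, 6, 4, 2)
assert content(initial_segment(T, 3)) == {2, 4}
```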

A learning machine (also called an inductive inference machine) M is an algorithmic mapping from initial segments of texts (informants) to N ∪ {?}. We say that M converges on T to i (written M(T)↓ = i) iff, for all but finitely many n, M(T[n]) = i. Convergence on informants is defined similarly.

We now present the basic models of identification from texts and informants.

Definition 1 [Gol67, CL82]. (a) M TxtEx-identifies a text T iff (∃i | W_i = content(T))[M(T)↓ = i].

(b) M TxtEx-identifies L (written L ∈ TxtEx(M)) iff M TxtEx-identifies each text T for L.

(c) M TxtEx-identifies L iff M TxtEx-identifies each L ∈ L.

(d) TxtEx = {L | (∃M)[M TxtEx-identifies L]}.
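
As a concrete instance of Definition 1, the class FIN is TxtEx-identifiable by the learner sketched below: it conjectures a canonical grammar for exactly the data seen so far, and on any text for a finite language it eventually stops changing its mind. Here canonical_index is a hypothetical computable map from finite sets to grammars for them (such a map exists by the s-m-n theorem); its placeholder body is ours.

```python
def canonical_index(finite_set):
    # Placeholder for a computable map S |-> grammar for S.
    return hash(frozenset(finite_set))

def M_FIN(sigma):
    # A learner witnessing FIN in TxtEx: conjecture the content seen
    # so far.  On a text for a finite language L it changes its mind
    # at most card(L) times and then converges to a grammar for L.
    data = frozenset(x for x in sigma if x != "*")
    return canonical_index(data)
```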

Definition 2 [Gol67]. (a) M TxtFin-identifies a text T iff (∃i | W_i = content(T))(∃n)[(∀m < n)[M(T[m]) = ?] ∧ (∀m ≥ n)[M(T[m]) = i]].

(b) M TxtFin-identifies L (written L ∈ TxtFin(M)) iff M TxtFin-identifies each text T for L.

(c) M TxtFin-identifies L iff M TxtFin-identifies each L ∈ L.

(d) TxtFin = {L | (∃M)[M TxtFin-identifies L]}.

Intuitively, for finite identification, M outputs just one grammar, which must be correct.

Definition 3 [Gol67, CL82]. (a) M InfEx-identifies an informant I iff (∃i | W_i = PosInfo(I))[M(I)↓ = i].

(b) M InfEx-identifies L (written L ∈ InfEx(M)) iff M InfEx-identifies each informant I for L.

(c) M InfEx-identifies L iff M InfEx-identifies each L ∈ L.

(d) InfEx = {L | (∃M)[M InfEx-identifies L]}.

Definition 4 [Gol67]. (a) M InfFin-identifies an informant I iff (∃i | W_i = PosInfo(I))(∃n)[(∀m < n)[M(I[m]) = ?] ∧ (∀m ≥ n)[M(I[m]) = i]].

(b) M InfFin-identifies L (written L ∈ InfFin(M)) iff M InfFin-identifies each informant I for L.

(c) M InfFin-identifies L iff M InfFin-identifies each L ∈ L.

(d) InfFin = {L | (∃M)[M InfFin-identifies L]}.

The next two definitions introduce reliable identification. A reliable machine diverges on texts (informants) it does not identify. Although a reliable machine does not refute a text (informant) it does not identify, it at least does not give false hope by converging to a wrong hypothesis. This was probably the first constraint imposed on a machine's behaviour on languages outside the class being identified. We give two variations of reliable identification, based on whether the machine is expected to diverge on every text which is for a language not in L, or just on the texts it does not identify.

For the rest of the paper, for a criterion of inference J, we will only define what it means for a machine to J-identify a class of languages L. The identification class J is then implicitly defined as J = {L | (∃M)[M J-identifies L]}.

Definition 5 [Min76]. (a) M TxtRel-identifies L iff

(a.1) M TxtEx-identifies L and

(a.2) (∀T | content(T) ∉ L)[M(T)↑].

(b) M InfRel-identifies L iff

(b.1) M InfEx-identifies L and

(b.2) (∀I | PosInfo(I) ∉ L)[M(I)↑].

(c) M ETxtRel-identifies L iff

(c.1) M TxtEx-identifies L and

(c.2) (∀T | M does not TxtEx-identify T)[M(T)↑].

(d) M EInfRel-identifies L iff

(d.1) M InfEx-identifies L and

(d.2) (∀I | M does not InfEx-identify I)[M(I)↑].

The following propositions state some known facts about the identification criteria discussed above which we will be using in this paper. The first two propositions are based on results due to Gold [Gol67].

Proposition 1. Suppose L is any infinite r.e. language and M is a learning machine. Let σ be such that content(σ) ⊆ L. Then there exists an r.e. L′, content(σ) ⊆ L′ ⊆ L, such that M does not TxtEx-identify L′.

Proposition 2. Suppose L is any infinite r.e. language and M is a learning machine. Let σ be such that PosInfo(σ) ⊆ L. Then there exists an r.e. L′, PosInfo(σ) ⊆ L′ ⊆ L, such that M does not InfEx-identify L′.

Proposition 3 [Gol67, Sha98]. TxtFin ⊂ InfFin ⊂ TxtEx ⊂ InfEx.


Page 4: Learning with Refutation

3. LEARNING WITH REFUTATION

In this section we introduce the refutation models for learning. For learning with refutation we allow learning machines to output a special refutation symbol, denoted ⊥. We assume that if M(σ) = ⊥, then, for all τ, M(σ ⋄ τ) = ⊥. Intuitively, an output of ⊥ denotes that M is declaring the input to be "unrepresentative." In the following definitions we consider the different criteria mentioned in the introduction. It is useful to define Cons_L = {σ | (∃L ∈ L)[content(σ) ⊆ L]}.
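
For concrete classes, membership in Cons_L is often decidable. For example, for the class svt of Section 2, a segment is in Cons_svt exactly when its content is single-valued, since any finite single-valued set extends to a single-valued total language. A minimal sketch, with pairs written as Python 2-tuples rather than coded numbers:

```python
def in_cons_svt(sigma):
    # Decide sigma in Cons_svt: the content must not assign two
    # distinct values y, z to the same argument x.
    seen = {}
    for item in sigma:
        if item == "*":              # pause symbol carries no data
            continue
        x, y = item                  # item represents the pair <x, y>
        if seen.setdefault(x, y) != y:
            return False             # x already mapped to another value
    return True

assert in_cons_svt(((0, 5), "*", (1, 3)))
assert not in_cons_svt(((0, 5), (0, 7)))
```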

The following definition introduces learning with refutation for general classes of r.e. languages.

Definition 6 [MA93]. M TxtRef-identifies L iff

(a) M TxtEx-identifies L and

(b) (∀T | content(T) ∉ L)[M(T)↓ = ⊥].

If M(T)↓ = ⊥, then we often say that M refutes the text T. The following definitions introduce identification with justified refutation for general classes of r.e. languages. Below, JRef stands for justified refutation, and JIRef stands for justified immediate refutation.

Definition 7 [LW94]. M TxtJRef-identifies L iff

(a) M TxtEx-identifies L and

(b) (∀T | content(T) ∉ L and (∃σ ⊆ T)[σ ∉ Cons_L])[M(T)↓ = ⊥].

Intuitively, in the above definition, M is required to refute a text T only if T contains a finite sample which is unrepresentative of L. The following definition additionally requires that M refute an initial segment of T as soon as it contains an unrepresentative sample.

Definition 8 [LW94]. M TxtJIRef-identifies L iff

(a) M TxtEx-identifies L and

(b) (∀T | content(T) ∉ L)(∀σ ⊆ T | σ ∉ Cons_L)[M(σ) = ⊥].
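
To illustrate Definition 8, consider the class L = {{i} | i ∈ N} of all singleton languages, for which σ ∈ Cons_L iff content(σ) has at most one element. The learner sketched below refutes at the exact moment a second distinct number appears, so it is immediate in the sense of TxtJIRef; REFUTE stands for ⊥, and canonical_index is the hypothetical set-to-grammar map from the earlier sketch. Note that once two distinct numbers have appeared they stay in every extension, so the output ⊥ persists, as required.

```python
REFUTE = "bottom"  # stands for the refutation symbol

def canonical_index(finite_set):
    # Placeholder computable map S |-> grammar for S (via s-m-n).
    return hash(frozenset(finite_set))

def M_singletons(sigma):
    # TxtJIRef-learner for the class of all singleton languages {i}.
    data = {x for x in sigma if x != "*"}
    if len(data) > 1:
        return REFUTE              # sigma left Cons_L: refute at once
    if len(data) == 1:
        return canonical_index(data)
    return "?"                     # no data yet: no conjecture
```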

We now present the above criteria for learning from informants. It is useful to define the following analogue of Cons: ICons_L = {σ | (∃L ∈ L)[PosInfo(σ) ⊆ L ∧ NegInfo(σ) ⊆ L̄]}.

Definition 9. (a) [MA93] M InfRef-identifies L iff

(a.1) M InfEx-identifies L and

(a.2) (∀I | PosInfo(I) ∉ L)[M(I)↓ = ⊥].

(b) [LW94] M InfJRef-identifies L iff

(b.1) M InfEx-identifies L and

(b.2) (∀I | PosInfo(I) ∉ L and (∃σ ⊆ I)[σ ∉ ICons_L])[M(I)↓ = ⊥].

(c) [LW94] M InfJIRef-identifies L iff

(c.1) M InfEx-identifies L and

(c.2) (∀I | PosInfo(I) ∉ L)(∀σ ⊆ I | σ ∉ ICons_L)[M(σ) = ⊥].

We now present our extended definitions for learning with refutation. Intuitively, we extend the above definitions of [MA93, LW94] by allowing a machine to identify an unrepresentative text.

The following definition is a modification of the corresponding definition in [MA93]. An E at the beginning of a criterion of inference, as in ETxtRef, stands for extended.

Definition 10. M ETxtRef-identifies L iff

(a) M TxtEx-identifies L and

(b) (∀T | M does not TxtEx-identify T)[M(T)↓ = ⊥].

Intuitively, in the above definition we require the machine to refute the input text only if it does not TxtEx-identify it.

The following definitions of identification by justified refutation are modifications of the corresponding definitions considered by [LW94].

Definition 11. M ETxtJRef-identifies L iff

(a) M TxtEx-identifies L and

(b) (∀T | M does not TxtEx-identify T and (∃σ ⊆ T)[σ ∉ Cons_L])[M(T)↓ = ⊥].

Intuitively, in the above definition, M is required to refute a text T only if M does not identify T and T contains a finite sample which is unrepresentative of L. In the following definition, we additionally require that M refute an initial segment of T as soon as it contains an unrepresentative sample.

Definition 12 [LW94]. M ETxtJIRef-identifies L iff

(a) M TxtEx-identifies L and

(b) (∀T | M does not TxtEx-identify T)(∀σ ⊆ T | σ ∉ Cons_L)[M(σ) = ⊥].

We now present the above criteria for learning from informants.

Definition 13. (a) M EInfRef-identifies L iff

(a.1) M InfEx-identifies L and

(a.2) (∀I | M does not InfEx-identify I)[M(I)↓ = ⊥].

(b) M EInfJRef-identifies L iff

(b.1) M InfEx-identifies L and

(b.2) (∀I | M does not InfEx-identify I and (∃σ ⊆ I)[σ ∉ ICons_L])[M(I)↓ = ⊥].


Page 5: Learning with Refutation

(c) M EInfJIRef-identifies L iff

(c.1) M InfEx-identifies L and

(c.2) (∀I | M does not InfEx-identify I)(∀σ ⊆ I | σ ∉ ICons_L)[M(σ) = ⊥].

4. RESULTS

We next consider the relationships between the different identification criteria defined in this paper. The results presented give a complete picture of the relationships between all the criteria of inference introduced in this paper.

4.1. Containment Results

The following containments follow immediately from the definitions.

Proposition 4.
TxtRef ⊆ TxtRel ⊆ TxtEx. TxtRef ⊆ TxtJRef ⊆ TxtEx. TxtJIRef ⊆ TxtJRef ⊆ TxtEx.
InfRef ⊆ InfRel ⊆ InfEx. InfRef ⊆ InfJRef ⊆ InfEx. InfJIRef ⊆ InfJRef ⊆ InfEx.
TxtRef ⊆ InfRef ⊆ InfEx. TxtRel ⊆ InfRel ⊆ InfEx.

Proposition 5.
ETxtRef ⊆ ETxtRel ⊆ TxtEx. ETxtRef ⊆ ETxtJRef ⊆ TxtEx. ETxtJIRef ⊆ ETxtJRef ⊆ TxtEx.
EInfRef ⊆ EInfRel ⊆ InfEx. EInfRef ⊆ EInfJRef ⊆ InfEx. EInfJIRef ⊆ EInfJRef ⊆ InfEx.
ETxtRef ⊆ EInfRef ⊆ InfEx. ETxtRel ⊆ EInfRel ⊆ InfEx.

Proposition 6. (a) TxtJIRef=ETxtJIRef.

(b) InfJIRef=EInfJIRef.

Proof. (a) It suffices to show ETxtJIRef ⊆ TxtJIRef. Suppose M ETxtJIRef-identifies L.

Claim 1. For all σ such that σ ∉ Cons_L, M(σ) = ⊥.

Proof of claim. Suppose by way of contradiction that there is a σ such that σ ∉ Cons_L and M(σ) ≠ ⊥. Let T be an extension of σ such that M does not TxtEx-identify T; by Proposition 1 there exists such a T. But then, by the definition of ETxtJIRef, M(σ) = ⊥. A contradiction. Thus the claim holds. ∎

Proof continued. It immediately follows from the claim that M also TxtJIRef-identifies L. Part (b) can be proved in a manner similar to part (a). ∎

Theorem 1. Suppose X is a set not in Σ₂. Let L = {{i} | i ∈ X}. Suppose J ∈ {TxtRef, TxtRel, TxtJRef, InfRef, InfRel, InfJRef}. Then L ∈ EJ but L ∉ J.

Proof. It is easy to construct a machine which identifies all texts for empty or singleton languages and refutes/diverges on all texts for languages containing at least two elements. Thus, we have that L ∈ EJ.

Now, suppose by way of contradiction that M J-identifies L. Then i ∈ X iff (∃σ | content(σ) = {i})(∀τ | σ ⊆ τ ∧ content(τ) = {i})[M(σ) = M(τ) ∧ M(σ) ∈ N]. The right-hand side is a Σ₂ predicate, a contradiction to the fact that X is not in Σ₂. ∎

Theorem 2. Suppose J ∈ {TxtRef, TxtRel, TxtJRef, InfRef, InfRel, InfJRef}. Then J ⊂ EJ.

Proof. J ⊆ EJ follows immediately from the definitions. Proper containment follows from Theorem 1. ∎

4.2. Separation Results

We now proceed to show the separation results. The next two theorems show the advantages of finite identification over reliable identification and identification with refutation. For the proof of the first theorem, we need the following proposition, which follows immediately from the definitions.

Proposition 7. Suppose L is such that:

(a) L ∈ EInfJRef, and

(b) for all L_1, L_2 ∈ L, either L_1 ∩ L_2 = ∅ or L_1 = L_2.

Then L ∈ EInfRef (and thus L ∈ EInfRel).

Let M_0, M_1, ... denote a recursive enumeration of all machines.

Theorem 3. TxtFin − (EInfRel ∪ EInfJRef) ≠ ∅.

Proof. For each i, we will define below a nonempty language L_i with the following properties:

(a) L_i ⊆ {⟨i, n⟩ | n ∈ N};

(b) either M_i is not reliable, or L_i ∉ InfEx(M_i);

(c) a grammar for L_i can be obtained effectively in i.

We take L = {L_i | i ∈ N}. Clearly, L ∈ TxtFin (since a grammar for L_i can be found effectively from i). Further, using clause (b) above, we have that L ∉ EInfRel. It thus follows from Proposition 7 that L ∉ EInfJRef.

We will define L_i in stages below. Let L_i^s denote the part of L_i defined before stage s. Let L_i^0 = {⟨i, 0⟩}. Let x_0 = ⟨i, 1⟩. Go to stage 0.

Stage s.

1. Let I_1^s be the canonical information sequence for L_i^s, and let I_2^s be the canonical information sequence for L_i^s ∪ {x_s}.

2. Search for an n > x_s such that either M_i(I_1^s[n]) ≠ M_i(I_1^s[x_s]) or M_i(I_2^s[n]) ≠ M_i(I_2^s[x_s]). If and when such an n is found, proceed to step 3.


Page 6: Learning with Refutation

3. Let n be as found in step 2. If M_i(I_2^s[n]) ≠ M_i(I_2^s[x_s]), then let L_i^{s+1} = L_i^s ∪ {x_s}; otherwise let L_i^{s+1} = L_i^s. Let x_{s+1} ∈ {⟨i, z⟩ | z ∈ N} be such that x_{s+1} > n. Go to stage s+1.

End stage s

It is easy to verify that L_i can be enumerated effectively in i. Fix i. We consider two cases in the definition of L_i:

Case 1. There exist infinitely many stages. In this case M_i on the canonical informant for L_i makes infinitely many mind changes.

Case 2. Stage s starts but does not end. In this case M_i converges to the same grammar for both L_i and L_i ∪ {x_s} (which are distinct languages). Thus M_i is not reliable.

The above cases show that L is not EInfRel-identified by M_i. ∎

Theorem 4. TxtFin − ETxtJRef ≠ ∅.

Proof. The proof of this theorem is similar to that of Theorem 3. For each i, we will define below a nonempty language L_i with the following properties:

(a) L_i ⊆ {⟨i, n⟩ | n ∈ N};

(b) a grammar for L_i can be obtained effectively in i.

We take L = {L_i | i ∈ N}. Clearly, L ∈ TxtFin (since a grammar for L_i can be found effectively from i). Further, we will have that M_i does not ETxtJRef-identify L.

We will define L_i in stages below. Let L_i^s denote the part of L_i defined before stage s. Let L_i^0 = {⟨i, 0⟩}. Let σ_0 be such that content(σ_0) = L_i^0. Go to stage 0.

Stage s.

1. Let T_1^s and T_2^s be texts extending σ_s for L_i^s ∪ {⟨i, 2s+1⟩} and L_i^s ∪ {⟨i, 2s+2⟩}, respectively.

2. Search for an n > |σ_s| such that either M_i(T_1^s[n]) ≠ M_i(σ_s), or M_i(T_2^s[n]) ≠ M_i(σ_s), or M_i(T_1^s[n]) = ⊥, or M_i(T_2^s[n]) = ⊥. If and when such an n is found, proceed to step 3.

3. Let n be as found in step 2. If M_i(T_1^s[n]) ≠ M_i(σ_s) or M_i(T_1^s[n]) = ⊥, then let L_i^{s+1} = L_i^s ∪ {⟨i, 2s+1⟩} and σ = T_1^s[n]; otherwise let L_i^{s+1} = L_i^s ∪ {⟨i, 2s+2⟩} and σ = T_2^s[n]. Let σ_{s+1} be an extension of σ such that content(σ_{s+1}) = L_i^{s+1}. Go to stage s+1.

End stage s

It is easy to verify that L_i can be enumerated effectively in i. Fix i. We consider two cases in the definition of L_i:

Case 1. There exist infinitely many stages. In this case T = ∪_s σ_s is a text for L_i. However, M_i either makes infinitely many mind changes on T or refutes T.

Case 2. Stage s starts but does not end. In this case note that (i) content(T_1^s) and content(T_2^s) are both finite; (ii) for large enough n, T_1^s[n] and T_2^s[n] are not in Cons_L; (iii) M_i does not refute the texts T_1^s and T_2^s; and (iv) M_i fails to TxtEx-identify at least one of T_1^s and T_2^s (since M_i converges to the same grammar on both of them). Thus, M_i does not ETxtJRef-identify L.

The above cases show that L is not ETxtJRef-identified by M_i. Since i was arbitrary, it follows that L ∉ ETxtJRef. ∎

The following theorem shows the advantages of identification with refutation over finite identification.

Theorem 5. (TxtRef ∩ TxtJIRef ∩ InfJIRef) − InfFin ≠ ∅.

Proof. Let L = {L | card(L) ≤ 2}. It is easy to verify that L witnesses the separation. ∎
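
The witnessing learner for Theorem 5 is easy to make explicit, along the same lines as the earlier sketches: it identifies every text whose content has at most two elements and refutes the moment a third appears (so it also witnesses membership in TxtJIRef and, suitably adapted, InfJIRef). A finite learner, by contrast, must commit to a single conjecture after a finite sample and so cannot separate {x} from {x, y}.

```python
REFUTE = "bottom"

def canonical_index(finite_set):
    # Placeholder computable map S |-> grammar for S.
    return hash(frozenset(finite_set))

def M_card_le_2(sigma):
    # Refuting learner for L = {L : card(L) <= 2}: conjecture the data
    # seen so far, and refute once three distinct elements are present.
    data = frozenset(x for x in sigma if x != "*")
    if len(data) > 2:
        return REFUTE
    return canonical_index(data)
```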

The following theorem shows the advantages of justified refutation and reliable identification over the case when the learning machine has to refute all unidentified texts (informants).

Theorem 6. (TxtRel ∩ TxtJIRef ∩ InfJIRef) − InfRef ≠ ∅.

Proof. It is easy to verify that FIN witnesses the separation. ∎

The following theorem shows the disadvantages of immediate refutation. Note that for learning indexed families of recursive languages, Lange and Watson have shown that TxtJRef = TxtJIRef and InfJRef = InfJIRef, and thus the following result does not hold for learning indexed families of recursive languages.

Theorem 7. (a) TxtRef − EInfJIRef ≠ ∅.

(b) TxtRef − ETxtJIRef ≠ ∅.

Proof. Let L = {{i} | i ∉ K}. It is easy to verify that L ∈ TxtRef. We show that L ∉ ETxtJIRef; a similar proof also shows that L ∉ EInfJIRef. Suppose by way of contradiction that M ETxtJIRef-identifies L. Then the following claim shows that K̄ is r.e., a contradiction. Thus L ∉ ETxtJIRef.

Claim 2. i ∈ K̄ ⇔ (∃σ | content(σ) = {i})[M(σ)↓ ≠ ⊥].

Proof. Suppose i ∈ K̄. Then, since M TxtEx-identifies {i} ∈ L, there must exist a σ such that content(σ) = {i} and M(σ)↓ ≠ ⊥. On the other hand, suppose by way of contradiction that i ∉ K̄, and σ is such that content(σ) = {i} and M(σ)↓ ≠ ⊥. Let T be an extension of σ such that M does not TxtEx-identify T (there exists such a T by


Page 7: Learning with Refutation

Proposition 1). But then, since σ ∉ Cons_L, by the definition of ETxtJIRef, M(σ) must equal ⊥; a contradiction. This proves the claim and completes the proof of the theorem. ∎

In the context of learnability of indexed families, Lange and Watson (in their model of learning with refutation) had shown that immediate refutation is not a restriction, i.e., TxtJRef = TxtJIRef and InfJRef = InfJIRef. The following corollary shows that immediate refutation is a restriction in the context of learning general classes of r.e. languages! Note that this restriction holds for both the extended and unextended models of justified refutation (for general classes of r.e. languages).

Corollary 1. (a) InfJIRef ⊂ InfJRef.

(b) TxtJIRef ⊂ TxtJRef.

(c) EInfJIRef ⊂ EInfJRef.

(d) ETxtJIRef ⊂ ETxtJRef.

The following theorem shows the advantages of justified refutation over reliable identification.

Theorem 8. (TxtJIRef ∩ InfJIRef) − EInfRel ≠ ∅.

Proof. For f ∈ R, let L_f = {⟨x, y⟩ | f(x) = y}. Let L = {L_f | φ_{f(0)} = f} ∪ {L | L ∈ FIN ∧ (∃x, y, z | y ≠ z)[⟨x, y⟩ ∈ L ∧ ⟨x, z⟩ ∈ L]}. It is easy to verify that L ∈ TxtEx (and thus InfEx). Since Cons_L = SEG and ICons_L = SEQ, it follows that L ∈ TxtJIRef ∩ InfJIRef. Essentially, the proof in [CJNM94] showing that {f | φ_{f(0)} = f} cannot be identified by a reliable machine (for function learning) translates to show that L ∉ EInfRel. ∎

The following theorem shows the advantages of reliable identification over justified refutation.

Theorem 9. (a) TxtRel − ETxtJRef ≠ ∅.

(b) TxtRel − EInfJRef ≠ ∅.

Proof. (a) Let L_i = {⟨i, x⟩ | x ∈ N}. Let X_i = {σ ∈ SEG | ∅ ⊂ content(σ) ⊆ L_i ∧ M_i(σ) = ⊥}. Note that X_i is recursively enumerable (a grammar for which can be found effectively in i).

Let L = {content(σ) | (∃i)[σ ∈ X_i]}. It is easy to verify that L ∈ TxtRel. Suppose by way of contradiction that M_i witnesses that L ∈ ETxtJRef. We consider two cases:

Case 1. X_i is not empty. In this case let σ ∈ X_i. Let T be a text for content(σ) that extends σ. Now content(T) ∈ L, but M_i refutes T.

Case 2. X_i is empty. In this case, L does not contain any subset of L_i. Let L be a nonempty subset of L_i such that M_i does not TxtEx-identify L (by Proposition 1 there must exist such an L). Let T be a text for L. Now M_i does not TxtEx-identify T, nor does it refute T. However, for all initial segments σ of T such that content(σ) ≠ ∅, σ ∉ Cons_L.

From the above cases, it follows that M_i does not ETxtJRef-identify L. Thus L ∉ ETxtJRef.

(b) Let L_i be as defined in part (a). Let X_i = {σ ∈ SEQ | ∅ ⊂ PosInfo(σ) ⊆ L_i ∧ M_i(σ) = ⊥}. Let L = {PosInfo(σ) | (∃i)[σ ∈ X_i]}. It can now be proved in a manner similar to part (a) that L ∈ TxtRel − EInfJRef. ∎

The following theorem shows the advantages of having an informant over texts.

Theorem 10. (InfRef ∩ InfJIRef) − TxtEx ≠ ∅.

Proof. INIT ∪ {N} witnesses the separation. ∎

5. A VARIATION OF EXTENDED JUSTIFIED REFUTATION CRITERIA

In the definitions of the (extended) criteria of learning with justified refutation, we required the machines to either identify or (immediately) refute any text (informant) which contains a finite sample unrepresentative of the class L being learned. For example, in ETxtJRef-identification we required that the machine either identify or refute every text which has an initial segment not in Cons_L. Alternatively, we could place such a restriction only on texts which are not representative of what the machine identifies (note that this gives more freedom to the machine). In other words, in the definitions of ETxtJRef, ETxtJIRef, EInfJRef, EInfJIRef, we could have taken {T[n] | n ∈ N and M TxtEx-identifies T} instead of Cons_L, and {I[n] | n ∈ N and M InfEx-identifies I} instead of ICons_L. Let the new classes so formed be called ETxtJRef′, EInfJRef′, ETxtJIRef′, EInfJIRef′. Note that a similar change does not affect the classes ETxtRef, ETxtRel, EInfRef, EInfRel.

An interesting and easily shown property of the classes ETxtJRef′, EInfJRef′, ETxtJIRef′, EInfJIRef′ is that they are closed under the subset operation (i.e., if L ∈ ETxtJRef′, then every L′ ⊆ L is in ETxtJRef′). Note that ETxtJIRef, ETxtJRef, EInfJIRef, EInfJRef are not closed under the subset operation; this follows immediately from Theorem 9 and the fact that FIN belongs to each of these inference criteria.

We now show that ETxtJIRef′ and ETxtJRef′ attain the full power of TxtEx! This is a surprising result, given the results in [LW94] and this paper (ETxtJRef and EInfJRef do not even contain TxtFin, as shown in Theorems 3 and 4).

Theorem 11. (a) TxtEx = ETxtJRef′ = ETxtJIRef′.

(b) EInfJRef′ = EInfJIRef′.

Proof. For part (a), it is enough to show that TxtEx ⊆ ETxtJIRef′. Consider any class L ∈ TxtEx. If


Page 8: Learning with Refutation

N ∈ L, then it immediately follows that L ∈ ETxtJIRef′ (since Cons_L = SEG). If N ∉ L, then let L′ = L ∪ INIT. It was shown by Fulk [Ful90] that if L ∈ TxtEx and L does not contain N, then L′ as defined above is in TxtEx. Since L′ contains a superset of every finite set, it follows that L′ ∈ ETxtJIRef′ (since Cons_{L′} = SEG). Part (a) now follows using the fact that ETxtJIRef′ is closed under the subset operation.

(b) It is sufficient to show that EInfJRef′ ⊆ EInfJIRef′. Suppose M EInfJRef′-identifies L. We construct an M′ which EInfJIRef′-identifies L. Let g denote a recursive function such that, for all finite sets S, W_{g(S)} = S. On any input I[n], M′ behaves as follows. If M(I[n]) ≠ ⊥, then M′(I[n]) = M(I[n]) (this ensures that M′ InfEx-identifies L). If M(I[n]) = ⊥, then let m be the smallest number such that M(I[m]) = ⊥. If PosInfo(I[n]) = PosInfo(I[m]), then M′(I[n]) outputs g(PosInfo(I[n])). Otherwise M′(I[n]) outputs ⊥. We claim that for every σ, either M′(σ) = ⊥ or there exists an extension I of σ such that M′ InfEx-identifies I. So suppose M′(σ) ≠ ⊥. We consider the following cases.

Case 1. M(σ) ≠ ⊥. If M InfEx-identifies some extension of σ, then clearly M′ does too. So suppose that M does not InfEx-identify any extension of σ. This implies that M refutes every informant which begins with σ. Let I be an informant, extending σ, for PosInfo(σ). Let n be the least number such that M(I[n]) = ⊥. Note that I[n] must be an extension of σ. It now follows from the definition of M′ that M′ InfEx-identifies I.

Case 2. M(σ) = ⊥. Let τ be the smallest prefix of σ such that M(τ) = ⊥. It follows from the definition of M′ that PosInfo(τ) = PosInfo(σ). Let I, extending σ, be an informant for PosInfo(σ). It follows from the definition of M′ that M′(I) is a grammar for PosInfo(τ) = PosInfo(I).

From the above cases, it follows that M′ EInfJIRef′-identifies L. ∎
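
The wrapper M′ in the proof of part (b) is simple enough to sketch directly. Below, a segment is a tuple of (x, bit) pairs (pauses omitted for brevity), M is the given EInfJRef′-machine, g is the assumed recursive function with W_{g(S)} = S, and REFUTE stands for ⊥; all of these names are ours.

```python
REFUTE = "bottom"

def make_M_prime(M, g):
    # Turn an EInfJRef'-machine M into an EInfJIRef'-machine M', as in
    # the proof of Theorem 11(b).
    def pos_info(segment):
        return frozenset(x for (x, bit) in segment if bit == 1)

    def M_prime(segment):
        if M(segment) != REFUTE:
            return M(segment)            # M has not refuted: copy it
        # m = length of the least prefix on which M refutes
        m = next(n for n in range(len(segment) + 1)
                 if M(segment[:n]) == REFUTE)
        if pos_info(segment) == pos_info(segment[:m]):
            # no new positive data since M refuted: conjecture a
            # grammar for the finite set of positive data seen
            return g(pos_info(segment))
        return REFUTE                    # new positive datum: refute

    return M_prime
```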

However, unlike the case for texts, EInfJRef′ is not equal to InfEx.

Theorem 12. TxtEx − EInfJRef′ ≠ ∅.

Proof. First we prove the following lemma. Note that if M EInfJRef′-identifies L, then, for each finite information sequence σ, M must satisfy either (a) or (b) in the statement of the lemma.

Lemma 1. Suppose M is such that, for all finite information sequences σ, at least one of the following two properties is satisfied:

(a) M InfEx-identifies some informant extending σ, or

(b) M refutes all infinite information sequences I extending σ.

Then, for every finite information sequence τ, one can find (effectively in τ and M) a grammar g(τ, M) such that W_{g(τ, M)} ⊇ PosInfo(τ) and either

(c) there exists a γ extending τ such that M(γ) = ⊥, or

(d) M diverges on some informant for W_{g(τ, M)}.

Proof. Suppose M is given such that, for each finite information sequence σ, either (a) or (b) holds. Let τ be such that (c) fails. Then we claim that, for each γ extending τ, there exists a γ‴ extending γ such that M(γ) ≠ M(γ‴). To see this, consider any γ extending τ. Let γ′ and γ″ be two extensions of γ such that PosInfo(γ′) ∩ NegInfo(γ″) ≠ ∅ (i.e., γ′ and γ″ are inconsistent with each other). Since (c) fails, (b) cannot hold for σ = γ′ or for σ = γ″ (refuting all informants extending either would refute an extension of τ); so, from (a), M InfEx-identifies some informant extending γ′ as well as some informant extending γ″. These two informants are for distinct languages, so M converges on them to distinct grammars, and hence on at least one of them M must, beyond γ, output a value different from M(γ). Thus, there exists a γ‴ extending γ such that M(γ) ≠ M(γ‴). This is what our construction of g will utilize. We give below a description of W_{g(τ, M)} (note that g can be defined using the s-m-n theorem).

W_{g(τ, M)}:

1. Let τ_0 = τ. Enumerate PosInfo(τ) into W_{g(τ, M)}. Go to stage 0.

2. Stage s. Search for a γ extending τ_s such that M(γ) ≠ M(τ_s). Let τ_{s+1} be an extension of such a γ with PosInfo(τ_{s+1}) ∪ NegInfo(τ_{s+1}) ⊇ {x | x ≤ s}. Enumerate PosInfo(τ_{s+1}) into W_{g(τ, M)}. Go to stage s+1. End stage s.

Note that if M is as given in the lemma and τ is such that (c) fails, then, by the analysis above, there will be infinitely many stages in the construction of W_{g(τ, M)}. Thus ∪_i τ_i is an informant for W_{g(τ, M)} on which M makes infinitely many mind changes. This proves the lemma. ∎

We now proceed with the proof of the theorem. Let τ_i denote a finite information sequence (obtained effectively from i) such that PosInfo(τ_i) = {i+1} and NegInfo(τ_i) = {x | x ≤ i}. Let S_i = {γ | τ_i ⊆ γ ∧ M_i(γ) = ⊥}. Let L = {W_{g(τ_i, M_i)} | S_i = ∅} ∪ {PosInfo(γ) | γ ∈ S_i for some i}.

It is easy to verify that L ∈ TxtEx. However, if S_i is not empty, then M_i refutes each γ ∈ S_i, even though PosInfo(γ) ∈ L. If S_i is empty, then either M_i does not behave properly (i.e., M_i, for some σ, fails to satisfy both conditions (a) and (b) in the statement of Lemma 1) or M_i does not InfEx-identify W_{g(τ_i, M_i)} ∈ L. It follows that L ∉ EInfJRef′. ∎


Page 9: Learning with Refutation

Corollary 2. TxtJIRef − EInfJRef′ ≠ ∅.

Proof. Note that no language in the class L defined in the proof of Theorem 12 contains 0. Thus L ∪ {N} ∈ TxtEx, and since Cons_{L ∪ {N}} = SEG, also L ∪ {N} ∈ TxtJIRef. However, L ∪ {N} ∉ EInfJRef′, by a proof identical to that of Theorem 12. ∎

The following theorem, however, shows that EInfJRef′ contains InfRel.

Theorem 13. InfRel ⊆ EInfJIRef′.

Proof. Note that InfRel is closed under finite union. Also, FIN ∈ InfRel. Now suppose L ∈ InfRel. Thus (L ∪ FIN) ∈ InfRel ⊆ InfEx. It follows that (L ∪ FIN) ∈ EInfJIRef′ (since ICons_{L ∪ FIN} = SEQ). Now, EInfJIRef′ is closed under the subset operation, and thus it follows that L ∈ EInfJIRef′. ∎

6. CONCLUSIONS

Mukouchi and Arikawa modeled a learning situation in which the learner is expected to refute texts which are not representative of L, the class of languages being identified. Lange and Watson extended this model to consider justified refutation, in which the learner is expected to refute texts only if they contain a finite sample unrepresentative of the class L. Both the above studies were in the context of indexed families of recursive languages. In this paper we extended this study in two directions. First, we considered general classes of recursively enumerable languages. Second, we allowed the machine to either identify or refute the unrepresentative texts (respectively, texts containing finite unrepresentative samples). We observed some surprising differences between our results and the results obtained for learning indexed families by Lange and Watson. For example, in the context of learning indexed families of recursive languages, Lange and Watson (in their model) showed that TxtJIRef = TxtJRef (i.e., requiring machines to refute as soon as the initial segment becomes unrepresentative of L is not a restriction). A similar result was also shown by them for learning from informants. We showed that requiring immediate refutation is a restriction if we consider general classes of r.e. languages (in both our (extended) and Lange and Watson's models of justified refutation, and for learning from texts as well as informants). We also considered a variation of our model in which "unrepresentative" is with respect to what a machine identifies and not with respect to the class L. In this variation, for learning from texts, the (immediate) justified refutation model has the same power as TxtEx, a surprising result in the context of the results in [LW94] and other results in this paper. However, in the context of learning from informants, even this variation fails to capture the power of InfEx.

It would be useful to find interesting characterizations of the different inference classes studied in this paper. An anonymous referee suggested the following problems. When we do not require immediate refutation (as in TxtJRef), the delay in refuting the text may be arbitrarily large. It would be interesting to study any hierarchy that can be formed by "quantifying" the delay. Note that if one just considers the number of excess data points needed before refuting, then the hierarchy collapses, except for the * (unbounded but finite) case. As extensions of the criteria considered in this paper, one could consider the situation when a machine approximately identifies [KY95, KY97, Muk94] a text T in the case when it does not identify or refute the text.

ACKNOWLEDGMENTS

We thank Arun Sharma for helpful discussions and comments. Anonymous referees provided several comments which improved the presentation of this paper.

REFERENCES

[Blu67] M. Blum, A machine-independent theory of the complexity of recursive functions, J. Assoc. Comput. Mach. 14 (1967), 322-336.

[CJNM94] J. Case, S. Jain, and S. Ngo Manguelle, Refinements of inductive inference by Popperian and reliable machines, Kybernetika 30 (1994), 12-52.

[CL82] J. Case and C. Lynes, Machine inductive inference and language identification, in "Proceedings of the 9th International Colloquium on Automata, Languages and Programming" (M. Nielsen and E. M. Schmidt, Eds.), Lecture Notes in Computer Science, Vol. 140, pp. 107-115, Springer-Verlag, New York/Berlin, 1982.

[Ful90] M. Fulk, Prudence and other conditions on formal language learning, Inform. and Comput. 85 (1990), 1-11.

[Gol67] E. M. Gold, Language identification in the limit, Inform. and Control 10 (1967), 447-474.

[KY95] S. Kobayashi and T. Yokomori, On approximately identifying concept classes in the limit, in "Algorithmic Learning Theory: Sixth International Workshop (ALT '95)," Lecture Notes in Artificial Intelligence, Vol. 997, pp. 298-312, Springer-Verlag, New York/Berlin, 1995.

[KY97] S. Kobayashi and T. Yokomori, Learning approximately regular languages with reversible languages, Theor. Comput. Sci. A 174 (1997), 251-257.

[LW94] S. Lange and P. Watson, Machine discovery in the presence of incomplete or ambiguous data, in "Algorithmic Learning Theory: Fourth International Workshop on Analogical and Inductive Inference (AII '94) and Fifth International Workshop on Algorithmic Learning Theory (ALT '94)" (S. Arikawa and K. Jantke, Eds.), Lecture Notes in Artificial Intelligence, Vol. 872, pp. 438-452, Springer-Verlag, New York/Berlin, 1994.

[MA93] Y. Mukouchi and S. Arikawa, Inductive inference machines that can refute hypothesis spaces, in "Algorithmic Learning Theory: Fourth International Workshop (ALT '93)" (K. Jantke, S. Kobayashi, E. Tomita, and T. Yokomori, Eds.), Lecture Notes in Artificial Intelligence, Vol. 744, pp. 123-136, Springer-Verlag, New York/Berlin, 1993.


Page 10: Learning with Refutation

[MA95] Y. Mukouchi and S. Arikawa, Towards a mathematical theory of machine discovery from facts, Theor. Comput. Sci. A 137 (1995), 53-84.

[Min76] E. Minicozzi, Some natural properties of strong identification in inductive inference, Theor. Comput. Sci. 2 (1976), 345-360.

[Muk94] Y. Mukouchi, Inductive inference of an approximate concept from positive data, in "Algorithmic Learning Theory: Fourth International Workshop on Analogical and Inductive Inference (AII '94) and Fifth International Workshop on Algorithmic Learning Theory (ALT '94)," Lecture Notes in Artificial Intelligence, Vol. 872, pp. 484-499, Springer-Verlag, New York/Berlin, 1994.

[Rog67] H. Rogers, "Theory of Recursive Functions and Effective Computability," McGraw-Hill, New York, 1967. [Reprinted by MIT Press in 1987.]

[Sha98] A. Sharma, A note on batch and incremental learnability,J. Comput. System Sci., to appear.
