
Information Processing Letters 109 (2009) 208–211


On some open problems in reflective inductive inference

Sanjay Jain

School of Computing, National University of Singapore, Singapore 117590
E-mail address: [email protected]. Supported in part by NUS grant number R252-000-308-112.


Article history: Received 20 May 2008; Received in revised form 18 September 2008; Accepted 7 October 2008; Available online 17 October 2008. Communicated by L.A. Hemaspaandra.

Keywords: Theory of computation; Inductive inference; Reflection; Consistency

Abstract. In this paper we show that there exist classes of functions which can be learnt by a finite learner which reflects on its capability, but which cannot be learnt by a consistent learner which optimistically reflects on its capability. This solves two open problems from [G. Grieser, Reflective inductive inference of recursive functions, Theoretical Computer Science A 397 (1–3) (2008) 57–69 (Special Issue on Forty Years of Inductive Inference. Dedicated to the 60th Birthday of Rolf Wiehagen)].

© 2008 Elsevier B.V. All rights reserved.

1. Introduction

A function learning criterion can be described as follows. A learning machine M (a computable device) receives growing segments σ_0, σ_1, . . . of the graph of a function f. During this process, the learner M outputs a sequence of conjectures p_0, p_1, . . . . The learner is said to identify f if the sequence of conjectures, as above, converges to a program for f. This is essentially the notion of learning in the limit, also called Ex learning, as introduced by Gold [5]. Note that in the above process, the learner does not know if and when it has converged to its final hypothesis. If we impose this additional requirement, then the model is equivalent to requiring that the learner in the above process output only one conjecture (which is a correct program for the input function). This model of learning is known as finite learning (Fin-learning) [5].
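To make the protocol concrete, the following minimal Python sketch simulates feeding growing canonical segments to a learner and records its conjectures. All names here (conjecture_stream, M_const) are illustrative inventions, not part of the formal development, and a finite horizon can only approximate convergence "in the limit".

```python
# Illustrative sketch of the Ex-learning protocol (hypothetical names; a
# finite horizon only approximates limit behavior).

def conjecture_stream(M, f, horizon=100):
    """Feed M the canonical segments (0, f(0)), ..., (n-1, f(n-1)) for
    growing n and collect its conjectures; Ex-learning asks that this
    sequence converge to a single correct program for f."""
    conjectures = []
    for n in range(1, horizon):
        segment = tuple((x, f(x)) for x in range(n))
        conjectures.append(M(segment))
    return conjectures

# A toy learner for constant functions: it conjectures the constant it
# has seen (standing in for a program computing that constant).
M_const = lambda segment: segment[0][1] if segment else None

print(conjecture_stream(M_const, lambda x: 7, horizon=5))  # [7, 7, 7, 7]
```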

The intermediate programs output by the learner in the Ex-learning model may not be consistent with the input data. Consistency is the requirement that each hypothesis conjectured by the learner, after seeing input σ, must contain the data in σ [2,3,15,8]. It is known that consistency is a restriction, that is, some Ex-learnable class of functions cannot be learnt by a consistent learner [2].

The above models of learning, and various modifications, have been extensively studied in the literature (see, for example, [1,9,14,16,17]). In this paper we will restrict ourselves to the above mentioned criteria. Formal definitions of the above learning models, as well as of the models of reflection considered below, are given in Section 2.

The criteria considered above specify how the learner behaves on the functions in the class to be learnt, but specify nothing about the learner's ability to indicate potential non-learnability of functions outside the class being learnt. This may be considered as a weakness from a user's point of view: the learner may lead the user into believing that learning is taking place, even though the conjectures of the learner may be completely wrong.

There have been several proposals in the literature about how a learner may recognize its own limitations. For the following, fix a criterion of learning. Reliable learning [3,11] requires that the learner does not converge to a final conjecture on sequences for functions not learnt by the learner (that is, the learner signals incorrectness of its hypothesis by making a mind change). Mukouchi and Arikawa [10] consider the model in which a learner is required to refute sequences for functions which are not learnt by the learner.

Jantke [7] and later Grieser [6] considered learners that can reflect on their own capabilities. Such learners are equipped with a reflection function that indicates, in the limit, whether the input data represents a function in the class being learnt. Let R be a reflection function belonging to some learner. For each finite piece of input data σ, the learner is said to accept σ if R(σ) = 1; the learner is said to reject σ if R(σ) = 0. Grieser [6] considered three different models of reflection, constraining the behavior of R on finite pieces σ of input data (a schematic sketch follows the list below).

(1) Exact reflection: If the learner accepts σ, then σ belongs to some function in the class being learnt; if the learner rejects σ, then σ does not belong to any function in the class being learnt.

(2) Pessimistic reflection: If the learner accepts σ, then σ belongs to some function in the class being learnt; however, nothing can be said about σ if σ is rejected.

(3) Optimistic reflection: If the learner rejects σ, then σ does not belong to any function in the class being learnt; however, nothing can be said about σ if σ is accepted.
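As an informal illustration, the three constraints can be phrased as predicates on the behavior of R on a finite segment σ. The sketch below assumes a hypothetical oracle extends_some_learnable(σ) deciding whether σ belongs to some function the learner identifies; this oracle is in general uncomputable, which is precisely why R is only required to approximate it in the limit.

```python
# Hedged sketch of the three reflection constraints on a finite segment
# sigma; `extends_some_learnable` is a hypothetical (generally
# uncomputable) membership oracle, used only for illustration.

def exact_ok(R, sigma, extends_some_learnable):
    # Exact: R accepts sigma exactly when sigma is acceptable.
    return (R(sigma) == 1) == extends_some_learnable(sigma)

def pessimistic_ok(R, sigma, extends_some_learnable):
    # Pessimistic: acceptance is trustworthy; rejection says nothing.
    return R(sigma) == 0 or extends_some_learnable(sigma)

def optimistic_ok(R, sigma, extends_some_learnable):
    # Optimistic: rejection is trustworthy; acceptance says nothing.
    return R(sigma) == 1 or not extends_some_learnable(sigma)
```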

Grieser showed several interesting relationships between the above models of learning with reflection by considering how they affect Ex, Fin and consistent learning. In his study, Grieser left open several problems, including whether there exists a class of functions which can be finitely identified by a reflecting machine (without any constraints on when the input data is rejected), but which cannot be identified by a consistent learner that optimistically reflects on the input data. In this note we solve the above problem (and others) by constructing such a class of functions.

2. Notation and preliminaries

Let N denote the set of natural numbers, {0, 1, 2, . . .}. Let f, with or without decorations, range over total functions. Let R denote the class of all recursive functions, i.e., total computable functions with arguments and values from N. Let C, with or without decorations, range over subsets of R.

Let ϕ_0, ϕ_1, . . . be a fixed acceptable numbering for all the partial recursive functions [12]. Let Φ denote a standard Blum complexity measure [4] for ϕ. By η(x)↓ we denote that the partial function η is defined on input x.

A sequence is a mapping from N or an initial segment of N to N × N. The content of a sequence σ, denoted content(σ), is the set of pairs appearing in the range of σ. Let SEQ denote the set of all finite sequences σ such that if (x, y) and (x, z) belong to content(σ), then y = z. (In this paper, we are only interested in sequences whose content represents part of a function. Furthermore, we are only interested in the learning of total recursive functions.) If the sequence σ contains at least n elements, then we let σ[n] denote the first n elements of the sequence σ. The canonical sequence for a function f is the infinite sequence σ such that σ(n) = (n, f(n)).
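For concreteness, these notions admit a direct encoding. The following helpers are illustrative only, with the assumed representation of a finite sequence as a Python tuple of (x, y) pairs.

```python
# Illustrative helpers for the sequence notions above (assumed encoding:
# a finite sequence is a tuple of (x, y) pairs).

def content(sigma):
    """The set of pairs appearing in the range of sigma."""
    return set(sigma)

def in_SEQ(sigma):
    """sigma is in SEQ iff no x occurs with two different values y."""
    seen = {}
    for (x, y) in sigma:
        if x in seen and seen[x] != y:
            return False
        seen[x] = y
    return True

def canonical_segment(f, n):
    """The first n elements of the canonical sequence for f."""
    return tuple((i, f(i)) for i in range(n))
```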

Concatenation of two finite sequences σ and τ is denoted by σ ⋄ τ.

A learning machine (also called a learner) is a (possibly partial) computable mapping from SEQ to N ∪ {?}. We let M, with or without decorations, range over learning machines. We say that a learner M converges on an infinite sequence σ to i iff for all but finitely many n, M(σ[n]) = i.

We now formally define the criteria of learning considered in the introduction.

Definition 1. (See [5].)

(a) M Ex-identifies a function f (written: f ∈ Ex(M)) iff for all infinite sequences σ such that content(σ) = {(x, f(x)) : x ∈ N}:
(i) M(σ[n])↓ for all n ∈ N, and
(ii) there exists an i such that ϕ_i = f and, for all but finitely many n, M(σ[n]) = i.

(b) M Ex-identifies a class C of functions (written: C ⊆ Ex(M)) iff M Ex-identifies each f ∈ C.

(c) Ex = {C : (∃M)[M Ex-identifies C]}.

Definition 2. (See [5].)

(a) M Fin-identifies a function f (written: f ∈ Fin(M)) iff for all infinite sequences σ such that content(σ) = {(x, f(x)) : x ∈ N}:
(i) M(σ[n])↓ for all n ∈ N, and
(ii) there exist i, n ∈ N such that ϕ_i = f, M(σ[m]) = ? for m < n, and M(σ[m]) = i for m ≥ n.

(b) M Fin-identifies a class C of functions (written: C ⊆ Fin(M)) iff M Fin-identifies each f ∈ C.

(c) Fin = {C : (∃M)[M Fin-identifies C]}.

For the Ex and Fin criteria of learning, one may assume without loss of generality that the learner is total. However, for consistent learning this is not the case.

Definition 3. (See [2].)

(a) M is said to be consistent on f iff, for all σ ∈ SEQ such that content(σ) ⊆ f, M(σ)↓ and, for all (x, y) ∈ content(σ), ϕ_{M(σ)}(x) = y (a sketch follows this definition).

(b) M Cons-identifies f (written: f ∈ Cons(M)) iff M Ex-identifies f and M is consistent on f.

(c) M Cons-identifies C (written: C ⊆ Cons(M)) iff M Ex-identifies C and M is consistent on each f ∈ C.

(d) Cons = {C : (∃M)[M Cons-identifies C]}.
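The consistency requirement of clause (a) is easy to state operationally. The sketch below assumes a hypothetical evaluator phi(p, x) for the fixed acceptable numbering; in practice only a step-bounded approximation of phi is computable, since ϕ_p(x) may diverge, so this check need not terminate without a bound.

```python
# Sketch of the consistency check from Definition 3(a); `phi` is an
# assumed evaluator for the acceptable numbering, and M(sigma) is
# modelled as returning None when undefined.

def consistent_on_segment(M, sigma, phi):
    """True iff M(sigma) is defined and its conjecture reproduces every
    pair that appears in sigma."""
    p = M(sigma)
    if p is None:
        return False
    return all(phi(p, x) == y for (x, y) in set(sigma))
```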

There are other notions of consistency considered in the literature: (a) TCons [3,15], in which the learner is required to be consistent on all total functions, and (b) RCons [8], in which the learner is required to be total (however, here the learner is required to be consistent only on functions in the class being learnt).

Definition 4. (See [6].) Fix a learner M and a criterion of learning I.

(a) A finite sequence σ is acceptable (for M with respect to I) if content(σ) ⊆ f for some f ∈ I(M). A finite sequence σ is unacceptable (for M with respect to I) if content(σ) ⊈ f for all f ∈ I(M).

(b) An infinite sequence σ is acceptable (for M with respect to I) if content(σ) = {(x, f(x)) : x ∈ N} for some function f ∈ I(M). An infinite sequence σ is unacceptable (for M with respect to I) if it has an initial segment which is unacceptable.

Note that an infinite sequence may be neither acceptable nor unacceptable.

Definition 5. (See [6,7,10].) Fix a learner M and a criterion of learning I.

(a) A reflection function R is a total computable function from SEQ to {0, 1}.

(b) R is said to be a reflection function of M with respect to the learning criterion I iff
(i) for all infinite acceptable sequences σ (for M with respect to I), R converges on σ to 1;
(ii) for all infinite unacceptable sequences σ (for M with respect to I), R converges on σ to 0.

(c) R is said to be a pessimistic reflection function (or p-reflection function) of M with respect to the learning criterion I iff R is a reflection function of M with respect to the learning criterion I and, for all σ ∈ SEQ such that R(σ) = 1, σ is acceptable.

(d) R is said to be an optimistic reflection function (or o-reflection function) of M with respect to the learning criterion I iff R is a reflection function of M with respect to the learning criterion I and, for all σ ∈ SEQ such that R(σ) = 0, σ is unacceptable.

(e) R is said to be an exact reflection function (or e-reflection function) of M with respect to the learning criterion I iff R is a reflection function of M with respect to the learning criterion I and, for all σ ∈ SEQ, R(σ) = 1 iff σ is acceptable.

(f) M, along with reflection function R, I-Refl (respectively, I-pRefl, I-oRefl, I-eRefl) identifies C if M I-identifies C and R is a reflection function (respectively, pessimistic reflection function, optimistic reflection function, exact reflection function) of M with respect to the learning criterion I.

(g) I-Refl (respectively, I-pRefl, I-oRefl, I-eRefl) denotes the set of classes of functions C which can be I-Refl (respectively, I-pRefl, I-oRefl, I-eRefl) identified by some learner M, along with a corresponding reflection function R.

Note that the acceptable and reflection notions are with respect to the class learnt by the learner M, and not with respect to the target class C.

Note that the notion of learning considered in the above definitions allows the input to be an arbitrary ordering of the graph of the function being learnt. Researchers have considered the situation in which only canonical sequences are given to the learner. For the Ex and Fin criteria this does not make a difference, though it does make a difference for consistent learning (see, for example, [3,16,6]). Moreover, for optimistic, pessimistic or exact reflection (for any of the criteria of learning considered in this paper), considering only canonical sequences does make a difference (see [6]). We will not go into the details of this, except to mention that the results obtained hold in the strongest form: for the positive side, the learning can be done for arbitrary input sequences, whereas for the negative side, learning cannot be done even on canonical sequences.

3. Main result

Theorem 6. Fin-Refl − Cons-oRefl ≠ ∅.

Proof. Let

C1 = {f : f(0) > 0, ϕ_{f(0)} = f, and (∀y > 0)(∀x < y)[Φ_{f(0)}(x) < f(y)]}

and

C2 = {f : f(0) > 0, ϕ_{f(0)} = f, and (∃n > 0)[(∀x < n)[f(x) > 0] and f(n) = 0 and (∀y > n)(∀x < y)[Φ_{f(0)}(x) < f(y)]]}.

Let C = C1 ∪ C2.

Using Kleene's parameterized recursion theorem [13], we will define a recursive function p (of two variables) below. It will be the case that, for all i and j, ϕ_{p(i,j)}(0) = p(i, j) and, whenever ϕ_{p(i,j)} is total, ϕ_{p(i,j)} ∈ C.

The class witnessing the theorem will be C′ = {ϕ_{p(i,j)} : ϕ_{p(i,j)} is total}. Note that C′ ⊆ C.

It is easy to verify that C′ ∈ Fin-Refl. For any finite input sequence σ ∈ SEQ, the learner outputs p(i, j) if i + j ≤ |σ| and (0, p(i, j)) ∈ content(σ); the learner outputs '?' on input σ if no such i, j exist. Note that the above learner finitely learns exactly the class C′.
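A sketch of this finite learner may clarify why it is total and one-shot: the bound i + j ≤ |σ| keeps the search finite, and a conjecture is emitted only once the value at 0 reveals a program p(i, j). Here the recursive function p from the construction is taken as given, and the encoding of σ as a tuple of pairs is assumed.

```python
# Sketch of the Fin-learner for C' described above; `p` is the (assumed)
# recursive function built in the proof, and sigma is a tuple of pairs.

def fin_learner(sigma, p):
    pairs = set(sigma)
    n = len(sigma)
    for i in range(n + 1):
        for j in range(n + 1 - i):        # all (i, j) with i + j <= |sigma|
            if (0, p(i, j)) in pairs:
                return p(i, j)            # the single, final conjecture
    return '?'                            # no conjecture yet
```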

Furthermore, given as input an infinite sequence σ such that, for some total f, content(σ) = {(x, f(x)) : x ∈ N}, one can effectively determine in the limit (a) whether f is in C = C1 ∪ C2 and (b) whether f(0) = p(i, j) for some i, j. Clause (b) is easy to verify in the limit. To see (a), note that f ∉ C1 iff f(0) = 0 or, for some x, y with x < y, Φ_{f(0)}(x) ≥ f(y). Thus, one can determine, in the limit, whether f ∈ C1. Similar reasoning can be used to show that one can determine, in the limit, whether f ∈ C2.

Thus, in the limit one can determine whether a function f belongs to C′. It follows that one can build the reflection function for the learner.
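Since Φ_{f(0)}(x) < f(y) is decidable by the Blum axioms (run program f(0) on input x for at most f(y) steps), a reflection function can check the finite data it has seen against the C1 conditions. A hedged sketch follows, where Phi_lt(p, x, n) is an assumed decision procedure for Φ_p(x) < n; the C2 and f(0) = p(i, j) checks are analogous.

```python
# Sketch of the decidable part of the C1 test on finite data sigma;
# `Phi_lt(p, x, n)` is an assumed (Blum-axiom-based) decider for
# Phi_p(x) < n.

def no_C1_violation_yet(sigma, Phi_lt):
    data = dict(sigma)
    if data.get(0) == 0:
        return False                  # f(0) = 0 already rules out C1
    if 0 not in data:
        return True                   # nothing checkable yet
    for y, fy in data.items():        # each (y, f(y)) seen so far
        for x in range(y):            # every x < y (vacuous for y = 0)
            if not Phi_lt(data[0], x, fy):
                return False          # Phi_{f(0)}(x) >= f(y) witnessed
    return True
```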

We now define the function p as follows. By implicit use of the effective version of Kleene's parameterized recursion theorem [13], there exists a recursive function p such that ϕ_{p(i,j)} may be defined as follows. We may assume without loss of generality that p(i, j) > 0 for all i, j. Intuitively, we will use ϕ_{p(i,j)} to make sure that the learner M = ϕ_i (along with the corresponding optimistic reflection function R = ϕ_j) does not Cons-oRefl-identify C′.

Fix i, j. For ease of notation below, we will use M and R for ϕ_i and ϕ_j, respectively.

Initially, let ϕ_{p(i,j)}(0) = p(i, j). Let x_s denote the least x such that ϕ_{p(i,j)}(x) has not been defined before stage s. Thus x_0 = 1.

Below ϕ_{p(i,j)}[x] denotes the initial segment (0, ϕ_{p(i,j)}(0)), (1, ϕ_{p(i,j)}(1)), . . . , (x − 1, ϕ_{p(i,j)}(x − 1)). Below, τ ranges over finite initial segments of canonical sequences. Go to stage 0. (A schematic sketch of the dovetailed search in each stage is given after the construction.)

Stage s
1. Dovetail steps 2 and 3 until, if ever, one of them succeeds. If step 2 succeeds before step 3, then go to step 4. If step 3 succeeds before step 2, then go to step 5.
2. Search for an extension τ of ϕ_{p(i,j)}[x_s] such that R(τ) = 0 and y > 0 for all (x, y) ∈ content(τ).
3. Search for a number w > max{Φ_{p(i,j)}(x) : x < x_s} such that M(ϕ_{p(i,j)}[x_s] ⋄ (x_s, w))↓ ≠ M(ϕ_{p(i,j)}[x_s])↓.
4. (a) Let ϕ_{p(i,j)}(y) = z for each (y, z) ∈ content(τ).
(b) Let ϕ_{p(i,j)}(y) = 0 for y = |τ| (i.e., for the least number y such that (y, z) ∉ content(τ) for all z).
(c) For y > |τ|, let ϕ_{p(i,j)}(y) = 1 + Σ_{x<y} Φ_{p(i,j)}(x).
(* In this case, when step 2 succeeds, step 4 completes the definition of ϕ_{p(i,j)}, and we do not need any further stages. Also note that ϕ_{p(i,j)} is total in this case, as ϕ_{p(i,j)}(y) gets defined for all y < |τ| by step 4(a), for y = |τ| by step 4(b), and for all y > |τ| by step 4(c). *)
5. Let ϕ_{p(i,j)}(x_s) = w. Let x_{s+1} = x_s + 1. Go to stage s + 1.
End Stage s
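The following deliberately schematic sketch mirrors only the search structure of a single stage, i.e. the dovetailing of steps 2 and 3. The step-bounded evaluators M_eval and R_eval, and the enumerators extensions and witnesses (yielding finitely many candidates per step bound), are all assumptions made for illustration; they stand in for the recursion-theorem machinery, which the sketch does not attempt to implement.

```python
import itertools

# Schematic sketch of one stage's dovetailed search (steps 2 and 3).
# `M_eval(seq, steps)` / `R_eval(seq, steps)` are assumed step-bounded
# evaluators returning None if the computation has not halted within
# `steps` steps; `extensions` / `witnesses` are assumed enumerators.

def run_stage(segment, M_eval, R_eval, extensions, witnesses):
    for steps in itertools.count(1):  # dovetail by growing step bound
        # Step 2: an extension tau with R(tau) = 0 and positive values only.
        for tau in extensions(segment, steps):
            if R_eval(tau, steps) == 0 and all(y > 0 for (_, y) in tau):
                return ('step 2', tau)
        # Step 3: a large w on which M demonstrably changes its conjecture.
        base = M_eval(segment, steps)
        for w in witnesses(segment, steps):
            out = M_eval(segment + ((len(segment), w),), steps)
            if base is not None and out is not None and out != base:
                return ('step 3', w)
    # If neither search ever succeeds, the loop runs forever, matching the
    # case in which phi_{p(i,j)} stays undefined beyond x_s.
```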

If there exist infinitely many stages, then clearly ϕ_{p(i,j)} is total. Also, due to the success of step 3 and the execution of step 5 in all stages, ϕ_{p(i,j)} ∈ C1 and M does not Ex-identify ϕ_{p(i,j)} (as M makes infinitely many mind changes on ϕ_{p(i,j)}).

If, in some stage s, step 2 succeeds, and thus step 4 is executed, then clearly ϕ_{p(i,j)} ∈ C2. Also, since R is supposed to be optimistic, R(τ) = 0 and content(τ) is contained in ϕ_{p(i,j)}, we have that M (along with R) does not Cons-oRefl-identify ϕ_{p(i,j)}.

If, in some stage s, neither step 2 nor step 3 succeeds, then ϕ_{p(i,j)} is not total. Furthermore, R is not a reflection function for M: otherwise, since step 2 never finds an extension rejected by R, M would need to identify some extension of ϕ_{p(i,j)}[x_s] ⋄ (x_s, w) for every w > max{Φ_{p(i,j)}(x) : x < x_s}, and thus M would need to be consistent on ϕ_{p(i,j)}[x_s] ⋄ (x_s, w) for every such w. This would imply M(ϕ_{p(i,j)}[x_s] ⋄ (x_s, w)) ≠ M(ϕ_{p(i,j)}[x_s]) for all but at most one w > max{Φ_{p(i,j)}(x) : x < x_s}, leading to the success of step 3.

Thus, we have that M (along with reflection function R) does not Cons-oRefl-identify C′. Also note that, as claimed earlier, ϕ_{p(i,j)} ∈ C whenever ϕ_{p(i,j)} is total.

The theorem follows. □

As Fin ⊆ RCons ⊆ Cons (this holds also for each of the reflection models) [16,6], we have the following corollary.

Corollary 7.

Fin-Refl − RCons-oRefl ≠ ∅.

RCons-Refl − Cons-oRefl ≠ ∅.

RCons-Refl − RCons-oRefl ≠ ∅.

The above solves the two open problems mentioned in [6].

Acknowledgements

We thank the referees for pointing out an inconsistency in the usage of the notion of reflection in an earlier version of the paper. We also thank the referees for several helpful comments and reformulations which improved the presentation of the paper.

References

[1] D. Angluin, C. Smith, Inductive inference: Theory and methods, Computing Surveys 15 (1983) 237–289.

[2] J. Barzdinš, Inductive inference of automata, functions and programs, in: Proceedings of the 20th International Congress of Mathematicians, Vancouver, 1974, pp. 455–560 (in Russian). English translation in American Mathematical Society Translations: Series 2 109 (1977) 107–112.

[3] L. Blum, M. Blum, Toward a mathematical theory of inductive inference, Information and Control 28 (1975) 125–155.

[4] M. Blum, A machine-independent theory of the complexity of recursive functions, Journal of the ACM 14 (1967) 322–336.

[5] E.M. Gold, Language identification in the limit, Information and Control 10 (1967) 447–474.

[6] G. Grieser, Reflective inductive inference of recursive functions, Theoretical Computer Science A 397 (1–3) (2008) 57–69 (Special Issue on Forty Years of Inductive Inference. Dedicated to the 60th Birthday of Rolf Wiehagen).

[7] K.P. Jantke, Reflecting and self-confident inductive inference machines, in: K. Jantke, T. Shinohara, T. Zeugmann (Eds.), Algorithmic Learning Theory: 6th International Workshop (ALT '95), in: Lecture Notes in Artificial Intelligence, vol. 997, Springer-Verlag, 1995, pp. 282–297.

[8] K.P. Jantke, H.-R. Beick, Combining postulates of naturalness in inductive inference, Journal of Information Processing and Cybernetics (EIK) 17 (1981) 465–484.

[9] S. Jain, D. Osherson, J. Royer, A. Sharma, Systems that Learn: An Introduction to Learning Theory, second ed., MIT Press, Cambridge, MA, 1999.

[10] Y. Mukouchi, S. Arikawa, Inductive inference machines that can refute hypothesis spaces, in: K.P. Jantke, S. Kobayashi, E. Tomita, T. Yokomori (Eds.), Algorithmic Learning Theory: 4th International Workshop (ALT '93), in: Lecture Notes in Artificial Intelligence, vol. 744, Springer-Verlag, 1993, pp. 123–136.

[11] E. Minicozzi, Some natural properties of strong identification in inductive inference, Theoretical Computer Science 2 (1976) 345–360.

[12] H. Rogers, Theory of Recursive Functions and Effective Computability, McGraw-Hill, 1967. Reprinted by MIT Press in 1987.

[13] R. Soare, Recursively Enumerable Sets and Degrees, Springer-Verlag, 1987.

[14] R. Wiehagen, Zur Theorie der Algorithmischen Erkennung, Dissertation B, Humboldt University of Berlin, 1978.

[15] R. Wiehagen, W. Liepe, Charakteristische Eigenschaften von erkennbaren Klassen rekursiver Funktionen, Journal of Information Processing and Cybernetics (EIK) 12 (1976) 421–438.

[16] R. Wiehagen, T. Zeugmann, Learning and consistency, in: K.P. Jantke, S. Lange (Eds.), Algorithmic Learning for Knowledge-Based Systems, (GOSLER) Final Report, in: Lecture Notes in Artificial Intelligence, vol. 961, Springer-Verlag, 1995, pp. 1–24.

[17] T. Zeugmann, S. Zilles, Learning recursive functions: A survey, Theoretical Computer Science A 397 (1–3) (2008) 4–56 (Special Issue on Forty Years of Inductive Inference. Dedicated to the 60th Birthday of Rolf Wiehagen).