The Computer in Determining Stemmatic Relationships

Author(s): Eric Poole
Source: Computers and the Humanities, Vol. 8, No. 4 (Jul., 1974), pp. 207-216
Published by: Springer
Stable URL: http://www.jstor.org/stable/30199683



Computers and the Humanities, Vol. 8, pp. 207-216. PERGAMON PRESS, 1974. Printed in the U.S.A.


ERIC POOLE

As a lawyer, I became interested in stemmatic relationships when I observed that legal documents, when their wording was similar, could be shown to have a relationship which could be represented by a tree, or stemma, much like manuscript texts. The reason was obvious: a draftsman preparing a formal document will practically never make it up entirely out of his own head. Instead, he will find a text which can be used as a precedent and will copy this with such modifications as he considers necessary to adapt it to the business in hand. Such a precedent may consist of an existing document, or it may consist of a standard form expressly prepared for use as a precedent. Both techniques have probably been in use almost as long as there have been formal documents of any kind, but the preparation and circulation of standard forms, especially in collections of precedents for various purposes, was greatly facilitated by the invention of printing. Where, for a given set of documents, the technique was that of copying an existing document, the stemma will have a characteristic branching form; but if a standard form was used, the stemma will be compact, with all the texts deriving from a single original. In practice, of course, intermediate forms of stemma are probable, because although a standard form may have been in use, some of the documents may not have been derived directly from it, being copied instead from other documents which were themselves derived from the standard form.

In legal research, determining the wording of a precedent may be important in several ways. For example, the historical significance of the words in a document may be very much greater if they were deliberately included, and were not simply taken from the precedent; or the form of the stemma may give us information about the system by which the documents were produced.

There seem to be two paramount difficulties to be overcome in the determination of a stemma. The first is inherent in the very nature of a stemma: any point in it (whether or not at one of the nodes) is capable of being treated as the origin of all other points, without logical inconsistency with the data. The direction of the stemma is not therefore something which can be ascertained by a computer; it is a problem for the judgment of an editor, based on such matters as the dates of documents, or his opinion whether variants could, or could not, have come from the archetypal text.

The second difficulty is that of contamination and coincident variation. This, though not inherent in the nature of a stemma, seems in practice to be unavoidable if the data are of any serious complexity. In some cases there will have been deliberate contamination, as where a draftsman has taken some of his wording, not from his main precedent, but from some other document derived from it. In extreme cases, he may even have prepared an eclectic form from a number of documents, to use as a standard form in future. Even if there has not been deliberate copying of this kind, however, there is likely to be a similar effect from coincident variation. Similar causes produce similar effects, whether by way of intentional modifications of the wording or by common errors in transcription.

The statistical probability of coincident variation in manuscript texts also has often been underestimated. Even if there was a purely random distribution of variations in a text, there would be a high probability of coincidences, much as there is a high probability that in a class of thirty students at least two will have the same birthday. In fact, of course, the distribution will not be random, because there will be factors encouraging the same variation to occur at the same place in the text.

Eric Poole, a senior lecturer at the University of Kent, Canterbury, teaches English property law and in 1973 published a book about it.
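The birthday comparison can be checked with a short calculation. The sketch below is an illustration only, assuming 365 equally likely and independent birthdays; the function name `shared_birthday_probability` is hypothetical and forms no part of the original study.

```python
def shared_birthday_probability(n, days=365):
    """Probability that at least two of n people share a birthday.

    Assumes birthdays are independent and uniform over `days`
    (an assumption made for this illustration).
    """
    p_all_distinct = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


print(round(shared_birthday_probability(30), 2))  # prints 0.71
```

Under these assumptions the chance of a coincidence in a class of thirty is roughly seven in ten, which is the sense in which a purely random distribution of variants would already make coincidences likely.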

Essentially the same principles, and the same problems, apply both to anything which is disseminated by a copying process in which there is any element of error or discrepancy and to the determination of the relationships of legal documents or of manuscript copies of literary texts. For instance, these principles may be applied in biology, in that any evolution of organisms is in some sense a copying process. Since, however, most of the thinking on the subject has been done by textual critics of classical and medieval manuscript texts, I shall discuss the problem primarily in terms of textual criticism.

It has been argued by some modern textual critics that coincident variation and contamination are so extensive and inevitable that the formation of a stemma is impossible, and any attempt to do so is a waste of time. For some material, this may in practice be true. It is also possible that stemmatic investigation is not particularly important in textual criticism, even though this has been its traditional field. What the textual critic presumably wants to do, in the end, is to establish a text which is, as nearly as possible, that of the original. Working out a stemma is a step toward this, but where the tradition is confused, the advantages of having a stemma may not have justified the labor of working it out, at least by traditional methods.

We must distinguish between the existence of a stemmatic relationship, and the practical difficulty of discovering it. Any set of things which have been produced by copying must necessarily have a stemmatic relationship. It may be an exceedingly complicated one; scribes may even have copied part of their text from one source, and part from another, so that the stemmatic relationship would be different for the two parts. But even this would not mean that a stemma did not exist, or that it was in principle undiscoverable. The mere fact that a job is difficult is no ground for saying that it is impossible, or not worth doing.

As I see it, any such attempt must be based, in some way, upon distinguishing the "signal" of the true stemma, from the "noise" of the contaminations and coincident variations, in reliance upon the principle that truth is single and consistent while error is multiple. In other words, we need a method by which the data which are inconsistent cancel each other out, leaving the true relationship to stand revealed. It is here that we have most to hope from the computer, because the distinguishing of the "signal" from the "noise" will be possible only by the accurate comparison of large quantities of individually trivial data.

Probably the fullest account of a rigorous procedure for ascertaining a stemma, capable of being performed by a computer, is that given by Dom J. Froger.1 His method is based on set theory. For each reading, he defines the set of manuscripts which share the variant in question, and then he compares each set in turn with every other, to find whether one set always includes the other. Having done this, he sorts the sets into a hierarchy, according to their level of inclusiveness, and from this he constructs his stemma. This method, as he describes it, depends upon having a good supply of readings for which there are only two variants. In practice, it is hard to find enough of these where any considerable number of texts is in question. It is possible, however, that his method could be adapted to cope with complex sets with three or more variants.

Obviously, a method such as this will produce a stemma satisfactorily only if there are no contaminations or coincident variations. Following Froger's example, I shall refer to these henceforth as "anomalies" for the sake of brevity. His solution to the problem is that, when the sets have been sorted into a hierarchy, he constructs a graph in which the sets are plotted according to their level of inclusiveness, and each is connected by a line to the group which includes it. Where there is an anomaly, a set will be included in more than one higher set, and so the lines will form closed figures incompatible with a stemma. To decide which of the higher sets is to be treated as genuinely including the lower set, he either selects the relationship which occurs most frequently, or else makes a subjective judgment based on the wording of the text. The method described by P. Maas2 appears to be based on similar assumptions.
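The inclusion test on which this approach rests can be illustrated in a few lines. The following is only a loose sketch of the set-inclusion idea as summarized here, not Froger's actual procedure; the function names and the manuscript sigla in the usage note are hypothetical. A set that is directly included in more than one higher set is reported as a possible anomaly.

```python
def immediate_supersets(groups):
    """Map each set of manuscripts to the smallest sets that strictly include it."""
    result = {}
    for g in groups:
        supers = [h for h in groups if g < h]  # strict supersets of g
        # Keep only the minimal supersets: those containing no other superset of g.
        result[g] = [h for h in supers if not any(s < h for s in supers)]
    return result


def anomalous_sets(groups):
    """Sets directly included in more than one higher set (a closed figure)."""
    parents = immediate_supersets(groups)
    return [g for g in groups if len(parents[g]) > 1]
```

For instance, with the hypothetical sets {A, B}, {A, B, C}, {A, B, D}, and {A, B, C, D}, the set {A, B} has two immediate supersets, so the connecting lines would form a closed figure and the set would be flagged.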

A number of other procedures have been proposed for overcoming this difficulty and establishing objectively what are the relationships between texts, or other assemblies of data, despite the presence of anomalies. John G. Griffith, of Oxford, approaches the problem as one of classification in the first instance, and sets out to group arrays of MSS into clusters by a statistical process of tabulating their shared variants. He thinks it may be possible to deduce a tree-structure in favorable cases, but that where serious contamination has effaced the stemmatic relationships it may be necessary for editors to do the best they can with the cluster analysis.3 Another method, based on comparing each pair of sets of data and calculating their degree of similarity, is that of M. J. Sackin, of Leicester, working on biological material.4 Peter Buneman, of Edinburgh, discusses a method which seems to be the converse of this: the construction of a stemma from the "dissimilarity coefficient" of each pair of texts.5 Yet another approach has been suggested by Lambertus Okken, of Utrecht. This consists of enumerating the sets of texts with shared variants, and the compiling of a table which shows in graphic form the frequency and distribution of these texts.6

Any mechanical procedure for stemmatic analysis is subject to the limitations which I mentioned earlier; but when all due allowance has been made for these, there remain large possibilities for the use of a computer in this work. To be of significant usefulness, however, any such procedure needs to satisfy at least the following requirements:

(a) It should be able to handle an input which represents all the significant variations of all the texts under examination.

(b) Its output should include a stemma showing a feasible relationship between those texts, and the readings which it ascribes to any lost or hypothetical texts required by the stemma.

(c) It should be able to perform this as a single operation, without human intervention at any stage.

(d) It should embody a rational and consistent method for identifying, and allowing for, anomalies; that is, for present purposes, readings inconsistent with the identification of a stemma.

(e) It should be able to handle readings with any number of variants.

The procedure which I have used, in an attempt to meet these requirements, has taken the form of an Algol 60 program, run on the ICL 4130 computer of the University of Kent at Canterbury. As Algol is a language best suited for handling numerical material, the input is in the form of a numerical code. Each reading (that is, each place at which there is a variation between the versions in any two or more of the texts) is given a code number, which is represented for the purposes of the program by a line in the array in which the data are stored. The columns in this array represent the texts, and a code number is allocated to the variant for each text. One text is treated as the base text, and the number 1 is allocated to the variant found in this, even if it takes the form of the omission of something found in another text.

The method can perhaps be best explained by an example. Suppose that we have six documents, which contain respectively the following words:

1. Stout Cortez, when with eagle eyes
2. Stout Cortez, when with evil eyes
3. Proud Cortez, when with eagle eyes
4. Bold Cortez, when with evil eyes
5. Proud Cortez, when with lethal eyes
6. Bold Cortez, when with lethal eyes

Let us also take No. 1 as the base text. Then, for the first reading, "Stout" would be given the variant number 1, "Proud" 2, and "Bold" 3. Similarly, for the second reading, "eagle" would be given 1, "evil" 2, and "lethal" 3. We should thus have the following tabulation in the array:

Document: 1 2 3 4 5 6

Reading 1: 1 1 2 3 2 3

Reading 2: 1 2 1 2 3 3
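The coding of the example can be reproduced mechanically. The sketch below is a minimal illustration in Python, not the original Algol 60 program; the function name `encode_reading` is hypothetical, and numbering the remaining variants in order of first appearance is an assumption that happens to agree with the tabulation above.

```python
def encode_reading(variants, base=0):
    """Turn the wording found in each document into variant code numbers.

    The variant of the base text always receives the number 1; other
    variants are numbered in order of first appearance (an assumption).
    """
    codes = {variants[base]: 1}
    for wording in variants:
        codes.setdefault(wording, len(codes) + 1)
    return [codes[wording] for wording in variants]


reading_1 = encode_reading(["Stout", "Stout", "Proud", "Bold", "Proud", "Bold"])
reading_2 = encode_reading(["eagle", "evil", "eagle", "evil", "lethal", "lethal"])
print(reading_1)  # prints [1, 1, 2, 3, 2, 3]
print(reading_2)  # prints [1, 2, 1, 2, 3, 3]
```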

These six pairs of variants can be rearranged in an endless linked sequence, namely:

    Reading 1:   1   3 - 3   2 - 2   1 - 1
                 |   |   |   |   |   |   |   etc.
    Reading 2:   2 - 2   3 - 3   1 - 1   2   etc.

Wherever this happens, it indicates that there is no possible stemma into which both readings can be fitted; for a stemma is by its nature an open system, and cannot give rise to a closed figure of this sort.7 We may be certain, then, that one at least of the readings contains an anomalous variant, though we cannot from the available data decide which it is.

To detect such cases, the program derives from the data a matrix, in which the coordinates for each pair of variants are indicated by the number 1, while all other spaces in the matrix remain at zero. For the data which I have just given, the matrix would be:

                             Variants of Reading 2:
                             (1)   (2)   (3)
Variants of Reading 1:  (1)   1     1     0
                        (2)   1     0     1
                        (3)   0     1     1




In this matrix, in this instance, there is a configuration such that, if a line were drawn connecting the coordinates of the pairs of variants, and always turning at right angles each time it reached a space occupied by a 1, it would come back to its beginning and form a closed figure. In this way, the program detects the presence of an anomaly.
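The closed-figure test can be expressed as a search for a cycle among the occupied cells of the matrix. The following sketch is one way of doing this and is not a transcription of the original program: it substitutes a union-find structure for the line-tracing described above, and the function name `has_closed_figure` is hypothetical. Each occupied cell links one variant of the first reading to one variant of the second; a cell whose two variants are already linked by other cells closes a figure.

```python
def has_closed_figure(reading_a, reading_b):
    """Return True if the variant pairs of two readings form a closed figure."""
    # Each distinct pair is one occupied cell of the matrix.
    cells = set(zip(reading_a, reading_b))
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in cells:
        root_a = find(("row", a))
        root_b = find(("col", b))
        if root_a == root_b:
            return True  # this cell joins two variants that are already linked
        parent[root_a] = root_b
    return False


print(has_closed_figure([1, 1, 2, 3, 2, 3], [1, 2, 1, 2, 3, 3]))  # prints True
```

Applied to the two readings of the six-document example, the test reports a closed figure, in agreement with the matrix shown above.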

As I have said, we cannot decide, from a single pair of readings, which contains the anomaly. If we have a large enough number of readings for comparison, however, we may be able to identify the anomalies mechanically. If a reading which does not itself contain an anomaly is compared with every other reading in turn, the number of times that an anomaly is disclosed cannot exceed the number of anomalous readings. If, however, the reading under examination contains an anomaly itself, this limitation will not apply, because any comparison may disclose the anomaly. The program therefore compares each reading with every other reading in this way, keeping a count of the number of times that the comparison indicates the presence of an anomaly, and then sorts the results in descending order. We may take it that a reading must be anomalous if its number of indications of anomaly exceeds its serial number in the list. Suppose, for instance, that the upper part of the table was as follows:

Serial Number   Reference Number   Anomaly Count
      1                68                9
      2                94                8
      3                 3                7
      4                54                5
      5                95                4

Nine or more anomalous readings might be theoretically possible in the text as a whole, so that even Reading 68 could be free from anomalies and yet produce nine indications of the presence of an anomaly. Such a possibility seems, however, to be exceedingly remote, for it would require the anomalies to be so distributed that they never disclosed their presence when the anomalous readings were compared with one another.

Consequently, Readings 68, 94, 3, and 54 cannot be accepted as free from anomalies. For instance, if Reading 54 is free, then there must be at least five anomalous readings; but in this case, Reading 54 must be among them, for it is only the fourth in serial order. The assumption that Reading 95 is free is, however, not open to an inconsistency of this sort, for it requires the existence of only four anomalous readings, and these could very well be Readings 68, 94, 3, and 54. The program therefore rejects these four as possibly anomalous and not to be relied upon in forming the stemma, and the program accordingly has an array in which each such reading is noted.

We are not, however, justified in assuming that there is no other anomalous reading which has escaped this procedure, and it is therefore repeated with the remaining readings until all indications of anomaly have been accounted for. Before this stage, a point may be reached at which two or more readings each produce one indication of anomaly. In that case, the program takes the only safe course, and rejects them all. Similarly, if at any stage it finds a reading which must be treated as anomalous because its anomaly count is greater than its serial number, it rejects not only that reading, but all others having the same anomaly count, because their serial number will depend upon the order in which the computer came to them among the data, and not upon any intrinsic difference in the data themselves.
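A single pass of the ranking rule can be sketched as follows. This is a minimal illustration, not the original program: the function name `reject_by_rank` and the dictionary input are assumptions, the repetition of the procedure on the remaining readings is left to the caller, and the special case in which the last remaining readings each show one indication of anomaly is not handled.

```python
def reject_by_rank(anomaly_counts):
    """One pass of the ranking rule over {reference number: anomaly count}.

    A reading is rejected if its count exceeds its serial number in the
    descending list; readings tied with a rejected reading are rejected too.
    """
    ranked = sorted(anomaly_counts.items(), key=lambda item: -item[1])
    rejected_counts = {
        count
        for serial, (_, count) in enumerate(ranked, start=1)
        if count > serial
    }
    return [ref for ref, count in ranked if count in rejected_counts]


table = {68: 9, 94: 8, 3: 7, 54: 5, 95: 4}
print(reject_by_rank(table))  # prints [68, 94, 3, 54]
```

With the figures of the table above, Readings 68, 94, 3, and 54 are rejected and Reading 95 is retained, as in the text.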

In this procedure, there is no need to examine any reading for which only one variant has more than one representative. Such a reading would be consistent with any stemma whatever, and is therefore useless as evidence of anomaly, however important it may be for other purposes. I shall refer to such readings as "idiosyncratic readings."8 To avoid the waste of running time, the program notes them before carrying out the anomaly test, and they are then left out of account.
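The test for an idiosyncratic reading is simple to state in code. In the sketch below the function name `is_idiosyncratic` is hypothetical, and a reading in which no variant at all is shared is also treated as idiosyncratic, which is an assumption going slightly beyond the definition just given.

```python
from collections import Counter


def is_idiosyncratic(reading):
    """True if at most one variant is found in more than one document."""
    shared = [variant for variant, n in Counter(reading).items() if n > 1]
    return len(shared) <= 1


print(is_idiosyncratic([1, 1, 1, 2, 3, 1]))  # prints True
print(is_idiosyncratic([1, 1, 2, 3, 2, 3]))  # prints False
```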

The program has now identified and eliminated all the readings which are inconsistent with, or irrelevant to, the formation of a stemma. This makes possible a procedure by which the stemma can be constructed even if there are multiple variations for some of the readings which still have to be taken into account. The variants for any reading, for any pair of texts which are derived from a common source without any intermediate source which is also among the documents under examination, will either be identical, or else not both be found among the other documents. The next stage in the program therefore takes each possible pair of documents and counts the number of instances in which their variants differ and are both found elsewhere. If the result is zero, it is evidence (though not conclusive evidence) that the documents share an immediate common source.
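The pairwise count can be sketched as below. This is an illustration under stated assumptions rather than the original program: the function name `conflict_count` is hypothetical, documents are identified by column index, "found elsewhere" is taken to mean found in some document other than the two being compared, and the small array in the usage example is invented for the purpose.

```python
def conflict_count(readings, i, j):
    """Count readings where documents i and j differ and both variants
    also occur in some other document."""
    count = 0
    for row in readings:
        a, b = row[i], row[j]
        if a == b:
            continue
        others = [v for k, v in enumerate(row) if k not in (i, j)]
        if a in others and b in others:
            count += 1
    return count


# Hypothetical array: two readings (rows) across four documents (columns).
readings = [
    [1, 1, 2, 2],
    [1, 2, 3, 3],
]
print(conflict_count(readings, 0, 1))  # prints 0
print(conflict_count(readings, 0, 2))  # prints 1
```

A count of zero for a pair, as for the first two documents here, is the evidence of an immediate common source referred to above.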

The evidence cannot be conclusive, because it is possible to have a situation in which, say, the count as between Document 1 and Document 2 is zero, and so is the count between Documents 2 and 3, but not the count between Documents 1 and 3. In such a case, the program treats the relationships between all three documents as unproved for the time being. Our only criterion for classifying any two things as identical is, in the last resort, the lack of observable differences, which may be simply due to too few data. The more data that we have, the better our classifications will be, but we are never entitled to suppose them absolutely valid.

It is also to be noted that the number of dissimilarities would be zero if one of the documents was a copy of the other. At the present stage no account needs to be taken of this possibility, because there needs to be no distinction between a document itself, and another document which is a perfect copy of it; and it is very probable that a document may appear to be a perfect copy of its source, because of a mere lack of data to show that it is not.

It seems to have been overlooked, or too lightly regarded, by some writers on textual criticism, that the concepts of identity or similarity are essentially negative. So, for instance, Maas begins his procedure with the elimination of derivative texts (eliminatio codicum descriptorum),9 identifying these by the criterion: "If the witness, J, exhibits all the errors of another surviving witness, F, and in addition at least one error of its own . . . , then J must be assumed to derive from F." In this context, in any objective sense, "error" can only be taken to mean "distinctive variant"; and to decide what are the distinctive variants of one text requires a comparison of all texts. To reject a text as "derivative" before this comparison has been carried through amounts, therefore, to the assumption of one of the things which have to be proved. There may, it is true, be practical justifications for doing this, on grounds other than comparison of all the texts: for instance, the express evidence of a scribal colophon identifying the text which was copied, or a group of texts like those of Epiphanius which were found by K. Holl to have blank spaces exactly corresponding to missing leaves in another text which was obviously their exemplar.10 For legal texts of the kind in which I am interested, it is of course of the first importance that direct relationships should be established rigorously by a study of the whole of the evidence.

A further advantage of postponing the decision whether one text is copied from another is that almost all the possible stemmatic relationships which are categorized by writers such as Maas may be left out of account for the present purpose, because they assume knowledge that one text is derived from another. The only relationship with which we need to concern ourselves here is that of a group of two or more documents which are directly copied from a common source which is to be treated as no longer extant.

Having postulated this lost common source, the program allocates a reference number to it, and tries to determine the variant which it would have contained for each of the readings. Having done this, it repeats the whole procedure up to this point, except that this time it takes into account, not the whole body of extant documents, but only the postulated lost source documents together with those extant documents for which a source has not yet been postulated. In this way, it proceeds with the assembly of the materials for constructing the whole stemma.

It must of course give a code number to the variant for every postulated source, for every reading. In most cases this presents no difficulty, because either the variants for the derivative documents are identical, or one of them (and not more than one) is confirmed by being found elsewhere among the relevant data. If there are more than two documents in the group, the program also accepts a variant which is found in two or more of them, provided that not more than one of any other variant is found among them. None of these methods will help, however, if the variants are all idiosyncratic, or if there is an anomaly so that more than one variant is found elsewhere. In such a case, the program puts in the reference number of the postulated source. As this will be unique to that document, it has the same effect at all subsequent stages as an idiosyncratic variant.
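The rules for reconstructing the variant of a postulated source can be sketched as below. This is an interpretation, not the original program: the function name `source_variant`, the use of column indices for documents, and above all the order in which the rules are tried are assumptions, and the reference number 101 in the usage example is invented.

```python
from collections import Counter


def source_variant(row, group, source_ref):
    """Variant ascribed to the postulated common source of `group` for one reading.

    `row` holds the variant codes of all documents under consideration,
    `group` the indices of the documents derived from the source, and
    `source_ref` the source's own (unique) reference number.
    """
    in_group = [row[k] for k in group]
    if len(set(in_group)) == 1:
        return in_group[0]  # the derivative documents agree
    outside = {v for k, v in enumerate(row) if k not in group}
    confirmed = set(in_group) & outside
    if len(confirmed) == 1:
        return confirmed.pop()  # exactly one variant is confirmed elsewhere
    if not confirmed and len(group) > 2:
        repeated = [v for v, n in Counter(in_group).items() if n >= 2]
        if len(repeated) == 1:
            return repeated[0]  # one variant shared within the group
    return source_ref  # undecidable: behaves as an idiosyncratic variant


print(source_variant([1, 2, 1, 3], (0, 1), 101))  # prints 1
print(source_variant([1, 2, 1, 2], (0, 1), 101))  # prints 101
```

In the second call both variants of the group are found elsewhere, which is the anomalous case, so the source's own reference number is entered.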

One effect of this procedure is to eliminate from the set of documents under active consideration a certain number of anomalous variants. It is desirable that advantage be taken of this reduction to supplement the data available for subsequent stages of the investigation; for another effect of the reconstruction will be to exhaust the usefulness of some of the readings which were free from anomalies, so that there needs to be an economical use of data.




The program therefore repeats the anomaly check at each stage for the group of original or postulated documents then under consideration. When the number of these documents falls below four, any further analysis on the lines which I have been describing becomes impracticable. The procedure is therefore wound up by postulating a single direct source for the surviving documents, giving it a reference number, and reconstructing its variants as for any other postulated source.

The materials are now available for constructing a stemma. This could, if necessary, be done by programming the computer actually to draw the stemma, but I decided that it would be better to have it represented algorithmically, in such a way that the columns of variants may then be arranged in tabular form under the reference numbers of the documents in their stemmatic sequence. (An algorithm of this type is given by Korfhage.11) Lastly, a calculation is made of the number of readings for which each document certainly differs from the variant in its postulated source. This provides material for the decision whether one document is itself the source of another, and perhaps for the application of statistical methods to improve the quality of the results.

I cannot claim that a stemma produced by the procedure described will exactly represent the truth, for I do not believe that any piece of research can ever do more than produce a result which, if not certainly true, is at least in accordance with the data and is capable of further testing. It does seem, however, that the procedure will give a stemma which will provide a basis for investigation, in the light of information such as chronological or paleographic material, with much less effort and with higher reliability than could be attained without a computer.

To test the program experimentally, it was first intended to run it with artificially constructed sets of data, incorporating deliberate contaminations. It soon became apparent, however, that these could hardly match the complexity of a real manuscript tradition, or put the program to a sufficiently rigorous test. I have therefore tested it against a passage from Professor George Kane's edition of the A-text of Piers Plowman12 consisting of lines 105 to 158 of Passus V. This was chosen because the text has been edited with exemplary thoroughness and clarity, with a textual apparatus giving all the data which were required; the passage in question is one of the few to be found in all seventeen of the manuscripts used in the edition; and the textual problems are of great complexity. Indeed, the editor is himself one of those scholars who doubt the utility of trying to establish a stemmatic relationship between texts, saying that

the genetic history of the A manuscripts is not recoverable in any form useful for the editorial process of recension; [and] the combination of the two types of convergent variation, correction or conflation on the one side, and coincident variation on the other, so obscures the relation of these manuscripts as to make general, clear and unqualified deductions about their genetic relation impossible.13

The texts are indeed so corrupt that in the passage of 54 lines which was examined there are 400 readings for which variants are recorded in the apparatus. Trying to construct a stemma by using readings with only two variants, but without drawing on an enormous quantity of available data, is hopeless, for there are only 90 such readings among the 400; of these, all but 20 are idiosyncratic; and of the 20, only 5 gave sets (no two of which were identical) which were consistent with each other so as to provide material for a stemma.

Equal weight was given to every kind of variation, however trifling. There seem to be no valid criteria for deciding which kinds truly represent the main tradition. If anything, the minor variations are more likely to do so, in that they have the better chance of escaping the attention of persons amending or modifying the text. It is true that many variants obviously coincided because of independent scribal preferences, or common copying errors, but it seemed wiser to refrain from prejudging this. If two or more texts had an omission at the same reading, it was treated as a shared variant as against the other texts; but if, between these, there were differences, they were treated as a separate reading, numbers being allocated to the texts with the omission as if they contained idiosyncratic variants. In this way, the omission was allowed to count only once as a significant feature.

In allocating code numbers to the variants, the editor's classification was followed exactly, since any other procedure might have introduced a subjective element into my treatment of the data. If two variants were shown by the editor as differing in any way, even in a trivial point of spelling, they were given different numbers, and were therefore treated by the computer in just the same way as if they had not resembled each other at all.




The seventeen manuscripts whose variants were given in the textual apparatus are as follows:

A    Bodleian, Ashmole 1468.
Ch   Liverpool University Library F.4.8. (Chaderton MS)
D    Bodleian, Douce 323.
E    Trinity College, Dublin D.4.12.
H    British Museum, Harley 875.
H2   British Museum, Harley 6041.
H3   British Museum, Harley 3954.
J    Pierpont Morgan Library, New York M 818. (Ingilby MS)
K    Bodleian, Digby 145.
L    Lincoln's Inn, MS 150.
M    Society of Antiquaries, London, MS 687.
N    National Library of Wales, MS 733B.
R    Bodleian, Rawlinson Poetry 137.
T    Trinity College, Cambridge MS R.3.14.
U    University College, Oxford MS 45.
V    Bodleian, English Poetry a.1. (Vernon MS)
W    Duke of Westminster's MS, Eaton Hall.

In any experiment based on genuine material, there is of course no possibility of comparison of the results with any archetypal text, for none is extant. The only practicable verification must therefore be by comparing the results from different bodies of data, to find whether they are consistent. For this purpose, the readings were divided into four groups of 100, representing respectively:

A   Lines 105 to 117
B   Lines 117 to 129
C   Lines 129 to 143
D   Lines 143 to 158

The results of running the groups separately showed a considerable degree of consistency, but it became plain that, because of the high proportion of idiosyncratic and anomalous readings, the samples from which a stemma could be constructed were too small. The program was therefore next run with the groups in pairs (AB, AC, AD, BC, BD, and CD) and then in threes (ABC, ABD, ACD, and BCD). This increased the size of the samples and the possibility of a spurious consistency was offset, firstly, by the fact that each of the pairs would be set against another pair with entirely different data, and secondly by the nature of the program itself, because the placing together of two or more groups would, if they were not as a whole consistent with each other, result in the rejection of their data by the anomaly test.
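The schedule of combined runs is simply the set of two-group and three-group combinations of A, B, C, and D. A short sketch, in which the variable names are hypothetical, enumerates them:

```python
from itertools import combinations

groups = ["A", "B", "C", "D"]

# Every way of combining the four groups of readings two or three at a time.
pairs = ["".join(combo) for combo in combinations(groups, 2)]
triples = ["".join(combo) for combo in combinations(groups, 3)]

print(pairs)    # prints ['AB', 'AC', 'AD', 'BC', 'BD', 'CD']
print(triples)  # prints ['ABC', 'ABD', 'ACD', 'BCD']
```

Each pair has exactly one complementary pair with entirely different data (AB against CD, AC against BD, AD against BC), which is what allows the cross-check described above.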

Five groupings of manuscripts were found to recur with great regularity:

(a) Manuscripts T, Ch, H2. In runs AB, ABC, BCD (and when C was run on its own) these were grouped directly as

[stemma diagram: T, Ch, H2]

and in runs AC, BC, BD, ACD as

[stemma diagram: Ch, T, H2]

For AD and ABD, the grouping was the same as for AC, BC, and ACD, except that, for AD, manuscript D was associated with manuscript Ch, while, for ABD, manuscript J was associated with it as well. In run CD, the grouping was indeterminate.

(b) Manuscripts R, U. These were directly associated in every run.

(c) Manuscripts V, H. These were directly associated in every run, except that, in run AD, manuscript H was associated with manuscript M, and the whole group was associated with manuscript L; and in run ACD, manuscripts V and H were jointly associated with manuscript E.

(d) Manuscripts A, H3. These were directly associated in every run, except that, in run BC, manuscript J was associated with manuscript A, while in run BD it was associated with their common source.

(e) Manuscripts W, N. These were directly associated in every run, although in run CD their common source was associated with manuscript L.

All these groupings had been reported, to some extent, by the runs with single groups. Run A, in fact, had them all, though it gave manuscripts T, Ch, H2 as derived directly from the same source. Run B had only Groups (b) and (c). Run C had Groups (a), (b), and (c), and Group (a) arranged as for Runs AB, ABC, BCD. Run D reported only Group (e).

Since methods such as this are credible only if they produce sensible results, I give as a practical check the version for each group of the four lines 107 to 110, reconstructed from the apparatus in the printed text:¹⁴

Group (a)

Manuscript Ch
Thanne com coueitise; I can hym nougt discryue,
So hungirly & holowe sire heuy he loked.
He was betilbrowed & babirlippid bothe
With two blered eiyen,
As a letheren purs lolled his chekis.

Manuscript T
Thanne com coueitise; I can hym nougt descryue,
So hungirly & holewe sire heruy hym lokide.
He was bittirbrowid & babirlippid bothe
With two bleride eiyen,
As a litherene purs lollide his chekis.

Manuscript H2
Thanne com coueitise; I can hym nougt discryue,
So hungirly & holewe sire heruy hym lokide.
He was baburlippud & biturbrowed bothe
With two blerid eiyen,
As a letherne purs lollid his chekis.

Group (b)

Manuscript R
Thanne com coueitise; I can hym not discrye,
So angrey and so holwe sire heruy hym loked.
He was byterbrowed and eke baberlypped
With two blered eyen, as a lether purs
Lolled his chekis.

Manuscript U
Thanne cam coueitise; I can nougt descrye,
So hungry and so holwe sire heruy hym lokide.
He was babirlippid and eke biterbrowed
With two blerid eiyen, as a lethern purs.

Group (c)

Manuscript V
Thenne com coueitise; I couthe hym not discreue,
So hungry and so holewe sire heruy hym loked.
He was bitelbrouwed with twei blered eiyen,
And lik a letherne purs lullede his chekis.

Manuscript H
Then com coueitise; I can hym not discryue,
So hungry & holwe sire heruy hym loked.
He was bitelbrowid & babirlippid with two brode iyen,
And as a letherne purs lollid his chekis.

Group (d)

Manuscript A
Thanne cam sone coueitise; I can hym not discrye,
So hungry he lokid so heri & so ille.
He was bittilbrowid and eke babirlippid with to blereyd eyn as a blynd hagge,
And a lederend purs so lokyd his chekis.

Manuscript H3
Thanne com concyens; I can hym nougt dyscryye,
So hungry he lokede syre heruy & holwe.
He was bittilbrowid and eke babirlippid with two blere eyid eyne blynd as an hagge,
And as a letherene purs lokede his chekis.

Group (e)

Manuscript W
Thanne come sire coueitise; I can hym not discryue,
So hongrely and so holgh sire henri he loketh.
Bitilbrowed & baberlypped with two blered eyen,
And as a letheren purs honged his chekis.

Manuscript N
Thanne com coueitise; I can hym nougt descreue,
So hungryly and so holowe sire henri thanne loked.
Bitilbrowed & baberlippid with two blered eyn,
And as a letherne purs lolled his chekis.

We can observe from a comparison of these versions that omissions, interpolations, and variations in line division are highly characteristic; and that although spelling is not usually characteristic to a group, it cannot be entirely disregarded. The consonants seem to be a more stable basis for comparison than the vowels, as can be seen from a comparison of the weak preterite endings in these eleven manuscripts for the whole of the passage under examination. These made quite a good sample, as there were 17 of them. It was apparent that the choice of ending was almost entirely a matter of scribal preference, or of dialect, rather than of copying, though it was perhaps significant that, if a scribe departed from his usual practice, he often did so in a way which agreed with one of the other texts. Even minor variants such as these, therefore, may be of use in establishing the stemma, and are better not left out of account when the data are assembled at the beginning. On the whole, the groupings found by the computer vindicate the suggestions of Thomas A. Knott and David C. Fowler¹⁵ which were rejected by Professor Kane. It would, of course, hardly be safe to say, on the basis of such a small sample as 54 lines of text, that this judgment was incorrect.
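The observation that consonants are steadier than vowels can be illustrated with a few of the weak preterite forms quoted above. This is a rough sketch in Python and was not part of the procedure reported here; the function name `consonant_skeleton` and the decision to count "y" among the vowel spellings are assumptions made for the example.

```python
VOWELS = set("aeiouy")  # assumption: "y" is counted as a vowel spelling


def consonant_skeleton(word):
    """Return the word in lower case with its vowel letters removed."""
    return "".join(ch for ch in word.lower() if ch not in VOWELS)


# Forms of two verbs taken from the versions of lines 107 to 110 quoted above.
lolled_forms = ["lolled", "lollide", "lollid", "lullede"]
loked_forms = ["loked", "lokide", "lokid", "lokede", "lokyd"]

for forms in (lolled_forms, loked_forms):
    # The spellings differ from scribe to scribe, but each verb
    # reduces to a single consonant skeleton.
    print(sorted({consonant_skeleton(form) for form in forms}))
```

A genuinely different reading still stands out after this reduction: "honged" in manuscript W and "loketh" in the same manuscript do not reduce to the skeletons shared by the other forms.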

We are left with six manuscripts (D, J, L, E, K, M) unaccounted for, while the relationship between the sources of the five ascertained groups is also not clear. The computer output does in fact give some ground for conjecture that groups (a) and (b) are related to manuscript D in the form:

[Diagram: partial stemma linking groups (a) and (b) with manuscript D]

Beyond this, the results are insufficiently consistent to support even a plausible conjecture, and obviously a larger sample needs to be used in any serious attempt to elucidate the relationships of all the extant texts.

The trial with the Piers Plowman material shows, however, that some objective results can be obtained even with a comparatively small sample of grossly corrupt texts. I am convinced that neither the procedure which I have described, nor any other purely mechanical procedure, can ever completely reconstruct the stemmatic relationship of a group of manuscript texts, or documents, or organisms, if only because the position of the archetype must be determined subjectively. Within this restriction, however, it seems reasonable to hope that such a procedure can, if used intelligently, provide reliable materials for the reconstruction of the stemma. Even in cases where it falls short, it may provide partial stemmata which will throw light on such matters as scribal practice and linguistic questions.¹⁶

NOTES

1. Dom J. Froger, La critique des textes et son automatisation (1968).

2. P. Maas, Textual Criticism (1927); English translation by Barbara Flower (1958).

3. John G. Griffith, "A taxonomic study of the manuscript tradition of Juvenal," Museum Helveticum 25 (1968), 101-138; "Numerical taxonomy and some primary MSS of the Gospels," Journal of Theological Studies 20 (1969), 389-406.

4. M. J. Sackin, "Crossassociation: a method of comparing protein sequences," Biochemical Genetics 5 (1971), 287.

5. O. P. Buneman, "Filiation of Manuscripts," in Mathematics in the Archaeological and Historical Sciences (1971).

6. L. Okken, "Ein Beitrag zur Entwirrung einer kontaminierten Manuskripttradition" (1970) (a dissertation submitted to the University of Utrecht), discussed by Dr. H. H. R. Love, "The computer and literary editing: achievements and prospects," in The Computer in Literary and Linguistic Research, ed. R. A. Wisbey, 1971, p. 52. I am grateful to Dr. Love for sending me a copy of the relevant passage from this thesis.

Since writing this article, I have read "Algorithms, stemmata codicum and the theories of Dom H. Quentin," by G. P. Zarri of Milan, in the collection The Computer and Literary Studies ed. A. J. Aitken, R. W. Bailey and N. Hamilton-Smith (Edinburgh, 1973). The writer discusses a project for reducing to a strict algorithmic form the procedures suggested by Dom H. Quentin in his Essais de critique textuelle (1926), and thus rendering them capable of being applied by a computer. The essential feature seems to be the taking of the texts in groups, and finding whether they can be fitted into small stemmata which can then be coordinated to form the stemma for all the texts. The practical result seems to be similar to that of the method which I have employed for forming a stemma, except that there is not a preliminary procedure for eliminating anomalies. It seems, therefore, from the examples given in the article, that the method tends to produce structures with cross-links, forming closed patterns which are not true stemmata and cannot, without modification based on subjective criteria, possibly represent the true history of a text. The method also seems to have difficulty in coping with hypothetical lost texts.

However this may be, the article is outstanding for the good practical common sense which it brings to bear on the problems of stemmatic analysis as a whole, and in particular for its rejection of mathematical theory not based on empirical observation.

7. I am indebted to Dr. Buneman for explaining to me some of the properties of these figures, in particular for pointing out to me that any closed figure will indicate a state of affairs inconsistent with a tree structure, and not only a rectangular figure as I had previously supposed. He has also shown, in his paper "A characterisation of rigid circuit diagrams" (of which he kindly supplied me with a copy of the draft), that it is possible for sets of more than two readings to disclose an anomaly, though no two taken together would have disclosed it on their own. A search for instances of this would, however, require a formidable increase in computer time, with little prospect of materially improving the efficiency of the procedure in other respects, and I have therefore confined the procedure in my program to a search for closed configurations in pairs of readings.

8. In the English translation of Maas' Textual Criticism, these are called "peculiar readings" or "lectiones singulares" (par. 8).

9. Textual Criticism, par. 4.

10. G. Pasquali, Storia della tradizione e critica del testo (1952; 2nd ed. 1962), p. 33.

11. Robert R. Korfhage, Logic and algorithms (1966), Chapter 7.

12. George Kane, Piers Plowman: the "A" version (1960).

13. Ibid., pp. 112-113.

14. For typographical convenience, I have replaced the character "thorn" by "th," and the character "yogh" by "g" or "y" according to the letter which represents it in modern spelling.

15. Thomas A. Knott and David C. Fowler, Piers the Plowman: A critical text of the "A" version (1952). They directly associate the members of groups (a), (b), (c), and (e), while for the members of group (d) they associate manuscripts A and H3, but bring in manuscripts E and M as well; this last arrangement is criticized by Professor Kane on p. 84 of his edition. In group (a), Knott and Fowler give the same internal grouping as that which I obtained from runs AC, BC, BD, ACD. They also associate groups (a) and (b) with manuscript D in a similar way to that suggested by my results.

16. As well as my thanks already included in these notes, I want to thank Dr. Griffith, Dr. Sackin, and the Rev. A. Q. Morton for helpful conversations and the provision of copies of their work; David Shaw of the French Department of the University of Kent for many useful comments and criticisms; and Eveline Wilson of our computer unit for advice on the writing of the computer program.
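The search for closed configurations in pairs of readings, mentioned in note 7, might be sketched as follows. This is an illustration in Python under stated assumptions and not the program used for this article: each reading is assumed to be recorded as a mapping from manuscript siglum to variant label, the "figure" is assumed to be the graph whose points are the variants of the two readings, joined wherever some manuscript carries both, and the sigla P, Q, R, S and the name `has_closed_configuration` are hypothetical.

```python
def has_closed_configuration(reading_a, reading_b):
    """Report whether two readings, taken together, form a closed figure.

    Each reading maps a manuscript siglum to the variant it carries.
    A closed figure (a cycle) cannot be reconciled with a tree.
    """
    # One edge for every distinct combination of variants found together.
    edges = {
        (("a", reading_a[ms]), ("b", reading_b[ms]))
        for ms in reading_a
        if ms in reading_b
    }

    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for left, right in edges:
        root_left, root_right = find(left), find(right)
        if root_left == root_right:
            return True  # this edge closes a cycle
        parent[root_left] = root_right
    return False


# Hypothetical sigla and variants, for illustration only.
first = {"P": 1, "Q": 1, "R": 2, "S": 2}
nested = {"P": 1, "Q": 1, "R": 1, "S": 2}
crossing = {"P": 1, "Q": 2, "R": 1, "S": 2}

print(has_closed_configuration(first, nested))    # False
print(has_closed_configuration(first, crossing))  # True
```

Restricting the test to pairs keeps the work proportional to the number of pairs of readings, which is the economy in computer time that note 7 describes; anomalies that appear only when three or more readings are taken together would pass this test unnoticed.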


