The University of Queensland
Faculty of Business, Economics & Law
Department of Commerce
Information Request Ambiguity and End User Query
Performance: Theory and Empirical Evidence
A Thesis submitted to the Department of Commerce, the University of Queensland, in partial fulfilment of the requirements for the degree of
Master of Information Systems.
By Micheal Axelsen
15th June 2000
Supervisor: Dr Paul Bowen
Acknowledgments
I wish to express my appreciation and thanks to my supervisor, Dr Paul Bowen, for his
assistance, advice, and patience in the preparation of this thesis. To my mother I offer thanks
for making it all possible. I also express sincere gratitude to my wife, Leeanne Klan, whose
obstinate patience continues to assist in putting the world in focus.
I also thank workshop participants at Nanyang Technological University in Singapore for
their comments and contributions to this thesis.
Abstract
The increasing reliance of organisations on information technology and the persistent
shortage of IT/IS professionals require end users to satisfy many information requests by
querying complex information systems. Because many business decisions are now based on
the results of the end users' queries, information request ambiguity has extensive
ramifications for business practices. Where the queries do not match the requirements of the
information requests, the business decisions are likely to be fundamentally flawed.
This paper develops a theory of ambiguity in information requests and reports the results of
an initial empirical investigation of that theory. The theory identifies seven ambiguities:
lexical, syntactical, inflective, pragmatic, extraneous, emphatic, and suggestive. A laboratory
experiment with sixty-six participants was used to investigate the empirical effect of
ambiguity on end user query performance. End user query performance was measured by the
number of total errors in the proposed solution, the time taken to complete the solution, and
the end user's confidence in the solution.
The results indicate that ambiguity significantly degrades end user query performance. The
seven types of ambiguity were analysed to determine their individual effects on end user
query performance. Actual (pragmatic, extraneous) and imaginary (emphatic, suggestive)
ambiguities show significant relationships with total errors and duration. In general, potential
(lexical, syntactical, and inflective) ambiguities were not significantly associated with total
errors or end user confidence. The results should have important implications for consulting
firms, for organisations with ad hoc work groups, and for entities that make extensive use of
electronic mail for information requests.
Table of Contents
1. Introduction
2. Information Request Ambiguity and End User Query Performance
    2.1 A Theoretical Model of Information Request Ambiguity
    2.2 The Nature of Ambiguity
        2.2.1 Potential Ambiguity
            Lexical Ambiguity
            Syntactical Ambiguity
            Inflective Ambiguity
        2.2.2 Actual Ambiguity
            Pragmatic Ambiguity
            Extraneous Ambiguity
        2.2.3 Imaginary Ambiguity
            Emphatic Ambiguity
            Suggestive Ambiguity
        2.2.4 Ambiguity in Practice
    2.3 Task Complexity
    2.4 Theoretical Model Summary
3. Methodology
    3.1 Experimental Design
    3.2 Experiment Participants
    3.3 Assessment of Participant Responses
4. Results and Discussion
    4.1 Overview of Experimental Results
    4.2 Regression Analysis
    4.3 Ambiguity Treatment Multiple Linear Regression Model Results
    4.4 Multiple Linear Regression Model: Seven Types of Ambiguity
    4.5 Summary of Results
        4.5.1 Potential Ambiguity
        4.5.2 Actual Ambiguity
        4.5.3 Imaginary Ambiguity
        4.5.4 Complexity
5. Implications For Business Practice
    5.1.1 Electronic Mail
    5.1.2 Personnel Turnover and Work Teams
6. Contributions, Limitations, and Future Research
    6.1 Research Contributions
    6.2 Research Limitations
    6.3 Future Research
References
Appendix A: Experiment Information Requests and Model Answers
Appendix B: Experiment Instruction Sheet
Appendix C: Command Interpreter Unix Shell Script
Appendix D: Experiment Entity-Relationship Diagram
Appendix E: Experimental Design
Appendix F: Error Marking Sheets
Appendix G: Annotated Corrected Participant Response
Appendix H: Pearson Correlation Matrix of Variables
Appendix I: Analysis of Ambiguity's Effect On Error Type
Appendix J: Seven Ambiguity Types Question Assessment Ratings
Appendix K: Ambiguity Assessment Instrument
Appendix L: Internal Validity of the Experiment
Figures
Figure 1: Types of Ambiguity (adapted from Walton 1996)
Figure 2: The Theoretical Model of Ambiguity, Complexity, and End User Query Performance
Figure 3: The relationship between the treatment received (ambiguous or clear information request) and the total errors in the participant's response
Figure 4: The relationship between the treatment received (ambiguous or clear information request) and the duration taken for the participant to prepare the response
Figure 5: The relationship between the treatment received (ambiguous or clear information request) and the participant's confidence in the response

Tables
Table 1: Summary and Examples of the Seven Types of Ambiguity in Natural Language Information Requests
Table 2: Participant Demographic Information and Descriptive Statistics: Course Background of Group A and Group B
Table 3: Participant Demographic Information and Descriptive Statistics: Academic Record of Group A and Group B
Table 4: Participant Demographic Information and Descriptive Statistics: Participant Age in Group A and Group B
Table 5: Comparative Statistics for all Participant Responses Grouped by Question (Q) and Treatment (T); for T, a = ambiguous, c = clear
Table 6: Confidence Rating Transformation to a Numerical Scale
Table 7: Ambiguity Assessment Scale for the Analysis of the Seven Ambiguity Types Regression Model
Table 8: Regression Analysis Results for the General Ambiguity Regression Model
Table 9: Regression Analysis Results for the Seven Ambiguity Types Regression Model
Table 10: Summary of Analysis' Support for Hypotheses
Table 11: Participant Strata Classes
1. Introduction
Keen (1993) predicts that innovative applications of information technology will change the
competitive landscape to such an extent that fifty percent of companies in some industries
may not survive the next decade. This rise in the importance of information technology
innovation and application has led to an increased need for relevant, timely information at
the point where that information is used and understood (Conger 1994; Delligatta and
Umbaugh 1994; Nath and Lederer 1996).
The demand for information systems (IS) professionals vastly exceeds the available
supply, both now and for the foreseeable future (Freeman et al. 2000; Rosenthal and
Jategaonkar 1995; Australian Bureau of Statistics 1997). Hence, the use of computerised
information systems by end users has become compulsory in most business organisations
(Cardinali 1992; Athey and Wickham 1995-1996). To provide appropriate, relevant
information requires identifying and eliminating ambiguities in communication between the
stakeholders or managers requesting information, and the end users querying the information
systems.
Traditional structured methodologies reduce ambiguity at the expense of timeliness,
flexibility, and learning. The insights that end users can achieve during interactive, iterative
query sessions are also of benefit. The need for timeliness, flexibility, learning and end user
insights, as well as the shortage of IS professionals, has led to the general decline of
structured reports (Ryan 1993). The use of ad hoc and iterative end user reports has
increased (Tayntor 1994). Nonetheless, many end users now use more formalised processes
in developing their reports than previously (Conger 1994; Tayntor 1994).
Information request ambiguity has real and potentially large impacts on business
organisations. An ambiguous information request can result in a report that, although it
appears acceptable to the person making the information request, does not contain the desired
information. If that wrong report is then used to make business decisions that the correct
report would not have supported, then information request ambiguity can cause substantial
negative impacts.
This paper develops a theory of the impact of ambiguity in information requests on end user
query performance, and tests that theory empirically. It empirically examines the strength
and direction of the relationships between ambiguity types (lexical, syntactical, inflective,
pragmatic, extraneous, emphatic, and suggestive), complexity, and end user query
performance. The current study extends previous work (Suh and Jenkins 1992; Borthick et
al. 1997; Rho and March 1997; Borthick et al. 2000) and builds upon the theory of end users'
query performance in the tradition of Dubin (1978).
2. Information Request Ambiguity and End User Query Performance
Different forms of ambiguity can be present in a natural language information request. The
primary aim of this research is to explore the impact of ambiguity on end user query
performance. This chapter develops a theory of the relationship between information request
ambiguity and end user query performance.
2.1 A Theoretical Model of Information Request Ambiguity
The development of an accurate SQL query by an end user depends on the user's knowledge
of the information needed, the database structure, and the query language (Ogden et al. 1986).
A lack of skill in any of these three domains will lead to inaccurate SQL queries (Ogden et al.
1986).
A natural language information request requires end users to transform the natural language
constructs into the query components consisting of lexical items (Katzeff 1990). End users
must conceptualise the information requirement and then mentally map this conceptualisation
to their understanding of the database structure. Reisner (1977) proposed a template model
for the manner in which users create SQL queries from a natural language information
request. Each query's operator components (Halstead 1977) are drawn from a set of known
query language components to address the requirements of the natural language information
request.
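Reisner's template model can be sketched in code. The following is a minimal illustration only, using a hypothetical one-table schema invented for this sketch (it is not the experimental database of Appendix D); it shows a natural language request being decomposed into known query-language components and assembled into an SQL query.

```python
import sqlite3

# Hypothetical one-table schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE client (name TEXT, department TEXT)")
conn.executemany("INSERT INTO client VALUES (?, ?)",
                 [("Acme Pty Ltd", "Tax"), ("Beta Corp", "Audit")])

# Template model sketch (after Reisner 1977): the request "A report of
# all clients in the Tax department" is mapped to known query components.
components = {
    "select_list": "name",                 # the information needed
    "from_clause": "client",               # the database structure
    "where_clause": "department = 'Tax'",  # the selection condition
}
query = "SELECT {select_list} FROM {from_clause} WHERE {where_clause}".format(
    **components)

rows = conn.execute(query).fetchall()
print(rows)  # [('Acme Pty Ltd',)]
```

The accuracy of the assembled query depends on each of the three knowledge domains noted above: a wrong component in any slot yields a syntactically valid but incorrect query.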
Ambiguity affects the user's interpretation of the information needed. Because information
requests are expressed using a natural language, they are ambiguous and uncertain. End users
must interpret and analyse the information requests to develop queries that meet the
requestors' needs. The end users' uncertainty in determining the required response affects the
required cognitive effort because multiple interpretations of the actual information required
may be legitimately constructed (Almuallim et al. 1997).
The impact of natural language's seven types of ambiguity has not previously been examined
in the context of end user query performance. These seven types of ambiguity are lexical,
syntactical, inflective, pragmatic, extraneous, emphatic and suggestive (Walton 1996; Fowler
and Aaron 1998). These ambiguities affect the number of legitimate interpretations of the
natural language statement of the information request. The information request has
"multiplicity of meaning" (Walton 1996).
Tasks that are more complex require increased cognitive effort (Campbell 1988). In the
context of database queries, task complexity generally negatively impacts end user query
performance (Borthick et al. 1997; Borthick et al. 2000). Task complexity is included in this
research to control for complexity's established impact on end user query performance.
Query performance can be measured on a number of dimensions including correctness, time
required, and confidence.
Hence, the following hypotheses are proposed:
H1a: Higher ambiguity in the information request leads to an increase in the total errors
in the query formulation.
H1b: Higher ambiguity in the information request leads to an increase in the time taken
to complete the query formulation.
H1c: Higher ambiguity in the information request leads to lower end user confidence in
the accuracy of the query formulation.
2.2 The Nature of Ambiguity
Ambiguity is an inherent property of all natural languages, including English (Jespersen
1922; Williamson 1994). Absolute precision of a language is pragmatically undesirable,
because a perfectly precise language cannot adapt to new concepts (Williamson 1994). The
communication needed to ensure effective and efficient report production, however, requires
complete clarity. Hence, a tension exists between the natural language's need for flexibility
in the long term and the need for precision in the short term. Natural language is at once both
dysfunctional and poorly adapted to the functions language needs to perform, yet flexible and
broad-based such that it is usable in practice (Chomsky 1990).
Interest in linguistic ambiguity has an extensive history, and has been recognised as a
separate branch of study since at least Aristotle's time (Kooij 1971). Aristotle noted that
language must be ambiguous, as a language has limited words but an infinite number of
things and concepts to which those words must apply (Kooij 1971).
Russell (1923) recognised that all natural languages are vague and ambiguous. Excluding the
realm of mathematical symbolism, constructing completely unambiguous expressions is not
possible with the syntax and vocabulary tools available within natural languages (Williamson
1994). To endure and survive, language requires the flexibility to communicate new
concepts. Ambiguity necessarily derives from the flexibility of natural language.
Kooij (1971) states that ambiguity arises where a sentence can be interpreted in more than
one way. Similarly, Walton (1996) considers a sentence or statement to be more ambiguous
as the number of legitimate interpretations of the sentence (or paragraph) increases.
Ambiguity implies multiplicity of meaning (Walton 1996).
In classical analysis, the multiplex (Latin for "multiple meaning") categorisation of
Alexander of Aphrodisius (Hamblin 1970) suggests a basis for the identification of categories
of ambiguity. In classical literature, Alexander of Aphrodisius identified three categories of
ambiguity: potential, actual, and imaginary. Walton (1996) adapts this classical multiplex
categorisation to his identified types of ambiguity.
Walton (1996) identifies six classical types of ambiguity in natural language: lexical,
syntactical, inflective, pragmatic, emphatic, and suggestive. In addition to Walton's (1996)
taxonomy, extraneous information and noise in the communication can also be a source of
ambiguity. Extraneous ambiguity arises where the communication is not parsimonious, or
the communication includes information that is not directly relevant to the message being
communicated (Fowler and Aaron 1998). Extraneous ambiguity is an actual ambiguity
within the Walton (1996) taxonomy.
Each ambiguity type can be independently present within the communication. Walton's
(1996) modified taxonomy and model of ambiguity is presented in Figure 1.
[Figure 1: Types of Ambiguity (adapted from Walton 1996). The figure depicts the multiplex categorisation of ambiguity into three categories: potential (lexical, syntactical, and inflective), actual (pragmatic and extraneous), and imaginary (emphatic and suggestive).]
2.2.1 Potential Ambiguity
Potential ambiguity arises when a term or a sentence is ambiguous in and of itself, for
example, before its use in the context of a sentence or paragraph. Three types of ambiguity
are categorised as potential ambiguity: lexical, syntactical, and inflective.
Lexical Ambiguity
Lexical ambiguity is the most commonly known form of ambiguity (Reilly 1991; Walton
1996). It occurs when words have more than one meaning as commonly defined and
understood. Considerable potential ambiguity arises when a word with various meanings is
used in a statement of information request. For example, "bank" may variously mean the
"bank" of a river (noun), to "bank" as related to aeroplane or a roller-coaster (verb), a savings
"bank" (noun), to "bank" money (verb), or a "bank" of computer terminals (noun) (Turner
1987). Lexical ambiguity is often reduced or mitigated by the context of the sentence.
In the case of an information request, lexical ambiguity exists in the statement "A report of
our clients for our marketing brochure mail-out". The word "report" may have several
meanings, independent of its context. A gunshot report may echo across the hillside. A
student can report to the lecturer. A heavy report can be dropped on the foot. Although the
context may make the meaning clear, the lexical ambiguity contributes to the overall
ambiguity of the statement and increases cognitive effort.
The following hypotheses are proposed:
H2a: Higher lexical ambiguity in the information request leads to an increase in the total
errors in the query formulation.
H2b: Higher lexical ambiguity in the information request leads to an increase in the time
taken to complete the query formulation.
H2c: Higher lexical ambiguity in the information request leads to lower end user
confidence in the accuracy of the query formulation.
Syntactical Ambiguity
Syntactical ambiguity is a structural or grammatical ambiguity of a whole sentence, or of a
sub-part of a sentence (Reilly 1991; Walton 1996). Syntactical ambiguity is a
grammatical construct, and results from the difficulty of applying universal grammatical laws
to sentence structure. An example of syntactical ambiguity is "Bob hit the man with the
stick". This phrasing is unclear as to whether a man was hit with a stick, or whether a man
with a stick was struck by Bob. The context can substantially reduce syntactical ambiguity.
For example, knowing that either Bob, or the man, but not both, had a stick resolves the
syntactical ambiguity.
Comparing the phrase "Bob hit the man with the stick" to the analogous "Bob hit the man
with the scar" provides some insights. As a scar is little suited to physical, violent use, the
latter formulation clearly conveys that the man with the scar was struck by Bob (Kooij 1971).
In the case of an information request, syntactical ambiguity exists in the request "A report of
poor-paying clients and client managers. Determine their effect on our profitability for the
last twelve months." The request is syntactically ambiguous because the end user can
interpret "their" to mean the poor-paying clients, the client managers, or both. Although the
context may reduce or negate the ambiguity, syntactically the request is ambiguous.
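The two readings of "their" can be made concrete as queries. The sketch below uses a hypothetical client and manager schema invented for illustration; each reading is a legitimate query formulation, yet the two produce different profitability figures.

```python
import sqlite3

# Hypothetical schema and figures, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE client (name TEXT, poor_paying INTEGER, profit REAL)")
conn.execute("CREATE TABLE manager (name TEXT, profit REAL)")
conn.executemany("INSERT INTO client VALUES (?, ?, ?)",
                 [("Acme", 1, -500.0), ("Beta", 0, 2000.0)])
conn.execute("INSERT INTO manager VALUES ('Jones', 300.0)")

# Reading 1: "their" refers to the poor-paying clients only.
reading_1 = conn.execute(
    "SELECT SUM(profit) FROM client WHERE poor_paying = 1").fetchone()[0]

# Reading 2: "their" refers to the poor-paying clients and the client managers.
reading_2 = reading_1 + conn.execute(
    "SELECT SUM(profit) FROM manager").fetchone()[0]

print(reading_1, reading_2)  # -500.0 -200.0
```

Both queries are correct answers to some reading of the request; only further context determines which report the requestor actually wanted.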
The following hypotheses are proposed:
H3a: Higher syntactical ambiguity in the information request leads to an increase in the
total errors in the query formulation.
H3b: Higher syntactical ambiguity in the information request leads to an increase in the
time taken to complete the query formulation.
H3c: Higher syntactical ambiguity in the information request leads to lower end user
confidence in the accuracy of the query formulation.
Inflective Ambiguity
As Walton (1996) notes, inflective ambiguity is a composite ambiguity, containing elements
of both lexical and syntactical ambiguity. Like syntactical ambiguity, inflective ambiguity is
grammatical in nature. Inflective ambiguity arises where a word is used more than once in a sentence or
paragraph, but with different meanings each time (Walton 1996). An example of inflective
ambiguity is to use the word "scheme" with two different meanings in the fallacious
argument, "Bob has devised a scheme to save costs by recycling paper. Therefore, Bob is a
schemer, and should not be trusted" (Ryle 1971; Walton 1996).
In the case of an information request, inflective ambiguity exists in the example, "A report
showing the product of our marketing campaign for our accounting software product".
Ambiguity derives from using the word "product" in two different senses in the one statement
(Walton 1996; Fowler and Aaron 1998).
The following hypotheses are proposed:
H4a: Higher inflective ambiguity in the information request leads to an increase in the
total errors in the query formulation.
H4b: Higher inflective ambiguity in the information request leads to an increase in the
time taken to complete the query formulation.
H4c: Higher inflective ambiguity in the information request leads to lower end user
confidence in the accuracy of the query formulation.
2.2.2 Actual Ambiguity
Actual ambiguity refers to ambiguity that occurs in the act of speaking. It arises when a word
or phrase, without variation either in itself or in the way the word is put forward, has different
meanings. The statement does not contain adequate information to resolve the ambiguity,
resulting in a number of legitimate interpretations. Two distinct types of ambiguity are
categorised as actual ambiguity: pragmatic and extraneous.
Pragmatic Ambiguity
Pragmatic ambiguity arises when the statement is not specific, and the context does not
provide the information needed to clarify the statement. Information is missing, and must be
inferred. An example of pragmatic ambiguity is the story of King Croesus and the Oracle of
Delphi (adapted from Copi and Cohen 1990):
"King Croesus consulted the Oracle of Delphi before warring with Cyrus of
Persia. The Oracle replied that, "If Croesus went to war with Cyrus, he would
destroy a mighty kingdom". Delighted, Croesus attacked Persia, and Croesus'
army and kingdom were crushed. Croesus complained bitterly to the Oracle's
priests, who replied that the Oracle had been entirely right. By going to war with
Persia, Croesus had destroyed a mighty kingdom - his own."
The information necessary to clearly understand the message is omitted (Walton 1996).
Due to the need to infer the missing
information, pragmatically ambiguous statements have multiple possible interpretations
(Walton 1996). Croesus interpreted the Oracle's statement as indicating his success in battle -
the response he desired. As noted by Hamblin (1970), Croesus' logical response to the
oracular reply would have been to immediately ask the Oracle, "Which kingdom?" Further
information is needed to resolve pragmatic ambiguity.
In the case of an information request, pragmatic ambiguity exists in the request for "A report
of all the clients for a department." The ambiguity is that the request does not refer to a
specific department. The end user could legitimately prepare a report for any department.
Further information is needed to resolve this actual ambiguity in this case.
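In query terms, the missing department behaves like an unbound parameter. The sketch below, against a hypothetical schema invented for illustration, shows that every binding of the parameter yields a different, equally legitimate report.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE client (name TEXT, department TEXT)")
conn.executemany("INSERT INTO client VALUES (?, ?)",
                 [("Acme", "Tax"), ("Beta", "Audit"), ("Gamma", "Tax")])

# The request "A report of all the clients for a department" names no
# department, so the ? parameter must be inferred by the end user.
query = "SELECT name FROM client WHERE department = ?"
tax_report = conn.execute(query, ("Tax",)).fetchall()
audit_report = conn.execute(query, ("Audit",)).fetchall()

print(tax_report)    # [('Acme',), ('Gamma',)]
print(audit_report)  # [('Beta',)]
```

Each report is a valid response to the request as stated; the pragmatic ambiguity lies in the requestor, not the query language, holding the missing parameter.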
The following hypotheses are proposed:
H5a: Higher pragmatic ambiguity in the information request leads to an increase in the
total errors in the query formulation.
H5b: Higher pragmatic ambiguity in the information request leads to an increase in the
time taken to complete the query formulation.
H5c: Higher pragmatic ambiguity in the information request leads to lower end user
confidence in the accuracy of the query formulation.
Extraneous Ambiguity
In contrast to pragmatic ambiguity, in which information necessary to clearly understand the
message is omitted, extraneous ambiguity arises from an excess of information. Clearer
communication arises where the minimally sufficient words needed to convey the message of
the statement are used (Fowler and Aaron 1998). Where more words are used than
necessary, or where unnecessary detail is provided in the communication that is not part of
the message, ambiguity arises. The excess detail obscures the essential message and
contributes to different emphases or interpretations.
The use of passive voice, vacuous words, or the repetition of phrases with the same meaning
all contribute to lack of clarity (Fowler and Aaron 1998). The use of clichés and the over-use
of figures of speech add volume to the statement, but add little or no meaning. Pretentious
and indirect writing also adds to the bulk of the statement, but without adding meaning.
Fowler and Aaron (1998) provide the following comparative example:
Pretentious: To perpetuate our endeavour of providing funds for our elderly citizens as
we do at the present moment, we will face the exigency of enhanced
contributions from all our citizens.
Revised: We cannot continue to fund Social Security and Medicare for the elderly
unless we raise taxes.
The extra volume contributes to vagueness in the first statement, and adds to the multiplicity
of legitimate interpretations of the statement. The first statement exhibits extraneous
ambiguity. The second statement communicates forcefully and concisely.
An example of extraneous ambiguity in an information request is "A report of all clients (and
their names and addresses only) for the Tax and Business Services department. Some of
those clients are our biggest earners, you know". The last sentence is extraneous, and
contains detail that is redundant, uninformative, or misleading relative to the fundamental
message. In information theoretic terms, extraneous ambiguity is "noise" in the
communication (Axley 1984; Eisenberg and Phillips 1991; Severin and Tankard 1997).
The following hypotheses are proposed:
H6a: Higher extraneous ambiguity in the information request leads to an increase in the
total errors in the query formulation.
H6b: Higher extraneous ambiguity in the information request leads to an increase in the
time taken to complete the query formulation.
H6c: Higher extraneous ambiguity in the information request leads to lower end user
confidence in the accuracy of the query formulation.
2.2.3 Imaginary Ambiguity
Imaginary ambiguity occurs when a word with a fixed meaning seems to have a different one.
Imaginary ambiguity derives from the optional interpretation that the recipient of the
communication places on the information received. Two distinct types of ambiguity can be
categorised as imaginary ambiguity: emphatic and suggestive.
Emphatic Ambiguity
The question of ambiguity deriving from accent, or emphasis in speaking, is an ancient one
(Hamblin 1970). When a phrasing is rendered in the written form, the verbal emphasis may
only be crudely indicated. Significant meaning and context is lost. Rescher (1964) provides
the following example of emphatic ambiguity:
The intended meaning of the democratic credo "Men were created equal" can be
altered by stressing the word "created" (implying "that's how men started out, but
they are no longer so").
The verbal emphasis creates an inference of meaning that is a legitimate interpretation of the
phrasing. That is, changes in intonation can yield different interpretations.
In the case of an information request, emphatic ambiguity occurs in the example information
request of "A report of our good clients". Ambiguity can derive from placing different
emphases on the words. Depending on the context or on emphasis used, "good clients" could
be legitimately interpreted to be clients that pay on time or clients that have the highest
dollar-value sales. Indeed, with an ironic emphasis on the word "good", this request could be
interpreted as a list of our worst clients - those that do not pay. The information necessary to
resolve the ambiguity is often difficult to convey using only printed media.
The following hypotheses are proposed:
H7a: Higher emphatic ambiguity in the information request leads to an increase in the
total errors in the query formulation.
H7b: Higher emphatic ambiguity in the information request leads to an increase in the
time taken to complete the query formulation.
H7c: Higher emphatic ambiguity in the information request leads to lower end user
confidence in the accuracy of the query formulation.
Suggestive Ambiguity
Despite the apparent clarity of the sentence in question, suggestive ambiguity creates
implications and innuendos that can yield divergent interpretations (Walton 1996). Fischer
(1970) provides an example:
The First Mate of a ship docked in China returned drunk from shore leave, and
was unable to write up the ship's log. The displeased Captain completed the log,
adding, "The Mate was drunk all day". The next day, the now-sober Mate
challenged the Captain over the entry, as it would reflect poorly on him. The
Captain responded that the comment was true, and must stand. Whereupon the
mate added to that day's log, "The Captain was sober all day". In reply to the
Captain's challenge, the mate responded "the comment is true, and must stand"
(derived from Trow 1905, pp 14-15).
The phrase "The Captain was sober all day" contains suggestive ambiguity. A further
example is the statement, "The President is now an honest man". Both statements are
perfectly clear and, indeed, true, yet considerable innuendo exists. The fact that the
Captain's sobriety, or the President's honesty, is singled out for special comment implies
that such a state of affairs is unusual (Walton 1996). The statements are suggestively
ambiguous.
In the case of an information request, an example of this ambiguity is, "A report of the clients
of this accounting practice that have lodged taxation returns in the past five years in
accordance with the requirements of the Australian Taxation Office". The request for
information is quite clear. By definition, however, all taxation returns should be lodged in
accordance with the Australian Taxation Office's requirements. The extra phrase introduces
suggestive ambiguity into the information request by suggesting that the report will not
consist of all taxation clients, because some clients may not have complied with the Tax
Office's requirements.
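The two readings of this request diverge only in the final predicate of the query. A minimal Python sketch using sqlite3 illustrates the divergence; the table and column names here are invented for illustration (the actual database is described in Appendix D):

```python
import sqlite3

# Hypothetical schema: the real database is described in Appendix D.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tax_clients (name TEXT, compliant INTEGER)")
conn.executemany("INSERT INTO tax_clients VALUES (?, ?)",
                 [("Acme", 1), ("Bolt", 1), ("Crow", 0)])

# Literal reading: all lodged returns comply by definition, so the
# qualifying phrase adds nothing and every client appears.
all_clients = conn.execute("SELECT name FROM tax_clients").fetchall()

# Suggestive reading: the extra phrase implies a filter, so a reader may
# restrict the report to clients flagged as compliant.
compliant_only = conn.execute(
    "SELECT name FROM tax_clients WHERE compliant = 1").fetchall()
```

The two result sets differ even though the request's literal meaning warrants no filter at all, which is precisely the effect suggestive ambiguity has on query formulation.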
The following hypotheses are proposed:
H8a: Higher suggestive ambiguity in the information request leads to an increase in the
total errors in the query formulation.
H8b: Higher suggestive ambiguity in the information request leads to an increase in the
time taken to complete the query formulation.
H8c: Higher suggestive ambiguity in the information request leads to lower end user
confidence in the accuracy of the query formulation.
2.2.4 Ambiguity in Practice
Table 1 summarises, and provides examples of, the seven types of ambiguity identified in
this paper.
Table 1
Summary and Examples of the Seven Types of Ambiguity in Natural Language Information Requests
Ambiguity
Type
Information Request
Lexical A report of our clients for our marketing brochure mail-out.
The word "report" may have several meanings, independent of its context.
For example, there may be: a gunshot report echoing through the hillside;
the Lieutenant reported to the Captain; I dropped the heavy report on my toe,
etc. Although the context may make the meaning clear, the lexical ambiguity adds to cognitive effort and contributes to ambiguity overall.
Syntactical A report of poor-paying clients and client managers. Determine their effect
on our profitability for the last twelve months.
It is not clear whose effect on profitability is meant. Another example is
"Bob hit the man with a stick". It is not clear, syntactically, whether the man
with a stick was hit, or whether the man was hit, by Bob, with a stick.
Inflective A report showing what the product of our last marketing campaign for sales
of our accounting software product in the last month was.
Ambiguity here derives from the use of the word "product" with two
different meanings in the one information request.
Pragmatic A report of all the clients for a department.
The ambiguity here is that the department has not been specified.
Information necessary to clearly understand the message is omitted. It would
be legitimate to prepare a report for any department. Further information is
needed to resolve this actual ambiguity.
Extraneous A report of all clients (and their names and addresses only) for the Tax and
Business Services department. Some of those clients are our biggest earners, you know.
The last sentence is extraneous. Unlike pragmatic ambiguity, the sentence
contains information that is redundant, uninformative, or not necessary to
derive the statement's message. "Noise" in the communication exists. More words are used than are necessary to make the statement.
Emphatic A report of our good clients.
Ambiguity here could derive from the inability to convey the spoken emphasis of the
words in written form. Depending on the emphasis used, "good
clients" could be legitimately interpreted to be clients that pay on time,
clients that have the most dollar-value sales, or even, with the correct ironic emphasis on the spoken word, our worst clients - those that do not pay.
Suggestive A report of the clients of this accounting practice that have lodged taxation
returns in the past five years in accordance with the requirements of the
Australian Taxation Office.
The request for information is quite clear until the phrase "in accordance
with the requirements of the Australian Taxation Office". By definition, all
taxation returns should be lodged in accordance with these requirements.
The extra phrase introduces suggestive ambiguity into the information
request by suggesting that the report will not necessarily consist of all
taxation clients.
2.3 Task Complexity
More complex tasks require more cognitive effort and hence have a generally negative
impact on the user's performance in deriving database queries (Campbell 1988; Borthick et
al. 1997; Borthick et al. 2000). Task complexity, in the context of query development,
consists of the inherent task complexity associated with the query syntax, and the data
structure complexity associated with the organisation of the tables and attributes (Liew 1995).
Campbell (1988) and Wood (1986) document the general impact of task complexity. Jih et
al. (1989) studied task complexity and user performance in the context of the use of entity-
relationship diagrams and relational data models. Complexity in this context is generally
measured as a function of the total number of elementary mental discriminations required to
write a query (Halstead 1977).
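Halstead's difficulty measure is D = (n1 / 2) * (N2 / n2), where n1 is the number of distinct operators, n2 the number of distinct operands, and N2 the total number of operand occurrences. A minimal Python sketch follows; the whitespace tokenisation and the toy operator vocabulary are assumptions for illustration, not the instrument actually applied to the model answers:

```python
def halstead_difficulty(tokens, operators):
    """Halstead difficulty D = (n1 / 2) * (N2 / n2), where n1 is the number
    of distinct operators, n2 the number of distinct operands, and N2 the
    total number of operand occurrences."""
    operand_tokens = [t for t in tokens if t not in operators]
    n1 = len({t for t in tokens if t in operators})
    n2 = len(set(operand_tokens))
    N2 = len(operand_tokens)
    return (n1 / 2) * (N2 / n2)

# Toy operator vocabulary for a fragment of SQL (an assumption, not the
# operator set used in the study).
SQL_OPERATORS = {"SELECT", "FROM", "WHERE", "=", "AND"}

query = "SELECT name FROM clients WHERE dept = Tax".split()
difficulty = halstead_difficulty(query, SQL_OPERATORS)
```

Longer model answers with more distinct operators and repeated operands yield higher difficulty scores, which is what makes the measure a usable proxy for the mental discriminations a query requires.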
The following hypotheses are proposed:
H9a: Higher complexity in the information request leads to more total errors in the query
formulation.
H9b: Higher complexity in the information request leads to more time taken to complete
the query formulation.
H9c: Higher complexity in the information request leads to lower end user confidence in
the accuracy of the query formulation.
2.4 Theoretical Model Summary
Figure 2 summarises the theoretical model presented in this paper. Complexity and the seven
types of ambiguity have a negative impact on end user query performance as they increase.
Hypotheses 1 through 9 are derived from these hypothesised relationships.
[Figure 2 is a diagram: the seven ambiguity types (pragmatic, extraneous, lexical,
syntactical, inflective, emphatic, and suggestive) together constitute information request
ambiguity; both ambiguity and complexity have a negative relationship with end user query
performance.]
Figure 2
The Theoretical Model of Ambiguity, Complexity, and End User Query Performance
3. Methodology
3.1 Experimental Design
A laboratory experiment was conducted to test the hypotheses presented in this study. A two-
factor, within-groups experimental design was used (Huck et al. 1974). Participants were
randomly assigned to two groups (Group A and Group B). Each participant was presented
with up to sixteen questions. Each question was presented in either a clear or ambiguous
formulation.
Group A's question formulations were alternately ambiguous and clear. Group B's question
formulations were alternately clear and ambiguous. Using alternating formulations helped
promote equitable treatment of the two groups. That is, the alternating formulations ensured
that both groups would complete approximately the same number of questions during the
allotted time, expend approximately the same amount of cognitive effort, and would
experience approximately the same level of frustration in dealing with ambiguous
information requests. All participants spent two hours on the experiment. Appendix A
shows the questions presented to students together with the model answers.
A set of instructions (Appendix B), including a synopsis of the query language syntax, was
provided to the participants. A Unix shell script (Appendix C) presented the questions
electronically to the participants and automatically captured their responses in text files. An
entity-relationship diagram describing the database is presented in Appendix D, and was
available to subjects. Further details regarding the experimental process are provided in
Appendix E.
3.2 Experiment Participants
Forty-seven undergraduate and nineteen postgraduate students participated in the experiment.
Participating students were enrolled either in an advanced undergraduate or in a post-graduate
database subject within the business school at the University of Queensland. All students
enrolled in the two database subjects participated in the experiment.
The motivation for student participation was the receipt of five percent of the students' final
mark for the subject (2.5% for participation, 2.5% for performance). Participants were aware
that they were participating in an experiment.
Participants had been previously trained in the use of the SQL query language, and had been
afforded the opportunity to practice SQL on the university systems. All practice took place
on databases different from those used in the experiment. Generally, student expertise with SQL
was low to intermediate. The experiment, for most students, was the first practical
application of their SQL skills.
3.3 Assessment of Participant Responses
Participant responses were captured in text files that showed each interactive response and
captured the start and end time of each question. This file was edited into a suitable format
for marking by two examiners. Each response was independently assessed by each examiner
to determine whether the response was the participant's final complete response. Responses
where participants did not finish the query formulation were removed from the study.
In some instances, the state of completion of the response was indeterminate. If the response
could only be corrected with substantial rework of the submitted response, the examiners
erred on the side of caution and removed these responses from the study.
Examiners then corrected the answers according to the model answers (Appendix A), using
the Semantic Error Counting, SQL Challenge Error Counting, and Intermediate Error
Counting Forms shown in Appendix F. Each examiner independently assessed the
participant responses and corrected the response. Each discrete alteration (addition or
deletion of a query component) counted as one "micro error" in the Semantic Error Counting
Form (Appendix F).
The corrected response that determined the total error count was the response that required
the fewest changes to the participant's response, and still produced the required result set.
This approach ensured a lower error count than a strict modification of the response to ensure
an exact match to the model answer. Appendix G provides an example corrected response.
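The micro-error count, one per discrete addition or deletion of a query component, can be approximated by a token-level diff. The following sketch uses Python's difflib and assumes whitespace tokenisation; the study itself used the manual counting forms in Appendix F, so this is illustrative only:

```python
import difflib

def micro_errors(response, corrected):
    """Count the discrete additions and deletions of query components
    needed to turn the participant's response into the corrected one."""
    a, b = response.split(), corrected.split()
    errors = 0
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(a=a, b=b).get_opcodes():
        if tag == "delete":
            errors += i2 - i1                 # components removed
        elif tag == "insert":
            errors += j2 - j1                 # components added
        elif tag == "replace":
            errors += (i2 - i1) + (j2 - j1)   # one deletion plus one addition
    return errors

response = "SELECT * FROM clients"
corrected = "SELECT name FROM clients"
```

A substituted component counts as two micro errors (a deletion and an addition), consistent with counting each discrete alteration separately.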
The examiners then compared their independent assessments to ensure that all errors had
been found and corrected and that the proposed formulations or corrected formulations
produced the correct output. If more than one correction method was found to produce a
correct query, the correction method that produced the smallest number of errors was used.
A diary of common errors and their corrections was kept to ensure consistency throughout the
assessment process. The final, moderated, error sheets were transcribed to a relational
database for analysis.
4. Results and Discussion
4.1 Overview of Experimental Results
Participant demographic information and statistics are presented in Tables 2, 3, and 4. The
demographic information indicates that the assignment of participants to ensure homogeneity
between Group A and Group B was successful. The groups are relatively homogeneous in
terms of course background, grade point average (GPA), and age. In any case, both Group A
and Group B received the treatment effect of ambiguity on alternate questions, mitigating
concerns of the effect of a selection bias on experimental results.
Table 2
Participant Demographic Information and Descriptive Statistics: Course Background of Group A and Group B
Enrolled Degree | Group A | Group B | Total
Undergraduate Arts | 3 | 3 | 6
Undergraduate Business | 20 | 18 | 38
Undergraduate Computer Science/Information Systems | 3 | 0 | 3
Postgraduate Business | 2 | 1 | 3
Postgraduate Computer Science/Information Systems | 5 | 11 | 16
Total Participants | 33 | 33 | 66
Table 3
Participant Demographic Information and Descriptive Statistics:
Academic Record of Group A and Group B
Academic Record | Average | Standard Deviation | Min | Max
GPA (65 students with academic records) | 4.94 | 0.90 | 3.26 | 7.00
GPA (Group A: 33 students with academic records) | 5.04 | 0.83 | 3.26 | 6.84
GPA (Group B: 32 students with academic records) | 4.83 | 0.97 | 3.29 | 7.00
Table 4
Participant Demographic Information and Descriptive Statistics:
Participant Age in Group A and Group B
Age (in Years) | Average | Standard Deviation | Min | Max
Average Age (65 students with date of birth available) | 24.94 | 7.72 | 18.74 | 61.25
Average Age (Group A, 33 students with date of birth available) | 24.76 | 7.29 | 19.50 | 48.53
Average Age (Group B, 32 students with date of birth available) | 25.13 | 8.26 | 18.74 | 61.25
Participants completed 425 responses in the experiment. The experiment contained sixteen
questions for both ambiguous and clear information requests. Due to the two-hour time
constraint, no participant completed more than twelve questions. Forty participants (60.61%
of the sample population) completed six questions. On average, participants completed 6.44
questions, with a standard deviation of 1.75.
Table 5 provides an overview of the participants' results in the experiment. Total errors is
calculated as the average of the micro errors counted using the Semantic Error Counting
Sheet shown in Appendix F. Appendix H provides a Pearson correlation matrix of the
dependent and independent variables measured in the experiment. Appendix I provides
detailed reports of the errors participants made on each individual question.
Table 5
Comparative Statistics for all Participant Responses
Grouped by Question (Q) and Treatment (T); for T, a = ambiguous, c = clear.
Columns: Q | T | Halstead's Complexity | Group | Response Count | Attempts (Average, Standard Deviation) | Confidence (Average, Standard Deviation) | Duration (Average, Standard Deviation) | Total Errors (Average, Standard Deviation)
1 a 1.6927 A 32 3.31 1.99 6.22 1.36 10.51 4.63 1.59 3.66
1 c 1.6927 B 33 3.18 2.16 6.42 0.87 11.63 6.60 1.12 2.48
2 a 5.4186 B 33 9.21 8.88 5.21 1.47 20.74 11.30 4.27 8.18
2 c 5.4186 A 33 3.61 3.43 6.30 1.05 9.03 6.89 0.30 0.81
3 a 6.8908 A 33 7.94 6.04 5.91 1.57 11.84 7.72 3.97 3.50
3 c 6.8908 B 33 5.09 6.18 6.27 1.42 8.63 5.29 1.03 2.86
4 a 4.4697 B 32 7.31 4.75 5.38 1.64 15.57 8.95 4.03 5.54
4 c 4.4697 A 33 6.52 7.36 6.21 1.47 10.95 8.46 0.67 2.23
5 a 12.2917 A 33 9.24 6.63 5.24 2.21 18.54 11.06 9.42 10.39
5 c 12.2917 B 30 7.07 5.98 5.37 2.16 15.65 9.74 5.20 7.70
6 a 18.8000 B 17 11.41 7.21 5.59 1.33 23.59 7.93 32.94 13.21
6 c 18.8000 A 23 14.91 9.36 4.87 1.91 25.63 10.13 8.00 10.49
7 a 16.0076 A 15 11.07 6.10 5.07 1.49 18.78 5.46 7.27 8.65
7 c 16.0076 B 15 7.67 4.20 5.07 1.98 15.31 7.86 6.13 7.41
8 a 16.2684 B 6 6.83 8.42 5.83 1.60 13.24 8.36 2.33 4.08
8 c 16.2684 A 10 6.40 2.46 5.00 1.94 12.53 5.35 6.40 6.52
9 a 23.8970 A 3 12.33 2.08 3.00 1.73 16.43 7.77 18.00 10.54
9 c 23.8970 B 2 6.50 3.54 6.50 0.71 15.36 2.51 15.50 21.92
10 a 19.4819 B 1 7.00 - 5.00 - 9.93 - 20.00 -
10 c 19.4819 A 4 7.25 3.20 4.25 2.50 9.56 1.40 5.00 2.58
11 a 22.4000 A 2 7.00 4.24 5.00 2.83 8.53 2.13 22.50 13.44
11 c 22.4000 B 1 4.00 - 7.00 - 9.45 - 8.00 -
12 c 29.1633 B 1 14.00 - 4.00 - 10.10 - 8.00 -
The relationships between the dependent variables (duration, confidence, and total errors) and
the independent variables (complexity, ambiguity) are graphically depicted in Figures 3, 4,
and 5. These figures illustrate that the hypothesised relationships for complexity and
ambiguity were supported for most measures by most queries.
[Bar chart: "Questions by Treatment and Error", average errors per question for the
ambiguous and clear treatments]
Figure 3
The relationship between the treatment received (ambiguous or clear information request)
and the total errors in the participant's response.
[Bar chart: "Questions by Treatment and Duration", average duration in minutes per question
for the ambiguous and clear treatments]
Figure 4
The relationship between the treatment received (ambiguous or clear information request)
and the duration taken for the participant to prepare the response.
[Bar chart: "Questions by Treatment and Confidence", average confidence rating per question
for the ambiguous and clear treatments]
Figure 5
The relationship between the treatment received (ambiguous or clear information request)
and the participant's confidence in the response.
Question Six, with an average of 32.94 errors (standard deviation of 13.21), caused the most
problems for participants in its ambiguous formulation. Nonetheless, the seventeen
respondents to Question Six in its ambiguous formulation took slightly less time on average
to complete the response (23.59 minutes, standard deviation of 7.93) than the twenty-three
respondents for the clear formulation (25.63 minutes, standard deviation of 10.13).
Participants who completed Question Eight in the clear formulation made more errors on
average (6.40, standard deviation of 6.52) than those with the ambiguous formulation (average of 2.33
and standard deviation of 4.08). Participants also exhibited higher average confidence ratings
for the ambiguous formulation of this question (5.83, standard deviation of 1.60) than
participants receiving the clear formulation (5.00, standard deviation of 1.94).
A reason for these results may be that the sheer length of the clear formulation introduced
extraneous ambiguity. Question Eight, however, had only sixteen completed responses (six
for the ambiguous formulation and ten for the clear formulation), which limits the weight
that can be placed on this question's result. Because of the small number of participants
completing Questions Nine through Twelve, analysis of differences in these individual
questions is not appropriate.
4.2 Regression Analysis
Two multiple linear regression models were used to analyse the experimental results. The
model used to test H1a-c, and H9a-c for the effects of ambiguity and complexity respectively
was:
(1) Performance = Ambiguity + Complexity
where ambiguity was a dichotomous variable and complexity was measured using the
Halstead (1977) complexity measure for difficulty.
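Model (1) can be estimated by ordinary least squares with an intercept. The sketch below fits it on synthetic data with numpy; the coefficients, noise level, and random seed are invented for illustration and are not the study's data or statistics package:

```python
import numpy as np

# Synthetic stand-in for the 425 responses; coefficients and noise are
# invented, not the study's data.
rng = np.random.default_rng(0)
n = 425
ambiguity = rng.integers(0, 2, n)      # dichotomous: 0 = clear, 1 = ambiguous
complexity = rng.uniform(1, 30, n)     # Halstead difficulty of the model answer
errors = 4.8 * ambiguity + 0.76 * complexity + rng.normal(0, 8, n)

# Design matrix with an intercept column; ordinary least squares fit.
X = np.column_stack([np.ones(n), ambiguity, complexity])
beta, *_ = np.linalg.lstsq(X, errors, rcond=None)
intercept, b_ambiguity, b_complexity = beta
```

With total errors as the dependent variable, positive fitted coefficients on ambiguity and complexity correspond to the hypothesised directions of H1a and H9a.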
The model used to test the seven individual types of ambiguity in H2a-c to H8a-c was:
(2) Performance = Lexical + Syntactical + Inflective + Pragmatic +
Extraneous + Emphatic + Suggestive + Complexity
where the ambiguity types were measured as shown in Appendix J, according to the
ambiguity assessment instrument presented in Appendix K.
Performance is end user query performance. The dependent variables that proxy for end user
query performance are total errors, duration, and confidence. Duration was measured as
decimal minutes. The Confidence Rating was self-assessed by participants and was
transformed to a numerical rating in accordance with Table 6. The numerical rating was used
as the measure for confidence in the regression analysis.
Table 6
Confidence Rating Transformation to a Numerical Scale
Confidence Rating Numerical Rating
>85-100% 7
70-85% 6
55-70% 5
40-55% 4
25-40% 3
10-25% 2
<10% 1
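Table 6's transformation can be written as a simple threshold function. The sketch below assumes each band's lower bound is exclusive, matching the ">85-100%" top band; the table itself leaves the exact boundary treatment ambiguous:

```python
def confidence_rating(percent):
    """Map a self-assessed confidence percentage (0-100) to the 1-7
    numerical rating of Table 6. Assumes each band's lower bound is
    exclusive, matching the ">85-100%" top band."""
    for lower_bound, rating in [(85, 7), (70, 6), (55, 5), (40, 4), (25, 3), (10, 2)]:
        if percent > lower_bound:
            return rating
    return 1  # <10%
```

For example, a participant reporting 80% confidence maps to a numerical rating of 6.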
In all regression models, the Halstead (1977) complexity measure for difficulty was used to
assess the complexity of the required model answer. This measure has been used in several
end user query performance studies (Jih et al. 1989).
For testing H1a-c and H9a-c, a dichotomous variable of 0 (clear formulation, or pseudo-SQL)
and 1 (ambiguous formulation, or manager-English) was used to indicate whether the
participant had received a clear formulation or an ambiguous formulation of the information
request. For testing H2a-c to H8a-c, the seven independent ambiguity parameters were
assessed in accordance with the scale presented in Table 7. Each question was assessed by
two independent non-researchers who had been briefed in the definitions of the seven types
of ambiguity. The initial scores were moderated by discussion and consideration between the
independent third parties and the researcher to ensure consistent and correct interpretation of
the seven ambiguity definitions. Cronbach's alpha (Cronbach 1951) for the two third parties'
ambiguity measurement scores was 0.6887, indicating that a moderately reliable measure for
ambiguity across the two raters was achieved.
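Cronbach's alpha for the two raters' scores is alpha = k / (k - 1) * (1 - sum of rater variances / variance of totals), with k = 2. A sketch on invented scores (not the raters' actual data):

```python
def cronbach_alpha(ratings):
    """ratings: one list of scores per rater. For k raters,
    alpha = k / (k - 1) * (1 - sum of rater variances / variance of totals),
    using population variances."""
    def variance(xs):
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs) / len(xs)

    k = len(ratings)
    totals = [sum(scores) for scores in zip(*ratings)]
    return (k / (k - 1)) * (1 - sum(variance(r) for r in ratings) / variance(totals))

# Invented ambiguity scores from two raters over six question formulations.
rater1 = [0, 1, 2, 3, 4, 2]
rater2 = [0, 1, 3, 3, 4, 1]
alpha = cronbach_alpha([rater1, rater2])
```

The closer the two raters' scores track each other, the closer alpha approaches 1; values around 0.7, such as the 0.6887 reported above, are conventionally read as moderate reliability.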
Table 7
Ambiguity Assessment Scale for the analysis of the Seven Ambiguity Types Regression Model
Ambiguity Assessment Rating Meaning
0 No ambiguity of this type present
1 A little ambiguity of this type present
2 Some ambiguity of this type present
3 Much ambiguity of this type present
4 A great deal of ambiguity of this type present
Each question formulation, clear and ambiguous, for each information request was assessed
to provide a scale of ambiguity. The instrument used to undertake this finer assessment of
ambiguity for questions for which responses exist is reproduced in Appendix K. Using a five
point scale for the ambiguity assessment rating provides a finer measure than would a
dichotomous variable.
4.3 Ambiguity Treatment Multiple Linear Regression Model Results
Table 8 provides the results of the multiple linear regression (Newbold 1984) shown for
model (1) for the Total Errors, Duration, and Confidence measures of end user query
performance. These results provide evidence regarding H1a-c and H9a-c. All relationships
are in the hypothesised direction (positive for H1a, H1b, H9a, and H9b, and negative for H1c
and H9c), and indicate strong support for each hypothesis.
Table 8
Regression Analysis Results for the General Ambiguity Regression Model
Source (n=425) | DF | Mean Square | F-Value | Pr > T (2 tailed) | Parameter Estimate | R2
Model (Total Errors) 2 5430.30 88.44 0.0001 0.2954
Error 422 61.40
Ambiguity (H1a) 1 2447.98 39.87 0.0001 4.8042
Complexity (H9a) 1 8705.38 141.78 0.0001 0.7582
Model (Duration) 2 2236.60 28.59 0.0001 0.1193
Error 422 78.23
Ambiguity (H1b) 1 1250.63 15.99 0.0001 3.4339
Complexity (H9b) 1 3352.81 42.86 0.0001 0.4705
Model (Confidence) 2 42.87 16.25 0.0001 0.0715
Error 422 2.64
Ambiguity (H1c) 1 13.03 4.94 0.0268 -0.3505
Complexity (H9c) 1 74.68 28.31 0.0001 -0.0702
Ambiguity in an information request has a strong impact on the three measures of end user
query performance presented in H1a, H1b, and H1c. Total errors, duration, and end user
confidence are significantly and strongly affected by the presence of ambiguity in the
information request. The result is confirmatory of the general hypothesis of the model
presented in this paper: that an ambiguous information request is likely to result in a query
formulation that is less accurate, takes longer to prepare, and in which the end user is less
confident. Ceteris paribus, a clearly formulated information request is more effective and
efficient than an information request that is ambiguous and poorly specified.
The relationship between ambiguity and end user confidence, however, is generally weaker
than expected, although still significant at the 5% level. The small R2 (0.0715) for the
confidence model indicates that the ambiguity and complexity of an information request had
little impact on each participant's confidence in their query formulation.
Ambiguity is significant for all three models. The R2 for each model (0.2954, 0.1193, and
0.0715) provides strong support for the assertion that ambiguity and complexity negatively
impact end user query performance.
4.4 Multiple Linear Regression Model: Seven Types of Ambiguity
Table 9 provides the results of the multiple linear regression model shown for model (2) for
the Total Errors, Duration, and Confidence measures of end user query performance. This
testing examines hypotheses H2a-c through H8a-c for individual types of ambiguity.
Table 9
Regression Analysis Results for the Seven Ambiguity Types Regression Model
Source (n=425) | DF | Mean Square | F-Value | Pr > T (2 tailed) | Parameter Estimate | R2
Model (Total Errors) 8 2177.52 46.81 0.0001 0.4737
Error 416 46.52
Lexical (H2a) 1 78.41 1.69 0.1949 -1.5545
Syntactical (H3a) 1 7.99 0.17 0.6789 -0.2274
Inflective (H4a) 1 0.79 0.02 0.8963 -0.4143
Pragmatic (H5a) 1 385.36 8.28 0.0042 1.2621
Extraneous (H6a) 1 254.77 5.48 0.0197 3.3940
Emphatic (H7a) 1 394.51 8.48 0.0038 2.6906
Suggestive (H8a) 1 167.54 3.60 0.0584 2.9079
Complexity 1 2605.34 56.01 0.0001 0.4899
Model (Duration) 8 832.24 11.23 0.0001 0.1776
Error 416 74.10
Lexical (H2b) 1 1272.66 17.17 0.0001 6.2626
Syntactical (H3b) 1 600.95 8.11 0.0046 1.9725
Inflective (H4b) 1 780.00 10.53 0.0013 -13.0021
Pragmatic (H5b) 1 4.65 0.06 0.8023 -0.1387
Extraneous (H6b) 1 1008.31 13.61 0.0003 6.7520
Emphatic (H7b) 1 129.85 1.75 0.1863 -1.5436
Suggestive (H8b) 1 457.05 6.17 0.0134 -4.8029
Complexity 1 1926.10 25.99 0.0001 0.4213
Model (Confidence) 8 14.66 5.64 0.0001 0.0978
Error 417 2.60
Lexical (H2c) 1 8.81 3.39 0.0664 -0.5211
Syntactical (H3c) 1 0.02 0.01 0.9292 -0.0115
Inflective (H4c) 1 1.27 0.49 0.4844 0.5253
Pragmatic (H5c) 1 2.83 1.09 0.2973 -0.1082
Extraneous (H6c) 1 0.10 0.04 0.8435 0.0677
Emphatic (H7c) 1 0.07 0.03 0.8697 -0.0358
Suggestive (H8c) 1 1.91 0.74 0.3915 0.3107
Complexity 1 76.02 29.24 0.0001 -0.0837
4.5 Summary of Results
The experimental results indicate that the taxonomy presented in this paper explains a great
deal of the effect of ambiguity on end user query performance. The results also indicate,
however, that further refinement of the theory is required. Table 10 provides a
summary of the results obtained in this experiment. All hypotheses indicated as "supported"
are significant at the p = 0.05 level or below according to a one-tailed test. The two-tailed p-
value is shown, and is immediately followed by the one-tailed p-value in brackets.
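The support rule described above (one-tailed p at or below 0.05, with the parameter estimate in the hypothesised direction) can be sketched as a small check; the example figures below are taken from Tables 9 and 10:

```python
def supported(two_tailed_p, estimate, hypothesised_sign, alpha=0.05):
    """A hypothesis is 'supported' when the one-tailed p-value (half the
    two-tailed value) is at or below alpha AND the parameter estimate
    lies in the hypothesised direction."""
    one_tailed_p = two_tailed_p / 2
    return one_tailed_p <= alpha and estimate * hypothesised_sign > 0

# H6a: p=0.0197 two-tailed, estimate +3.3940, hypothesised positive.
h6a = supported(0.0197, 3.3940, +1)
# H8b: p=0.0134 two-tailed, but the estimate (-4.8029) has the wrong sign.
h8b = supported(0.0134, -4.8029, +1)
```

This makes explicit why a small p-value alone does not suffice: H8b is significant yet marked "Not Supported" because its parameter estimate is negative.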
Table 10
Summary of the Analysis's Support for Hypotheses
Hypothesis Statement Result
H1a Higher ambiguity in the information request leads to an
increase in the total errors in the query formulation.
Supported
p=0.0001 (0.0001)
H1b Higher ambiguity in the information request leads to an
increase in the time taken to complete the query formulation.
Supported
p=0.0001 (0.0001)
H1c Higher ambiguity in the information request leads to lower
end user confidence in the accuracy of the query formulation.
Supported
p=0.0268 (0.0134)
H2a Higher levels of lexical ambiguity in the information request
lead to more total errors in the query formulation.
Not Supported
p=0.1949 (0.0975)
(negative parameter)
H2b Higher levels of lexical ambiguity in the information request
lead to more time taken to complete the query formulation.
Supported
p=0.0001 (0.0001)
H2c Higher levels of lexical ambiguity in the information request
lead to lower end user confidence in the accuracy of the
query formulation.
Supported
p=0.0664 (0.0332)
H3a Higher levels of syntactical ambiguity in the information
request lead to more total errors in the query formulation.
Not Supported
p=0.6789 (0.3395)
H3b Higher levels of syntactical ambiguity in the information
request lead to more time taken to complete the query
formulation.
Supported
p=0.0046 (0.0023)
H3c Higher levels of syntactical ambiguity in the information
request lead to lower end user confidence in the accuracy of
the query formulation.
Not Supported
p=0.9292 (0.4646)
H4a Higher levels of inflective ambiguity in the information
request lead to more total errors in the query formulation.
Not Supported
p=0.8963 (0.4482)
H4b Higher levels of inflective ambiguity in the information
request lead to more time taken to complete the query
formulation.
Not Supported
p=0.0013 (0.0007)
(negative parameter)
H4c Higher levels of inflective ambiguity in the information
request lead to lower end user confidence in the accuracy of
the query formulation.
Not Supported
p=0.4844 (0.2422)
H5a Higher levels of pragmatic ambiguity in the information
request lead to more total errors in the query formulation.
Supported
p=0.0042 (0.0021)
H5b Higher levels of pragmatic ambiguity in the information
request lead to more time taken to complete the query
formulation.
Not Supported
p=0.8023 (0.4012)
H5c Higher levels of pragmatic ambiguity in the information
request lead to lower end user confidence in the accuracy of
the query formulation.
Not Supported
p=0.2973 (0.1487)
H6a Higher levels of extraneous ambiguity in the information
request lead to more total errors in the query formulation.
Supported
p=0.0197 (0.0099)
H6b Higher levels of extraneous ambiguity in the information
request lead to more time taken to complete the query
formulation.
Supported
p=0.0003 (0.0002)
H6c Higher levels of extraneous ambiguity in the information
request lead to lower end user confidence in the accuracy of
the query formulation.
Not Supported
p=0.8435 (0.4218)
H7a Higher levels of emphatic ambiguity in the information
request lead to more total errors in the query formulation.
Supported
p=0.0038 (0.0019)
H7b Higher levels of emphatic ambiguity in the information
request lead to more time taken to complete the query
formulation.
Not Supported
p=0.1863 (0.0932)
(negative parameter)
H7c Higher levels of emphatic ambiguity in the information
request lead to lower end user confidence in the accuracy of
the query formulation.
Not Supported
p=0.8697 (0.4349)
H8a Higher levels of suggestive ambiguity in the information
request lead to more total errors in the query formulation.
Supported
p=0.0584 (0.0292)
H8b Higher levels of suggestive ambiguity in the information
request lead to more time taken to complete the query
formulation.
Not Supported
p=0.0134 (0.0067)
(negative parameter)
H8c Higher levels of suggestive ambiguity in the information
request lead to lower end user confidence in the accuracy of
the query formulation.
Not Supported
p=0.3915 (0.1958)
H9a Higher complexity in the information request leads to more
total errors in the query formulation.
Supported
p=0.0001 (0.0001)
H9b Higher complexity in the information request leads to more
time taken to complete the query formulation.
Supported
p=0.0001 (0.0001)
H9c Higher complexity in the information request leads to lower
end user confidence in the accuracy of the query formulation.
Supported
p=0.0001 (0.0001)
4.5.1 Potential Ambiguity
The generally weak measured effects for the potential ambiguities assessed by the experiment
(lexical and syntactical) do not support the hypotheses presented in this paper. As the
theoretical model indicates, potential ambiguities derive their ambiguity independently of the
context of the statement. A statement may contain lexical or syntactical ambiguity, but the
context of the statement resolves that ambiguity. The hypothesised effects were not
measurable because the context clarified the ambiguity.
Lexical ambiguity did not show a statistically significant relationship with total errors (H2a).
Lexical ambiguity did demonstrate a statistically significant relationship with duration (H2b,
p=0.0001) and confidence (H2c, p=0.0332 for a one-tailed t-test). The implication of these
results is that lexical ambiguity requires more cognitive effort by the end users to determine
the meaning of the request. Once the meaning of the request has been determined, however,
users do not make significantly more errors in their query formulations. Lexical ambiguity
did result in end users being slightly less confident in their queries.
Although in the hypothesised direction (positive), the relationship between syntactical
ambiguity and total errors (H3a) is not significant (p=0.6789). Syntactical ambiguity does
show a significant relationship with the time taken to complete the query, which indicates that
greater cognitive effort is required to resolve the ambiguity from its context. Syntactical
ambiguity's relationship with end user confidence is not significant (H3c, p=0.9292).
Inflective ambiguity does not show a significant relationship in the hypothesised direction for
H4a (p=0.8963), H4b (negative parameter, p=0.0013), or H4c (p=0.4844). Interestingly,
inflective ambiguity shows a significant negative relationship with duration, which is in the
opposite direction to that hypothesised. This result must be considered with caution,
however, as the level of inflective ambiguity present in the questions presented to subjects
was low (Appendix J).
4.5.2 Actual Ambiguity
The role of the actual ambiguity types (pragmatic and extraneous) in the theoretical model is
strongly supported by the empirical results. Actual ambiguities are not clarified by the
context of the statement, i.e., the context does not resolve pragmatic and extraneous
ambiguities. Actual ambiguities generally show a strong relationship with total errors, and
extraneous ambiguity (but not pragmatic ambiguity) displays a strong relationship with
duration. Neither pragmatic nor extraneous ambiguity shows a significant relationship with
end user confidence.
Pragmatic ambiguities are not clarified by context, and arise where information necessary to
properly answer the information request is missing. The hypothesised relationship between
pragmatic ambiguity and total errors is strongly supported (H5a, p=0.0042). The
hypothesised effects of pragmatic ambiguity on duration (H5b, negative parameter,
p=0.8023) and end user confidence (H5c, p=0.2973) were not significant. Pragmatic
ambiguity may require the end user to infer the missing information, thereby increasing total errors.
In the current experiment, the need to infer missing information did not significantly affect
the time necessary to complete the query response or end user confidence in their query.
Extraneous ambiguity occurs when more information than is required is provided or when the
information request is indirectly and pretentiously written. Extraneous ambiguity misleads
end users as to the required response. H6a (total errors, p=0.0197) and H6b (duration,
p=0.0003) were strongly supported for the end user query formulation. Extraneous
ambiguity, where more information is provided than is required, appears to require more
time and cognitive effort to resolve, and the query response is more likely to be inaccurate.
The parameter estimates (Table 9) for total errors (3.3940) and for duration (6.7520) indicate
that extraneous information produces severe negative impacts on end user query efficiency
and effectiveness. The result for H6c, which hypothesised that extraneous ambiguity
decreases end user confidence, is not significant (p=0.8435). Where information needs to be
inferred (pragmatic ambiguity), end users appear to recognise and grapple with the
ambiguity. End users appeared less able to recognise and adjust for extraneous ambiguity
than pragmatic ambiguity.
4.5.3 Imaginary Ambiguity
The results for imaginary ambiguities support the hypothesised relationships between these
ambiguities and query errors. The results do not support the hypothesised relationships with
duration or end user confidence. Imaginary ambiguities result in more total errors, but appear
to result in less time taken to complete the requests. These outcomes are important, because,
although not hypothesised, imaginary ambiguities appear to lead end users to infer the
requirements of the question more quickly (leading to a shorter duration required) and to
formulate the query response on that basis (leading to higher total errors). This result should
be treated with caution, as the imaginary ambiguities were not at a high level in this
experiment (Appendix J).
Emphatic ambiguity arises from the limited ability to convey intonation in written form. The
hypothesis regarding the effect of emphatic ambiguity on total errors (H7a) is strongly
supported (p=0.0038). Neither H7b (duration) nor H7c (confidence) was statistically
significant. Where the emphasis of the information request cannot be clearly expressed, end
users are required to supply their own emphasis when interpreting the meaning of the
information request. While they appear to make their interpretation quickly, the end users did
not recognise that their queries were more likely to contain errors.
The hypothesised relationship between suggestive ambiguity and total errors (H8a) is
supported (p=0.0292 for a one-tailed t-test). The relationship between suggestive
ambiguity and duration (H8b), however, is significant but opposite to the hypothesised
direction (negative parameter, p=0.0134). The hypothesised relationship with end user
confidence (H8c) is not supported (p=0.3915). As with extraneous ambiguity, these results
indicate that end users are not able to recognise the negative impact of suggestive
ambiguity on their query formulations. This anomalous duration result requires further
research to determine its cause and to identify ways to ameliorate these problems for end
user query formulations.
4.5.4 Complexity
The results indicate strong support for the hypotheses regarding complexity (H9a, H9b, and
H9c all with p=0.0001). Task complexity increases total errors and duration, and decreases
the end user's overall confidence in the query formulation. These results are consistent with
previous research (e.g., Borthick et al. 1997; Borthick et al. 2000).
5. Implications For Business Practice
This research has developed an initial theory of ambiguity and end user queries. It
empirically investigated seven ambiguities, and measured how they differentially affect end
user query performance. Some ambiguities, e.g., lexical, extraneous, pragmatic, and
emphatic, affect end user query performance more than others. The results for extraneous
and suggestive ambiguity, in particular, indicate that end users will potentially make
decisions based on results that are inaccurate or misleading.
5.1.1 Electronic Mail
In the business world, electronic mail is often used to transmit information requests,
frequently without the benefit of other channels of communication (Star 1995). Furthermore,
these information requests are hurriedly written (Star 1995; Fowler and Aaron 1998). Such
haste contributes to syntactical, lexical, and inflective ambiguities. The use of shorthand
notations often miscommunicates the intended message. Electronic mails frequently leave
assumptions about the business process unstated. These omissions contribute to
pragmatic ambiguity. The hurried state of the specification, and the lack of a formal
specification process also contribute to extraneous ambiguity (Fowler and Aaron 1998).
Lexical, syntactical, inflective, and, to some extent, extraneous, ambiguity types are functions
of the grammar used to write the information request. The longer the request, the more likely
the request is to contain these ambiguities (Fowler and Aaron 1998). Concise writing is
important to reduce ambiguity. Good written communication skills on the part of the
individual making the information request are required.
All seven ambiguities arise in the daily business specification of reports. Several strategies
are available to reduce their impact. Electronic mails containing information requests need to
be concisely drafted and proofread to reduce pragmatic ambiguity. Providing concise
specifications and avoiding indirect writing, e.g., pretentious writing and passive voice,
reduce the lexical, syntactical, inflective, and extraneous ambiguity of information requests
(Fowler and Aaron 1998).
Emoticons (Sanderson 1993) and generally accepted formatting styles can be used to add
emphasis to electronic mail. These techniques can reduce emphatic ambiguity.
Reading the information request objectively to remove innuendo addresses suggestive
ambiguity. Explaining the reason for the information request, as far as possible, enhances
clarity and reduces the perception of hidden agendas.
Each of the above techniques enhances the clarity of the information request and thus
increases the effectiveness and efficiency of the response received. These techniques
initially increase the time necessary to write the information request. Nonetheless, this
paper's results indicate that the reward will be an increase in the timeliness, accuracy, and
relevance of the information received.
5.1.2 Personnel Turnover and Work Teams
Information systems personnel and end users are frequently engaged on short-term contracts.
Turnover in many organisations, and especially within work groups, is high (Moore 2000).
As turnover increases, the ambiguity of information requests also tends to increase. End
users have less experience and understanding of the organisational culture and thus do not
understand the context and assumptions made in information requests. Especially when
faced with high turnover of information systems personnel and end users, strategies for
reducing the seven ambiguities can significantly benefit the organisation.
Jessup and Valacich (1993) suggest strategies for retaining group memory and enhancing
organisational learning. For work teams that often have new members, a library of previous
information requests and associated query responses will assist team members to reduce
information request ambiguity by providing a context for the request. To function properly,
new team members must understand the organisational procedures and have a context within
which to function.
Businesses would benefit from candidly assessing their methods of making information
requests. Methodologies that reduce ambiguity by formalising the information request will
reduce errors and make more efficient use of skilled end users' time.
6. Contributions, Limitations, and Future Research
6.1 Research Contributions
This paper provides significant, unique contributions to the theory of ambiguity, complexity,
and end user query performance. The theory of communication linguistics has been applied
to end user query performance theory. The theory identified seven ambiguities: lexical,
syntactical, inflective, pragmatic, extraneous, emphatic, and suggestive. The empirical results
obtained for the developed theory are robust and indicate substantial support for it.
An instrument to measure ambiguity in an information request, at a finer level than
previously available, was developed and applied. Although requiring further refinement, this
instrument is a significant advance in the measurement of information request ambiguity.
This paper identifies areas for future research and examines the implications for business
practices. It represents a significant advance in the theory and in its application to the
efficient and effective development of queries by end users.
6.2 Research Limitations
Huck et al. (1974) identify seven issues for the internal validity of experiments. Appendix L
provides a detailed analysis of these issues and outlines how this experiment's design
controlled for each.
As with most controlled laboratory experiments using student participants, there are
external validity issues. Generalisation from student subjects to the business setting may
be invalid. Students' motivation to obtain a high grade may differ from that of business end
users. This experiment's use of advanced business and systems undergraduate students as
subjects, however, implies that generalisation to the business setting is meaningful, as
these subjects reflect the skill levels of end users in a business context.
Generalising from this paper's results to a business setting is invalid to the extent that the
experimental information requests are not representative of information requests made in a
business setting. Nonetheless, the information requests are based on a close model of the
business world and involve likely real-world tasks.
Another limitation is the need to extend the results to more extreme levels of ambiguity. The
ambiguity present in the experiment's questions was not extreme. Hence, generalising from
the results of the current experiment to more extreme levels of ambiguity may not be valid.
6.3 Future Research
Replication of this experiment, with more ambiguous information requests than those of the
current experiment, would strengthen the theoretical model. An experiment designed to
examine contextual reduction of the potential ambiguities (lexical, syntactical, and inflective)
would also be valuable. The weaker results of the current experiment may derive from a lack
of variation in some of the seven types of ambiguity. Instantiating a greater range and
variation of ambiguity in the information requests would add empirical insight into the
theoretical model.
This paper presents what initially appear to be anomalous results for inflective and suggestive
ambiguity in the context of duration. A future experiment would do well to investigate the
circumstances of these results, and to empirically analyse the relationship between inflective
ambiguity, suggestive ambiguity, and duration.
A future experiment having particular regard to end user confidence would significantly
assist the development of the theoretical model. None of the hypotheses, with the exception
of lexical ambiguity (H2c), is supported for end user confidence. On the basis of the current
results, end user confidence often does not reflect the actual accuracy of the query
response. End users do not appear to know when the query response is inaccurate.
Outside of the domain of laboratory research, an avenue for future research would be a field
experiment of ambiguity and the performance of business end users. This experiment would
allow the researcher to examine the prevalence and effects of the seven types of ambiguity in
actual business settings. Such a study would also make a contribution by assessing the extent
to which the current experimental results generalise to the business setting.
An experiment designed to analyse the empirical effectiveness of strategies to mitigate each
ambiguity in a business setting would hold considerable value for research and business
practice. This would allow the development and subsequent assessment of strategies to
reduce the effect of ambiguity on end user query performance.
The development and empirical testing of the ambiguity assessment instrument (Appendix K)
would provide the opportunity to refine and enhance the current initial instrument. Future
research is necessary to develop a reliable and robust instrument for the measurement of
ambiguity in information requests.
References
Almuallim, H., Akiba, Y., Yamazaki, T., and Kaneda, S. "Learning Verb Translation Rules
from Ambiguous Examples and a Large Semantic Hierarchy," Computational Learning Theory and Natural Learning Systems, (4), 1997, pp. 323-336.
Athey, S., and Wickham, M. "Required Skills for Information Systems Jobs in Australia".
Journal of Computer Information Systems, (36:2), 1995-1996.
Australian Bureau of Statistics. "8669.0 Computing Services Industry, Australia, 1995-96".
Australian Bureau of Statistics. 1997.
Axley, S.R. "Managerial and organizational communication in terms of the conduit
metaphor," Academy of Management Review, (9), 1984, pp. 428-437.
Borthick, A.F., Bowen, P.L., and Diery, R.G. "Complexity and Errors in SQL Queries:
Development and Empirical Comparison of Complexity Measures." Workshop on
Information Technologies and Systems (WITS '97), pp. 31-40, December 13-14 1997.
Borthick, A.F., Bowen, P.L., Jones, D.R., and Tse, M.H.K. "The Effects of Information
Request Ambiguity and Construct Incongruence on Query Development," Proceedings of the Pacific Asia Conference on Information Systems, June 2000.
Campbell, D. J. "Task Complexity: A Review and Analysis," Academy of Management
Review, (13:1), 1988, pp. 40-52.
Cardinali, R. "Information Systems - A Key Ingredient to Achieving Organizational
Competitive Strategy," Computer in Industry, (18:3), 1992, pp. 241-245.
Chomsky, N. "Language and Mind," in Ways of Communicating, Cambridge University
Press, Cambridge, 1991, pp. 56-80.
Conger, S. The New Software Engineering, Wadsworth Publishing, Belmont, California.
1994.
Copi, I. M., and Cohen, C. Introduction to Logic (8th ed.), Macmillan, New York, New York,
1990.
Cronbach, L. J. "Coefficient Alpha and the Internal Structure of Tests," Psychometrika, (16),
1951, pp. 297-334.
Delligatta, A., and Umbaugh, R. E. "EUC Becomes Enterprise Computing," Information
Systems Management, Fall 1993, pp. 53-55.
Dubin, R. Theory Building, Collier Macmillan Publishers, London, 1978.
Eisenberg, E.M., and Phillips, S.R. "Miscommunication in Organizations," in
"Miscommunication" and Problematic Talk, Sage Publications, London, 1991.
Fischer, D. H. Historians' Fallacies, Harper & Row, New York, 1970.
Fowler, H. R., and Aaron, J. E. The Little, Brown Handbook (7th ed.), Addison-Wesley
Publishers Inc., New York, New York, 1998.
Freeman, L.A., Jarvenpaa, S.L., and Wheeler, B. C. "The Supply and Demand of
Information Systems Doctorates: Past, Present and Future," MIS Quarterly, (24:2), June 2000.
Halstead, M. H. Elements of Software Science, Elsevier North-Holland Inc, Purdue University, 1977.
Hamblin, C. L. Fallacies, Methuen, London, 1970.
Huck, S. W., Cormier, W. H., and Bounds, W. G. Jr. Reading Statistics and Research,
Harper & Row, New York, New York, 1974.
Jespersen, O. Language: its nature, development and origin, Allen & Unwin, London, 1922.
Jessup, L.M., and Valacich, J.S. Group Support Systems, Macmillan Publishing Company,
New York, New York, 1993.
Jih, W.J.K., Bradbard, D.A., Snyder, C.A., and Thompson, N.G.A. "The Effects of
Relational and Entity-Relationship Data Models on Query Performance of End Users,"
International Journal of Man-Machine Studies, (31), 1989, pp. 257-267.
Katzeff, C. "Systems Demands on Mental Models for a Fulltext Database," International
Journal of Man-Machine Studies, (32), 1990, pp. 483-509.
Keen, P.G.W. "Information Technology and the Management Difference: A Fusion Map,"
IBM Systems Journal, (32:1), 1993, pp. 17-38.
Kooij, J.G. Ambiguity in Natural Language, North-Holland Publishing Company,
Amsterdam, Holland, 1971.
Liew, S.T. "The Effects of Normalization on Query Errors: An Experimental Evaluation,"
Unpublished Thesis, University of Queensland, 1995.
Moore, J.E. "One Road to Turnover: An Examination of Work Exhaustion in Technology
Professionals," MIS Quarterly, (24:1), March 2000, pp. 141-168.
Nath, R., and Lederer, A.L. "Team Building for IS Success," Information Systems
Management, Spring 1996, pp. 32-37.
Newbold, P. Statistics for Business and Economics, Prentice-Hall Inc, Englewood Cliffs,
New Jersey, 1984.
Ogden, W.C., Korenstein, R., and Smelcer, J.B. An Intelligent Front-End for SQL, IBM
General Products Division, San Jose, California, 1986.
Reilly, R.G. "Miscommunication at the Person-Machine Interface," in "Miscommunication"
and Problematic Talk, Sage Publications, London, 1991.
Reisner, P. "Use of Psychological Experimentation as an Aid to Development of a Query
Language," IEEE Transactions on Software Engineering, (SE-3:3), 1977, pp. 218-229.
Rescher, N. Introduction to Logic, St Martin's Press, New York, New York, 1964.
Rho, S., and March, S.T. "An Analysis of Semantic Overload in Database Access Systems
using Multi-Table Query Formulation," Journal of Database Management, (8:2), Spring
1997, pp. 3-14.
Rosenthal, D.A., and Jategaonkar, V.A. "Wanted: Qualified IS Professionals," Information
Systems Management, Spring 1995, pp. 27-31.
Russell, B.A.W. "Vagueness," Australasian Journal of Philosophy and Psychology, (1),
1923, pp. 84-92.
Ryan, H.W. "User-Driven Systems Development: Defining a New Role for IS," Information
Systems Management, Summer 1993, pp. 66-68.
Ryle, G. Collected Papers, (2), Hutchinson, London, 1971.
Sanderson, D. Smileys, O'Reilly, Sebastopol, California, 1993.
Sekine, S., Carroll, J.J., Ananiadou, S., and Tsujii, J. "Automatic learning for Semantic
Collocation," Third Conference on Applied Natural Language Processing, 1992, pp. 104-110.
Severin, W.J., and Tankard, J.W. "Communication Theories: Origins, Methods, and Uses in
the Mass Media," Addison Wesley Longman, Inc., New York, New York, 1997.
Star, S.L. The Cultures of Computing, Blackwell Publishers/The Sociological Review,
Oxford, U.K., 1995.
Suh, K.S., and Jenkins, A.M. "A Comparison of Linear Keyword and Restricted Natural
Language Database Interfaces for Novice Users," Information Systems Research, (3:3), 1992, pp. 252-272.
Tayntor, C.B. "New Challenges or the End of EUC?," Information Systems Management,
Summer 1994, pp. 86-88.
Trow, C.E. The Old Shipmasters of Salem, New York, New York, 1905.
Turner, G.W. (Editor). The Australian Concise Oxford Dictionary of Current English,
Oxford University Press, Melbourne, 1987.
Walton, D. Fallacies Arising from Ambiguity, Kluwer Academic Publishers, Dordrecht,
1996.
Williamson, T. Vagueness, Routledge, New York, New York, 1994.
Wood, R.E. "Task Complexity: Definition of the Construct," Organizational Behaviour and
Human Decision Processes, (37), 1986, pp. 60-82.
Appendix A: Experiment Information Requests and Model Answers
No. Formulation Information Request
1. Ambiguous Management wants a list of each of our suppliers with no
duplicates in the list.
Clear List the distinct suppliers of the items we stock.
Model Answer (Halstead’s Complexity: 1.6927):
Select distinct(item_maker) from inventory;
2. Ambiguous Produce a report that lists the inventory items where the quantity
on hand is much larger, on a percentage basis, than the quantity ordered.
Clear List item number, item name, quantity on hand, quantity on order
where quantity on hand is greater than 2 * quantity ordered.
Model Answer (Halstead’s Complexity: 5.4186):
Select item_no, item_name, qty_hand, qty_ordered from inventory where qty_hand > 2 *
qty_ordered;
3. Ambiguous Management wants a list of all Japanese customers and customers
with credit limits over $15,000.
Clear List customer numbers, customer names, country, and credit limit
of customers with credit limits greater than $15,000 or of
customers in Japan.
Model Answer (Halstead’s Complexity: 6.8908):
Select cust_no, cust_name, country, credit_limit from customer where country = 'Japan' or
credit_limit > 15000;
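The ambiguity in request 3 is whether "Japanese customers and customers with credit limits over $15,000" names two groups (a logical OR over rows) or conjoins two conditions (a logical AND). A sketch on a small hypothetical table (not the experiment's database), showing that the two readings return different result sets:

```python
import sqlite3

# Hypothetical data, illustrative only: the ambiguous "and" admits two SQL
# interpretations (OR vs AND) that return different customers.
con = sqlite3.connect(":memory:")
con.execute("create table customer "
            "(cust_no int, cust_name text, country text, credit_limit int)")
con.executemany("insert into customer values (?,?,?,?)", [
    (1, "Sato",   "Japan", 10000),
    (2, "Smith",  "USA",   20000),
    (3, "Tanaka", "Japan", 25000),
])
or_rows = con.execute("select cust_no from customer "
                      "where country = 'Japan' or credit_limit > 15000").fetchall()
and_rows = con.execute("select cust_no from customer "
                       "where country = 'Japan' and credit_limit > 15000").fetchall()
print(len(or_rows), len(and_rows))  # the OR reading returns more customers: 3 1
```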
4. Ambiguous Produce a report that statistically compares the credit limits for
customers in different countries.
Clear List country, average credit limit, and standard deviation of
customer credit limit grouped by country.
Model Answer (Halstead’s Complexity: 4.4697):
Select country, avg(credit_limit), stddev(credit_limit) from customer group by country;
5. Ambiguous Produce a report of clients that prefer the Speedair carrier and
addresses.
Clear List customer number, customer name, street, city, post code, and
country where the customer's preferred carrier is Speedair.
Model Answer (Halstead’s Complexity: 12.2917):
Select cust_no, cust_name, street, city, state, post_code, country From customer, carrier
where customer.pref_carrier_code = carrier.carrier_code and carrier_name = 'Speedair';
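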
6. Ambiguous We're wondering if some of our winemakers are using poor quality
packaging and bottles - we've had a few complaints. Can you get
us a report that gives us some sort of idea about what items we are
shipping compared to what the customers are taking delivery of?
It would probably be a good idea while you're at it to give a
comparative percentage of the stuff shipped that doesn't make it -
just so the vintners won't try and weasel their way out of it, you
understand, they're good at that.
Clear List item maker, item number, item name, and 100 * (sum of
quantity shipped less sum of quantity accepted) / (sum of quantity shipped) where the type of alcohol is wine.
Model Answer (Halstead’s Complexity: 18.8):
Select item_maker, inventory.item_no, item_name, 100 * (sum(qty_shipped - qty_accepted) /
sum(qty_shipped)) From inventory, invoiceitem where inventory.item_no =
invoiceitem.item_no and type_of_alc = 'wine' Group by item_maker, inventory.item_no,
item_name;
7. Ambiguous Prepare a report that provides *all* customer's details and
indicates the number of different products they have ordered from
us.
Clear List customer number, and customer name for *all* customers,
and, if they have ordered anything, a count of unique items ordered.
Model Answer (Halstead’s Complexity: 16.0076):
Select customer.cust_no, cust_name, count(distinct(item_no)) from customer, invoice,
invoiceitem where customer.cust_no = invoice.cust_no (+) and invoice.invoice_no = invoiceitem.invoice_no (+) group by customer.cust_no, cust_name;
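The `(+)` markers in the model answer are Oracle's legacy outer-join notation. A sketch of the equivalent ANSI LEFT JOIN on a minimal hypothetical schema (not the experiment's data), showing that customers with no invoices still appear, with a count of zero:

```python
import sqlite3

# Sketch only: the Oracle "(+)" markers above denote outer joins, written in
# ANSI SQL as LEFT JOIN so customers with no invoices still appear.
con = sqlite3.connect(":memory:")
con.executescript("""
create table customer (cust_no int, cust_name text);
create table invoice (invoice_no int, cust_no int);
create table invoiceitem (invoice_no int, item_no int);
insert into customer values (1,'A'),(2,'B');
insert into invoice values (10,1);
insert into invoiceitem values (10,101),(10,102),(10,101);
""")
rows = con.execute("""
select customer.cust_no, cust_name, count(distinct item_no)
from customer
left join invoice on customer.cust_no = invoice.cust_no
left join invoiceitem on invoice.invoice_no = invoiceitem.invoice_no
group by customer.cust_no, cust_name
order by customer.cust_no
""").fetchall()
print(rows)  # [(1, 'A', 2), (2, 'B', 0)] -- customer B kept despite no orders
```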
8. Ambiguous Management wants to know which customers we've shipped goods
more than 10 times to them by the shipper that they requested.
Clear List customer number, name, and count of invoices, where the
actual carrier is the same as the customer's preferred carrier,
having more than 10 shipments.
Model Answer (Halstead’s Complexity: 16.2684):
Select customer.cust_no, cust_name, count(*) from Invoice, Customer where
invoice.cust_no = customer.cust_no and invoice.carrier_code = customer.pref_carrier_code group by customer.cust_no, cust_name having count(*) > 10;
9. Ambiguous Produce a report, with best items first, on the gross contribution to
profitability of each inventory item for July 1999.
Clear List item number, item description, and (unit price less unit cost)
multiplied by units sold in July 1999. Sort your output by descending gross contribution to profitability.
Model Answer (Halstead’s Complexity: 23.897):
select inventory.item_no, item_name, avg(avg_unit_price - avg_unit_cost) *
sum(qty_accepted) from invoice, invoiceitem, inventory where invoice.invoice_no =
invoiceitem.invoice_no and invoiceitem.item_no = inventory.item_no and deliver_date
between '1-Jul-99' and '31-Jul-99' group by inventory.item_no, item_name order by 3 desc;
10. Ambiguous Produce a report with the relevant customer details that gives us an
idea of how much of our business is exposed to foreign currency
fluctuations.
Clear List customer number, customer name, customer country, and a
total of the amount paid where the settlement currency code for the
invoice is not equal to the currency code for Australian dollars.
Group results by customer number.
Model Answer (Halstead’s Complexity: 19.4819):
Select customer.cust_no, cust_name, country, sum(amt_paid) from customer, invoice,
currency where customer.cust_no = invoice.cust_no and invoice.currency_code =
currency.currency_code and currency.currency_name <> 'Australian Dollar' Group by customer.cust_no, cust_name, country;
11. Ambiguous Management is concerned about current slow-moving inventory
items, based on shipments since 1 June 1999. Produce a report of
the items that they might be most concerned about.
Clear List inventory item number, item description, quantity on hand,
and sum(quantity shipped) with ship dates greater than 1 June
1999 that have sums of the quantity shipped less than the sums of
the quantity on hand.
Model Answer (Halstead’s Complexity: 22.4):
Select inventory.item_no, item_name, sum(qty_hand), sum(qty_shipped) from inventory,
invoiceitem, invoice where inventory.item_no = invoiceitem.item_no and
invoiceitem.invoice_no = invoice.invoice_no and ship_date > '1-Jun-99' group by
inventory.item_no, item_name having sum(qty_shipped) < sum(qty_hand);
12. Ambiguous Produce a report that gives some idea about our best USA export
items where the amount since March is bigger than $5,000.
Clear List item numbers, item descriptions and the total accepted
quantity times agreed price of each item for items shipped to US
customers since 1 March 1999 and having a total accepted quantity
times agreed price greater than $5,000.
Model Answer (Halstead’s Complexity: 29.1633):
select inventory.item_no, item_name, sum(qty_accepted * agreed_unit_price) from invoice,
invoiceitem, inventory, customer where invoice.invoice_no = invoiceitem.invoice_no and
invoiceitem.item_no = inventory.item_no and customer.cust_no = invoice.cust_no and
ship_date > '1-Mar-99' and country = 'USA' group by inventory.item_no, item_name
having sum(qty_accepted * agreed_unit_price) > 5000;
13. Ambiguous Produce a report showing our Japanese client base that didn't order
anything in July. We're going to need an idea of how many
invoices and things like that that we have for them. We're
concerned about why our orders have dropped off. Can you use
that statistical thing (you know, the one that gives an idea of how
the numbers are varying, not variance, the other one) to show
whether the date the stuff is delivered is different to the date they wanted the stuff?
Clear List customer number, customer name, number of invoices, and
standard deviation of the difference between the deliver date and
the want date for Japanese customers who did not place an order in July 1999.
Model Answer (Halstead’s Complexity: 24.0168):
select customer.cust_no, cust_name, count(invoice_no), stddev(deliver_date - want_date)
from customer, invoice where customer.cust_no = invoice.cust_no and country = 'Japan'
and customer.cust_no not in (select cust_no from invoice where order_date between '1-Jul-
99' and '31-Jul-99') group by customer.cust_no, cust_name;
14. Ambiguous We want to have a mail-out to our best customers (say, those who
paid us more than $5000 or so recently, and those with credit
limits over $20,000). We're interested in seeing if we can move
that new Hunter Valley shipment. Can you get us a mailing list?
Clear List customer number, name, street, city, state, post code, and
country for those customers with credit limits greater than $20,000 or since 1 July 1999 have total paid invoices of more than $5,000.
Model Answer (Halstead’s Complexity: 29.9607):
select customer.cust_no, cust_name, street, city, state, post_code, country from customer,
invoice where customer.cust_no = invoice.cust_no group by customer.cust_no, cust_name,
street, city, state, post_code, country having sum(amt_paid) > 5000
UNION
select customer.cust_no, cust_name, street, city, state, post_code, country from customer where credit_limit > 20000;
15. Ambiguous Produce a report that shows the percentage of orders where we're
not meeting customers' delivery date expectations in each country.
Clear Count all invoices, where the date the order was delivered was
larger than the date the customer wanted the order. Group by country. Calculate the percentage of late orders by country.
Model Answer (Halstead’s Complexity: 34.992):
Create View TotalOrders as select country, count(*) Total_Orders from customer, invoice
where customer.cust_no = invoice.cust_no group by country;
Create view LateOrders as select country, count(*) Late_Orders from customer, invoice
where customer.cust_no = invoice.cust_no and deliver_date > want_date group by country;
Select totalorders.country, 100*(late_orders / total_orders) Percent_Late_Orders from
lateorders, totalorders where totalorders.country = lateorders.country;
16. Ambiguous Produce a report that shows, by country, which carriers are, on
average, not meeting their expected delivery times.
Clear List carrier code, carrier name, country, and average of (delivery
days less the difference between delivery date and ship date) by
country having that average difference greater than 1 day.
Model Answer (Halstead’s Complexity: 40.1661):
select carrier.carrier_code, carrier_name, delivdays.country, avg((deliver_date - ship_date)
- deliver_days) from carrier, invoice, customer, delivdays where carrier.carrier_code =
invoice.carrier_code and invoice.cust_no = customer.cust_no and carrier.carrier_code =
delivdays.carrier_code and customer.city = delivdays.city and customer.state =
delivdays.state and customer.country = delivdays.country group by carrier.carrier_code, carrier_name, delivdays.country having avg((deliver_date - ship_date) - deliver_days) > 1;
Appendix B: Experiment Instruction Sheet
INSTRUCTIONS
This laboratory session requires you to execute command files and query a database.
Please follow the instructions carefully.
Part 1 - Scenario
George Harford Wine Merchant distributes wines throughout the world. They predominantly
trade with customers in France, Japan, the USA, and the UK. Customers place orders for
wines which employees process, pack, and ship to the customers via an appropriate carrier.
The packers attach an invoice created by the Accounts Receivable department to the goods
when shipped. These invoices contain all relevant information generated from the invoice and
inventory databases. The data structures for the relevant tables are attached.
Part 2 - SQL Syntax Reminder
The SQL syntax for SELECT commands follows. Items in square brackets [ ] are optional,
and items in braces { } can be repeated zero or more times:
SELECT [DISTINCT]*|(((table. | view.)column | expression) [alias]
{, ((table. | view.)column | expression)[alias]})
FROM (table|view)[alias]{,(table | view)[alias]}
WHERE condition {, condition}
[GROUP BY expression{,expression} [HAVING condition{,condition}]]
[(UNION|UNION ALL|INTERSECT|MINUS) SELECT command]
[ORDER BY (expression|position)[DESC]{,(expression|position)
[DESC]}];
Only under highly unusual circumstances should you formulate a SELECT command that
contains more than one table in the FROM clause without a join in the WHERE clause. As a
general rule, the number of joins should equal the number of foreign key attributes. Except
for extremely rare queries that usually produce only summary results (such as counting the
number of records in a table), all SQL queries, even those involving only one table, should
include WHERE conditions.
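As an illustration of the join rule above, a minimal sketch of a two-table query against the experiment schema: customer and invoice share one foreign key (cust_no), so the query carries exactly one join condition plus an ordinary row restriction:

```sql
-- Sketch only: one foreign key (invoice.cust_no) gives one join condition
-- in the WHERE clause; the second condition restricts the rows returned.
select cust_name, invoice_no
from customer, invoice
where customer.cust_no = invoice.cust_no
  and country = 'Japan';
```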
You may need to use some of the following keywords:
AND
AVG
COUNT
DISTINCT
IN
MAX
MIN
NOT
NULL
OR
STDDEV
SUM
SYSDATE
UNIQUE
VARIANCE
(+) (outer join)
The SQL syntax for VIEW commands follows:
CREATE VIEW viewname AS (SELECT command);
When you create a view with the same name as an already-existing view (for example, you
rerun your query), you will need to drop the already-existing view:
DROP VIEW viewname;
Reminders:
Aliases for columns in views should not be enclosed in quotes.
If you have multiple join conditions, i.e., more than one foreign key or a concatenated
foreign key, you may need to put the outer join symbol on other join conditions.
Part 3 - Getting started
Log into your area on valinor. For the purposes of assessment, everything you do in this
laboratory session needs to be recorded and sent to the instructor. Follow the instructions
carefully. In particular, please refrain from running more than one session on valinor because
running more than one session will mean that all your query attempts will not be recorded. To
begin this quiz, type the following at the valinor prompt:
valinor> ksh
valinor> /home/staff/bowen/startqz199b
Follow the instructions given by the program carefully. You can attempt each query as many
times as you wish.
You should note that once you accept a query, you cannot return to the question again.
Part 3 - Getting started
Log into your area on valinor. For the purposes of assessment, everything you do in this
laboratory session needs to be recorded and sent to the instructor. Follow the instructions
carefully. In particular, please refrain from running more than one session on valinor because
running more than one session will mean that all your query attempts will not be recorded. To
begin this quiz, type the following at the valinor prompt:
valinor> ksh
valinor> /home/staff/bowen/startqz199a
Follow the instructions given by the program carefully. You can attempt each query as many
times as you wish.
You should note that once you accept a query, you cannot return to the question again.
Part 4 - Your Mission
You are an internal auditor at George Harford. On 16 August 1999, your supervisor
approaches you with a list of questions. Some questions were designed by the supervisor,
who knows SQL well. Your supervisor was also given questions from management, who do
not know SQL all that well.
Your task is to formulate and execute SQL queries to answer these questions.
Your supervisor is gone for the day and getting answers for these questions is urgent.
Therefore, you need to make your best interpretation of the questions from management. You
can discuss with your supervisor the assumptions you made after she returns. However, she
will be most annoyed if you do not make an attempt to answer as many of the questions as
you can prior to her return.
The questions have been structured so that easier questions appear first and then become
progressively more difficult.
Your supervisor wants to see the complete SQL queries that you use. When the question is
phrased asking for a name, your query should use criteria that include that name, i.e., you
should not look up the code to avoid joining to the table that contains the name.
Appendix C: Command Interpreter Unix Shell Script
Two Unix Shell Scripts were used to operate the experiment. The two scripts were essentially
identical except that they used different source data depending on the treatment initially
received by the different experimental groups (the variable $quizfile). This script has been
developed, modified, and enhanced from previous experiments undertaken within the Faculty
of Commerce at the University of Queensland (Borthick et al. 1997; Borthick et al. 2000).
The interface source code had been previously developed by Mr Andrew Jones.
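One detail of the script worth highlighting: it opens the question file on file descriptor 3 (`exec 3<"$quizfile"`) and reads each question with `read -u3`, leaving standard input free for the interactive prompts inside the loop. A minimal, self-contained sketch of this technique follows; the temporary file name is illustrative only, and, like the script itself, it requires ksh or bash for `read -u`:

```shell
# Sketch of the fd-3 idiom used by the command interpreter script.
# Requires ksh/bash for "read -u"; the file name here is illustrative.
qfile="/tmp/demo_questions.$$"
printf 'first question\nsecond question\n' > "$qfile"

exec 3< "$qfile"           # open the question file on descriptor 3
count=0
while read -u3 question    # read from fd 3; stdin stays free for prompts
do
  count=$((count + 1))
  echo "Q$count: $question"
done
exec 3<&-                  # close descriptor 3
rm -f "$qfile"
```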
#!/bin/ksh
## /\ndy. 28/08/98. version 0.02
## NB. this script requires ksh because it uses "read -u".
## The rest of it should run in any sh-compatible shell (sh, bash, ksh etc)
## DoLog() - A utility function to append a message to our log file.
## As it stands, each line contains the username, process ID, date, time,
## and a message
## eg.
## [jones] <4268> 28/08 11:41:09: Displaying question 3
## [jones] <4298> 28/08 11:41:12: Attempting question 3 Attempt number 1
DoLog()
{
## %a = day, %e = date, %m = month. %T = time.
now=`date +"[$username] <$$> %e/%m %T:"`
echo "$now $*" >> $logfile
}
## Obtain the username of the person running this program, for the log file.
## No need to change this.
###username=${USER:-$LOGNAME}
username=`whoami`
## CONFIGURE THIS:
## "quizfile" is a variable which contains the name of the file with the
## questions you wish to present to the students. You should edit this
## script to set this variable to the appropriate value.
## If this variable is null, then the program will expect a single
## command-line argument, which will be the filename of the question file.
##
## The question file should contain questions, one per line.
##
## Note that the user running this program must have access privs to the
## question file and the directories above it...
## eg. quizfile="/home/staff/bowen/questions"
quizfile="/home/staff/bowen/questions99qz1b"
## CONFIGURE THIS:
## Location of the log file to record what people do.
## You can reset this to whatever you like, but make sure that everyone
## can append to it. Also note that files in /tmp disappear when
## valinor is restarted. /var/tmp might be safer, but who knows.
##
## Probably best if you make a logfile directory in your home dir,
## chmod it to mode 1777 and put the log files in there...
##
## Note: If the log file does not already exist, this program will now
## create it. This better allows per-user log files to work.
## However, if you are using only one log file, it is a better idea
## if you create and chmod it yourself...
#logfile="/var/tmp/sql.log" # one log for all users..
#logfile="/var/tmp/sql.$username.log" # one log per user...
logfile="/home/staff/bowen/logfile/qz199/$username.log"
## Editor to use. pico is the easiest.. esp if we run it in "tool" mode...
editor="pico -t"
## temporary filenames.
tmp="/tmp/qn-$username.$$"
attfile="$HOME/answer.$$"
qnum=1 # question number
attnum=0 # attempt number
## Set up a clean up routine to clean up after ourselves in case we die..
trap 'rm -f "$attfile" "$tmp"; exit 1' 1 3 15 8
## "echo -n" is supposed to print without a newline.
## This little hack ensures it will on valinor...
PATH=/usr/ucb:${PATH}
## ---------------------------------------------------------------------
## End of configuration section: Start of program.
## Create the log file if it doesn't exist...
if [ ! -f "$logfile" ]
then
> $logfile
chmod 666 $logfile
DoLog "StartUp: Created this Log file."
fi
if [ -z "$quizfile" ]
then
## No $quizfile, so we expect a question file command-line argument.
if [ $# != 1 ]
then
echo "Usage: `basename $0` file-with-questions"
DoLog "Error: No quizfile and no cmd line argument."
exit 1
fi
quizfile="$1"
fi
## Make sure we can read the file. NB. this requires some permissions on the
## directory containing the file, and that directory's parent, and ...
if [ ! -f "$quizfile" ]
then
echo "Error: Unable to read file: \"$quizfile\"."
DoLog "Error: Can't open file $quizfile (pwd=`pwd`)"
exit 2
fi
## Splash screen telling them what will happen.
DoLog "Startup: Showing splash screen."
clear
cat <<ENDOFBLURB
CO365 DATABASE MANAGEMENT SYSTEMS IN BUSINESS
QUIZ ONE
In this exercise, you will be presented with a series of problems.
The first problem will be displayed, and then the system will wait
for you to hit the <RETURN> (aka the <ENTER>) key.
This gives you time to read and absorb the problem.
After you hit the <ENTER> key, you will be taken into the user-friendly
editor "pico", where you can compose a solution. When you are satisfied,
quit the editor with the Control-X command. Your solution will be run,
and any output will be displayed on your screen.
You will then be asked whether you are happy with your solution.
If you are not, then you can re-edit your first attempt and try again.
Otherwise, you will be asked to rank your confidence in your solution.
You then continue on to the second problem, and so on...
ENDOFBLURB
echo -n "Hit the <RETURN> key to continue."
read junk
echo
echo
clear
DoLog "Startup: Finished showing splash screen."
exec 3<"$quizfile"
qnum=1
## This is the main loop of the program.
while read -u3 question
do
## if we are between questions, make the screen tidier.
if [ "$qnum" -gt 1 ]
then
clear
## echo
echo "Ok. Onto the next question."
echo
fi
thisattmpt="retry"
attnum=0 # attempt number
> $attfile
## attempt the current question.
while [ "$thisattmpt" != "accept" ]
do
attnum=`expr $attnum + 1`
clear
echo "Question #$qnum:"
echo
echo "$question"
echo
if [ $attnum = 1 ]
then
echo
echo "--------------------------------------------------"
echo "When you are finished reading the question, hit the <ENTER> key, to start"
echo -n "using an editor to create your solution. "
DoLog "Displaying question $qnum"
else
echo
echo "--------------------------------------------------"
echo "Your current solution is ..."
sed -e 's/^/| /' < $attfile
echo
echo -n "Hit the <ENTER> key to re-edit this... "
fi
# pause here until they hit RETURN
read junk
DoLog " Attempting question $qnum Attempt number $attnum"
$editor $attfile
## cp $attfile $username.sql
## echo "quit" >> $username.sql
echo
echo "Ok. Now testing this solution..."
echo
## FIXME: Need to make sure that the Oracle environment
## is properly set up so that they can run sqlplus...
## Plus, the /dev/null thing is crude, but probably enough to
## prevent them getting into an interactive oracle session...
sqlplus / @$attfile < /dev/null
## Reformat of output allows users to use data more
## interactively. Micheal Axelsen 1999.
## Disabled since they can then end up in a cartesian
## product join.
## echo "Attempting Question: $qnum" > $username.lst
## echo "" >> $username.lst
## cat "$question" >> output_screen
## echo "" >> $username.lst
## echo "Your SQL Query:" >> $username.lst
## echo >> $username.lst
## cat $attfile >> $username.lst
## echo "" >> $username.lst
## echo "Results:" >> $username.lst
## sqlplus / @$username.sql >> $username.lst
## $editor $username.lst
## Should we pipe output into less for them to see?
echo
## Should we capture their attempt?
DoLog " The attempt was ..."
sed -e "s/^/[$username] <$$> Qn: $qnum Att: $attnum /" < $attfile >> $logfile
## ask if happy with this attempt or not
echo "Are you happy with this attempt, or do you want to try again?"
PS3="Choice: "
select thisattmpt in retry accept
do
if [ -n "$thisattmpt" ]
then
echo "Ok."
break
fi
echo "Invalid response. Try again."
done
echo
done
DoLog "Completed question $qnum Number of attempts was $attnum"
## DoLog "The final solution was ..."
## sed -e 's/^/| /' < $attfile >> $logfile
## Ask here how confident they are...
echo "How confident are you about your solution?"
PS3="Confidence? "
select conf in "85-100%" "70-85%" "55-70%" "40-55%" "25-40%" "10-25%" "<10%"
do
if [ -n "$conf" ]
then
echo "Ok."
break
fi
done
DoLog "Confidence for question $qnum was $conf"
echo
echo "Ok. Now what?"
PS3="What now? "
select whatnow in "Continue to next question" "Quit"
do
if [ -n "$whatnow" ]
then
break
fi
done
if [ "$whatnow" = "Quit" ]
then
echo
echo "Are you sure you want to quit?"
PS3="Confirm quit: "
select confirm in yes no
do
if [ -n "$confirm" ]
then
break
fi
done
if [ "$confirm" = "yes" ]
then
echo "Ok. Quitting now."
break
else
echo "Ok. Not quitting."
fi
fi
## NB. It's more efficient to use the shell's built in arithmetic...
qnum=`expr $qnum + 1`
done
DoLog "Quitting."
rm -f "$attfile" "$tmp"
echo "Bye..."
Appendix D: Experiment Entity-Relationship Diagram
Customer
Cust_no+
Cust_name
Phone_no
Street
City
State
Post_code
Country
Credit_limit
Outstanding_bal
Pref_carrier_code
Delivdays
Carrier_code+
City+
State+
Country+
Deliver_days
Carrier
Carrier_code+
Carrier_name
Carrier_type
Invoice
Invoice_no+
Order_date
Cust_no
Ship_date
Want_date
Deliver_date
Paid_date
Fob_code
Disc_pct
Disc_days
Currency_code
Amt_paid
Carrier_code
Emp_no
Employee
Emp_no+
Emp_name
Currency
Currency_code+
Currency_name
Currency_date+
Currency_rate
Fob
Fob_code+
Fob_name
Invoiceitem
Invoice_no+
Item_no+
Unit_meas
Quoted_unit_price
Agreed_unit_price
Qty_shipped
Qty_accepted
Diff_cause
Inventory
Item_no+
Item_name
Item_maker
Item_package
Item_year
Type_of_alc
Alc_category
Alc_content
Avg_unit_cost
Unit_meas
Avg_unit_price
Qty_hand
Qty_ordered
FK = Foreign Key
+ = Primary Key
FK = Carrier_code
FK = Carrier_code
FK = Emp_no
FK = Currency_code
+ [Appropriate Dates]
FK = Invoice_no
FK = Cust_no
FK = Fob_code
FK = Item_no
Abbreviation Type Description
Table: Invoice
Invoice_no Char(7) Invoice number
Order_date Date Date the order was placed
Cust_no Char(5) Customer number
Ship_date Date Date the order was shipped
Want_date Date Date the order was wanted by the customer
Deliver_date Date Date the order was delivered
Paid_date Date Date the invoice was paid
Fob_code Char(1) FOB code {1,2}
Disc_pct Number Discount percent, e.g. 1, 1.5, 2, 2.25
Disc_days Number Discount days - start day depends on FOB
Currency_code Char(1) Settlement currency code
Amt_paid Number Amount paid in Australian dollars
Carrier_code Char(5) Carrier code of carrier that delivered the order
Emp_no Char(4) Employee number of person who packed the order
Table: Customer
Cust_no Char(5) Customer number
Cust_name Char(20) Customer's name
Phone_no Char(15) Customer's telephone number
Street Char(30) Customer's street address
City Char(20) Customer's city
State Char(20) Customer's state
Post_code Char(10) Customer's post code
Country Char(20) Customer's country
Credit_limit Number Customer's credit limit
Outstanding_bal Number Customer's outstanding balance (amount owing)
Pref_carrier_code Char(5) Customer's preferred carrier
Table: Carrier
Carrier_code Char(5) Carrier code
Carrier_name Char(20) Carrier's name
Carrier_type Char(8) Type of carrier {air, surface}
Table: Currency
Currency_code Char(1) Currency code
Currency_name Char(15) Name of currency
Currency_date Date Date for which the currency rate applies
Currency_rate Number Currency rate as of the currency date, i.e., the
number of units of the currency that one Australian
dollar will purchase, e.g., one Australian dollar can
be exchanged for approximately 0.65 US dollars.
Table: Delivdays
Carrier_code Char(5) Carrier code
City Char(20) Deliver to city
State Char(20) Deliver to state
Country Char(20) Deliver to country
Deliver_days Number Expected number of calendar days for the carrier to
deliver merchandise to the city, state, and country,
i.e., the carrier's estimate of the time required to
deliver an order to the destination described by city,
state, and country.
Table: Employee
Emp_no Char(4) Employee number
Emp_name Char(20) Employee's name
Table: Invoiceitem
Invoice_no Char(7) Invoice number
Item_no Char(7) Inventory item number
Unit_meas Char(5) Unit of measure for item {case, each}
Quoted_unit_price Number Quoted unit cost of the item in Australian dollars
Agreed_unit_price Number Agreed unit cost of the item in Australian dollars
Qty_shipped Number Quantity of the item shipped to the customer
Qty_accepted Number Quantity of the item accepted by the customer
Diff_cause Char(15) Reason for differences in costs or quantities {broken
bottle, damaged cork, late delivery, no diff,
shortage, sugary, vinegary}
Table: Inventory
Item_no Char(7) Inventory item number
Item_name Char(20) Name or description of the item
Item_maker Char(20) Maker of the item, e.g. the vintner
Item_package Char(15) How each component of the item is packaged
{bottle, can, cardboard box}
Item_year Number Year the item was produced.
Type_of_alc Char(5) Type of alcohol {beer, wine}
Alc_category Char(15) Alcohol category {dark, dry, full strength, light,
mid-strength, red, sparkling, white}
Alc_content Number Alcohol content e.g. full strength beers are typically
about 5.0 (percent) and wines are typically between
12 and 14 (percent)
Avg_unit_cost Number Average price per unit at which the item was
purchased from the item maker
Unit_meas Char(5) Unit of measure for item {case, each}
Avg_unit_price Number Average price per unit at which the item is sold to
customers
Qty_hand Number Quantity of the item on hand
Qty_ordered Number Quantity of the item ordered in the last 12 months
Appendix E: Experimental Design
Stratification Into Group A and Group B
To control for a testing effect (Huck et al. 1974), and to ensure even representation of skill
sets across Group A and Group B, participants were stratified into classes. This stratification
was in accordance with participants' previous subject enrolments. Participants within each
strata class were then ranked according to their current enrolment subject, their
performance in earlier subjects, and their experience with database query languages. Twelve
strata classes were used to classify participants. Table 11 shows the final strata class ordering, and
the number of participants in each strata class.
This process resulted in a ranked listing of participants from one to sixty-six. The
experimental treatment effect of manager-English (ambiguous) and pseudo-SQL (clear) was
assigned randomly to the first student on this list and then alternately to each student
thereafter. This resulted in two student groups with equivalent participant counts: Group A
and Group B. Group A's first question formulation was ambiguous, and then alternately clear
and ambiguous thereafter. Group B's first question formulation was clear, and then
alternately ambiguous and clear thereafter.
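The alternating allocation down the ranked list can be sketched as follows. The sketch is illustrative only: the participant names are invented, the real list held the sixty-six ranked participants, and in the experiment the first participant's treatment was assigned randomly rather than fixed as Group A.

```shell
# Illustrative sketch of alternating assignment down a ranked list:
# odd positions to Group A, even positions to Group B.
i=1
assignments=""
for participant in p1 p2 p3 p4
do
  if [ $((i % 2)) -eq 1 ]; then group="A"; else group="B"; fi
  assignments="$assignments $participant:$group"
  i=$((i + 1))
done
echo "$assignments"
```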
Table 11
Participant Strata Classes
Strata Class Participant Count Description
865(1) 4 Students in the postgraduate Database Design
subject who had previously participated in more
than one similar experiment.
365(1) 1 Students in the undergraduate Database Design
subject who had previously participated in more
than one similar experiment.
365(2) 1 Computer Science students in the undergraduate
Database Design subject who had previously
participated in a similar experiment.
865(2) 15 Students who had undertaken a database design
course previously and enrolled in the
postgraduate database design subject.
365(3) 10 Students who had undertaken a database design
course previously and undertaking the
undergraduate database design course.
865(3) 2 Students who had undertaken a database design
course previously (but not at University of
Queensland) and undertaking the postgraduate
database design course.
365(4) 13 Students who had undertaken advanced
information systems courses previously and
undertaking the undergraduate database design
course.
865(4) 3 Students who had undertaken information
systems courses previously and undertaking the
postgraduate database design course.
365(5) 6 Students who had undertaken introductory
computer courses previously and undertaking the
undergraduate database design course.
365(6) 3 Students who had undertaken no information
system or computer courses previously and
undertaking the undergraduate database design
course.
865(5) 6 Students undertaking the postgraduate database
design course with no available academic
history.
365(7) 2 Students undertaking the undergraduate database
design course with no available academic
history.
The Experiment
The experiment was held over two days during the fourth week of instruction. Students
undertook a two hour closed-book (no reference material allowed) experiment on computer,
with no perusal time, in their normal classes. The random assignment of membership to
Group A and Group B had the purpose and effect of ensuring an even representation of
Group A and Group B in each class.
Participants knew before the experiment that questions increased in complexity, that there
were sixteen questions in total, and that, once a question had been completed, they could not
return to their answer. Participants were also aware that the number of attempts they made
on the question did not affect their mark.
An instruction sheet was provided to participants (refer Appendix B), depending on the
treatment group (A or B) to which the participant had been previously assigned. The only
point of difference between the two groups' instruction sheet was the name of the Unix
command script file to use: startqz199a for Group A and startqz199b for Group B. The
instruction sheet contained an overview of SQL syntax as a reference for participants.
Further, an entity-relationship diagram was provided to describe the database being used, as
reproduced in Appendix D.
Participants could make reference notes on working paper if they required. Participants
returned these materials to the examiner at the end of the experiment. The question
formulations used in the experiment and model answers are reproduced in Appendix A.
There were two examiners present (the course lecturer and the researcher). Assistance was
provided to participants in the operation of the experimental program (the Unix command
script). Assistance was also provided on some technical aspects of SQL on request.
User Interface and Query Development Process
Appendix C contains an example of the Unix command interpreter script used by participants
to enter information using the relatively easy-to-use Pico editor, with which they were
familiar. The command interpreter presented the question to the participant. On the
completion of an attempt, the SQL result set was displayed. If the participant did not
consider the results presented to be their final response, the participant could return to the
SQL formulation. If the participant considered the result satisfactory, the participant would
be prompted to rank their confidence in the solution, and proceed to the next question.
Hence, the participant was able to interactively build and test their response until they were
confident in their answer. This confidence was self-assigned on the following scale: 85-
100%, 70-85%, 55-70%, 40-55%, 25-40%, 10-25%, and <10%.
The questions were only available electronically. The questions were presented alternately as
ambiguous (natural language) and clear (pseudo-SQL). A participant in Group A received an
ambiguous formulation for Question One, clear for Question Two, ambiguous for Question
Three, and so on. A participant in Group B had clear for Question One, ambiguous for
Question Two, clear for question three, and so on. The required answer was identical for
both formulations of the same question.
Appendix F: Error Marking Sheets
Semantic Error Counting Form
User Name Question Number Attempts
Confidence:
Duration:
MICRO ERRORS Keywords
View Select From Where Join Where Cond Group by Having Order by
Symbols View Select From Where Join Where Cond Group by Having Order by
Logical Operators View Select From Where Join Where Cond Group by Having Order by
Relational Operators View Select From Where Join Where Cond Group by Having Order by
Tables View Select From Where Join Where Cond Group by Having Order by
Attributes View Select From Where Join Where Cond Group by Having Order by
Values View Select From Where Join Where Cond Group by Having Order by
Set Operators Where Union Intersect Minus
MACRO ERRORS Columns Rows Aggregation
SQL Challenge Error Counting Form
User Name Question Number Attempts
Confidence:
SQL CHALLENGE EXPRESSION
Present Challenge Response Comment
Distinct Keyword in Select Clause P / A 1 2 3 4 5 6 7
Built-in Function (Avg, Sum, Std Dev, etc) P / A 1 2 3 4 5 6 7
Mathematical Expression in Select Clause P / A 1 2 3 4 5 6 7
Mathematical Expression in Where Clause P / A 1 2 3 4 5 6 7
Mathematical Expression in Having Clause P / A 1 2 3 4 5 6 7
ERD (Join not shown on ERD) P / A 1 2 3 4 5 6 7
Join P / A 1 2 3 4 5 6 7
Outer Join P / A 1 2 3 4 5 6 7
Subquery P / A 1 2 3 4 5 6 7
Or (Where or Having) P / A 1 2 3 4 5 6 7
Between P / A 1 2 3 4 5 6 7
Not Equal P / A 1 2 3 4 5 6 7
Group By P / A 1 2 3 4 5 6 7
Having P / A 1 2 3 4 5 6 7
View P / A 1 2 3 4 5 6 7
Intermediate Error Counting Form
User Name Question Number Attempts
Confidence:
Column Errors
Missing
Extra
Wrong (in contrast with missing & extra columns)
Table Errors
Missing
Extra
Wrong
Row Restriction
Missing
Extra
Wrong
Logical Operator
Join Restrictions
Missing
Extra
Wrong
Aggregation Level (Group by/Aggregation in Select)
Missing
Extra
Wrong
Aggregation Restriction (Having)
Missing
Extra
Wrong
Sort/Order by
Missing
Wrong Attribute Order
Wrong Direction (ascending, descending)
Wrong
Appendix G: Annotated Corrected Participant Response
This appendix provides an annotated example of the process used to correct participant
responses according to the model answer. This question was chosen to provide a flavour of
the methodology used to determine and classify errors. The response shown here is the fifth
participant's response (in order of assessment) to the third question.
Model Answer:
Select cust_no, cust_name, country, credit_limit from customer where credit_limit > 15000
or country = 'Japan';
Actual Response:
Select cust_no, cust_name, country, credit_limit from customer where credit_limit > 15000
and country = 'japan';
Annotated Response:
Select cust_no, cust_name, country, credit_limit from customer where credit_limit > 15000
and (1)
or (2)
country = 'j (3)
J(4)
apan';
In this annotated response, the superscript number in brackets indicates the error count. In
this response there are four micro errors.
Micro Error Sheet:
Errors (1) and (2) result in a total of two logical operator errors in the WHERE COND clause.
Errors (3) and (4) result in a total of two value errors in the WHERE COND clause.
Macro Error Sheet
There are two row errors here, as there are two errors in the WHERE COND clause.
SQL Challenge Sheet
The SQL Challenge presented in this question is the "Or (Where or Having)" challenge. The
challenge is present, and the participant's response to the challenge was poor, resulting in a
"1" assessment.
Intermediate Error Counting Sheet
In this response there are two row restriction errors, one "wrong" row restriction and one
"logical operator" error.
Appendix H: Pearson Correlation Matrix of Variables
Ambiguity Complexity Attempts Confidence Duration Total Errors Lexical Syntactical Inflective Pragmatic Extraneous Emphatic Suggestive GPA
Ambiguity 1.0000
one-sided p 0.0000
Complexity -0.0330 1.0000
one-sided p 0.2488 0.0000
Attempts 0.1247 0.3312 1.0000
one-sided p 0.0050 0.0000 0.0000
Confidence -0.0961 -0.2463 -0.4242 1.0000
one-sided p 0.0239 0.0000 0.0000 0.0000
Duration 0.1729 0.2932 0.6905 -0.4282 1.0000
one-sided p 0.0002 0.0000 0.0000 0.0000 0.0000
Total Errors 0.2421 0.4783 0.2742 -0.3241 0.3653 1.0000
one-sided p 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
Lexical 0.7169 -0.0593 0.0847 -0.1213 0.2241 0.2165 1.0000
one-sided p 0.0000 0.1114 0.0406 0.0062 0.0000 0.0000 0.0000
Syntactical 0.6103 -0.1196 0.0532 0.0153 -0.0122 -0.0491 0.0855 1.0000
one-sided p 0.0000 0.0068 0.1367 0.3769 0.4007 0.1564 0.0391 0.0000
Inflective 0.3957 -0.0219 -0.0602 0.0698 0.0118 0.2534 0.2816 0.1606 1.0000
one-sided p 0.0000 0.3266 0.1079 0.0754 0.4045 0.0000 0.0000 0.0004 0.0000
Pragmatic 0.4735 -0.1131 0.0877 -0.0403 0.1057 0.2521 0.4378 0.1257 0.2299 1.0000
one-sided p 0.0000 0.0098 0.0354 0.2035 0.0146 0.0000 0.0000 0.0048 0.0000 0.0000
Extraneous 0.1855 0.3333 0.1410 -0.0223 0.2183 0.5764 0.2616 -0.2611 0.5837 0.3314 1.0000
one-sided p 0.0001 0.0000 0.0018 0.3234 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
Emphatic 0.7173 0.1914 0.1886 -0.1490 0.2482 0.3588 0.7100 0.2746 0.1177 0.2486 0.2870 1.0000
one-sided p 0.0000 0.0000 0.0000 0.0010 0.0000 0.0000 0.0000 0.0000 0.0076 0.0000 0.0000 0.0000
Suggestive 0.4930 0.2863 0.1432 -0.0270 0.1927 0.5611 0.3881 0.1127 0.5723 0.4139 0.8347 0.4058 1.0000
one-sided p 0.0000 0.0000 0.0015 0.2893 0.0000 0.0000 0.0000 0.0101 0.0000 0.0000 0.0000 0.0000 0.0000
GPA (n=420) 0.0000 0.1256 -0.0842 0.1764 -0.1256 -0.1313 -0.0282 0.0336 0.0099 0.0079 -0.0013 0.0010 0.0275 1.0000
one-sided p 0.4999 0.0050 0.0424 0.0001 0.0050 0.0035 0.2820 0.2463 0.4196 0.4358 0.4891 0.4919 0.2869 0.0000
Appendix I: Analysis of Ambiguity's Effect On Error Type
Question One
SQL Component   Keywords   Symbols   Logical Operators   Relational Operators   Tables   Attributes   Values   Total
(A = Ambiguous formulation, C = Clear formulation)
View A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
View C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Select A 0.156 0.406 0.000 0.000 0.000 0.344 0.000 0.906
Select C 0.091 0.273 0.000 0.000 0.000 0.182 0.000 0.545
From A 0.000 0.000 0.000 0.000 0.250 0.000 0.000 0.250
From C 0.000 0.061 0.000 0.000 0.091 0.000 0.000 0.152
Where Join A 0.031 0.063 0.000 0.031 0.063 0.063 0.000 0.250
Where Join C 0.030 0.061 0.000 0.030 0.061 0.061 0.000 0.242
Where Cond A 0.031 0.063 0.000 0.031 0.000 0.031 0.031 0.188
Where Cond C 0.000 0.000 0.000 0.000 0.000 0.000 0.061 0.061
Group By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Group By C 0.000 0.030 0.000 0.000 0.000 0.030 0.000 0.061
Having A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Having C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Order By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Order By C 0.030 0.000 0.000 0.000 0.000 0.030 0.000 0.061
Total A 0.219 0.531 0.000 0.063 0.313 0.438 0.031 1.594
Total C 0.152 0.424 0.000 0.030 0.152 0.303 0.061 1.121
SQL Component   Set Operators      Summary: Formulation   Error Average   Response Count
Where A 0.000 Ambiguous 1.594 32
Where C 0.000 Clear 1.121 33
Union A 0.000
Union C 0.000
Intersect A 0.000
Intersect C 0.000
Minus A 0.000
Minus C 0.000
Total A 0.000
Total C 0.000
Question Two
SQL Component   Keywords   Symbols   Logical Operators   Relational Operators   Tables   Attributes   Values   Total
View A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
View C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Select A 0.091 1.394 0.030 0.000 0.061 1.212 0.030 2.818
Select C 0.000 0.061 0.000 0.000 0.000 0.000 0.000 0.061
From A 0.061 0.030 0.000 0.000 0.121 0.000 0.000 0.212
From C 0.000 0.000 0.000 0.000 0.030 0.000 0.000 0.030
Where Join A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Where Join C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Where Cond A 0.182 0.364 0.000 0.091 0.121 0.364 0.121 1.242
Where Cond C 0.030 0.152 0.000 0.000 0.000 0.030 0.000 0.212
Group By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Group By C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Having A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Having C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Order By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Order By C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Total A 0.333 1.788 0.030 0.091 0.303 1.576 0.152 4.273
Total C 0.030 0.212 0.000 0.000 0.030 0.030 0.000 0.303
SQL Component   Set Operators      Summary: Formulation   Error Average   Response Count
Where A 0.000 Ambiguous 4.273 33
Where C 0.000 Clear 0.303 33
Union A 0.000
Union C 0.000
Intersect A 0.000
Intersect C 0.000
Minus A 0.000
Minus C 0.000
Total A 0.000
Total C 0.000
Question Three
SQL Component   Keywords   Symbols   Logical Operators   Relational Operators   Tables   Attributes   Values   Total
View A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
View C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Select A 0.061 1.182 0.000 0.000 0.000 1.182 0.000 2.424
Select C 0.000 0.212 0.000 0.000 0.000 0.152 0.000 0.364
From A 0.030 0.000 0.000 0.000 0.030 0.000 0.000 0.061
From C 0.000 0.000 0.000 0.000 0.061 0.000 0.000 0.061
Where Join A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Where Join C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Where Cond A 0.061 0.182 0.788 0.091 0.000 0.030 0.303 1.455
Where Cond C 0.000 0.152 0.152 0.030 0.000 0.091 0.152 0.576
Group By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Group By C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Having A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Having C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Order By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Order By C 0.000 0.000 0.000 0.000 0.000 0.030 0.000 0.030
Total A 0.152 1.364 0.788 0.091 0.030 1.212 0.303 3.939
Total C 0.000 0.364 0.152 0.030 0.061 0.273 0.152 1.030
SQL Component   Set Operators      Summary: Formulation   Error Average   Response Count
Where A 0.000 Ambiguous 3.970 33
Where C 0.000 Clear 1.030 33
Union A 0.030
Union C 0.000
Intersect A 0.000
Intersect C 0.000
Minus A 0.000
Minus C 0.000
Total A 0.030
Total C 0.000
Question Four
SQL Component   Keywords   Symbols   Logical Operators   Relational Operators   Tables   Attributes   Values   Total
View A 0.031 0.000 0.000 0.000 0.000 0.000 0.000 0.031
View C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Select A 0.813 1.031 0.000 0.000 0.094 0.500 0.000 2.438
Select C 0.121 0.182 0.000 0.000 0.000 0.121 0.000 0.424
From A 0.063 0.094 0.000 0.000 0.219 0.000 0.000 0.375
From C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Where Join A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Where Join C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Where Cond A 0.094 0.000 0.031 0.000 0.000 0.000 0.000 0.125
Where Cond C 0.061 0.061 0.000 0.000 0.000 0.061 0.000 0.182
Group By A 0.313 0.125 0.000 0.000 0.000 0.438 0.000 0.875
Group By C 0.030 0.000 0.000 0.000 0.000 0.000 0.000 0.030
Having A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Having C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Order By A 0.031 0.063 0.000 0.000 0.000 0.094 0.000 0.188
Order By C 0.030 0.000 0.000 0.000 0.000 0.000 0.000 0.030
Total A 1.344 1.313 0.031 0.000 0.313 1.031 0.000 4.031
Total C 0.242 0.242 0.000 0.000 0.000 0.182 0.000 0.667
SQL Component   Set Operators      Summary: Formulation   Error Average   Response Count
Where A 0.000 Ambiguous 4.031 32
Where C 0.000 Clear 0.667 33
Union A 0.000
Union C 0.000
Intersect A 0.000
Intersect C 0.000
Minus A 0.000
Minus C 0.000
Total A 0.000
Total C 0.000
Question Five
SQL Component   Keywords   Symbols   Logical Operators   Relational Operators   Tables   Attributes   Values   Total
View A 0.091 0.000 0.000 0.000 0.030 0.000 0.000 0.121
View C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Select A 0.121 1.273 0.000 0.000 0.030 1.273 0.000 2.697
Select C 0.033 0.267 0.000 0.000 0.067 0.333 0.000 0.700
From A 0.030 0.303 0.000 0.000 0.333 0.000 0.000 0.667
From C 0.033 0.233 0.000 0.000 0.333 0.000 0.000 0.600
Where Join A 0.030 1.212 0.333 0.515 1.273 1.303 0.000 4.667
Where Join C 0.033 0.700 0.200 0.233 0.667 0.733 0.000 2.567
Where Cond A 0.030 0.212 0.091 0.212 0.000 0.273 0.364 1.182
Where Cond C 0.000 0.233 0.100 0.200 0.067 0.433 0.267 1.300
Group By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Group By C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Having A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Having C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Order By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Order By C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Total A 0.303 3.000 0.424 0.727 1.667 2.848 0.364 9.333
Total C 0.100 1.433 0.300 0.433 1.133 1.500 0.267 5.167
SQL Component   Set Operators      Summary: Formulation   Error Average   Response Count
Where A 0.091 Ambiguous 9.424 33
Where C 0.033 Clear 5.200 30
Union A 0.000
Union C 0.000
Intersect A 0.000
Intersect C 0.000
Minus A 0.000
Minus C 0.000
Total A 0.091
Total C 0.033
Question Six
SQL Component   Keywords   Symbols   Logical Operators   Relational Operators   Tables   Attributes   Values   Total
View A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
View C 0.043 0.000 0.000 0.000 0.000 0.000 0.000 0.043
Select A 2.235 5.529 0.000 0.000 0.765 3.412 0.353 12.294
Select C 0.174 1.435 0.000 0.000 0.217 0.652 0.043 2.522
From A 0.000 0.471 0.000 0.000 0.588 0.000 0.000 1.059
From C 0.000 0.087 0.000 0.000 0.087 0.000 0.000 0.174
Where Join A 0.235 1.353 0.176 0.647 1.294 1.294 0.000 5.000
Where Join C 0.000 0.391 0.174 0.174 0.391 0.522 0.000 1.652
Where Cond A 0.176 2.118 0.941 1.647 0.294 2.118 1.471 8.765
Where Cond C 0.000 0.217 0.130 0.130 0.000 0.130 0.130 0.739
Group By A 0.706 2.059 0.000 0.000 0.647 2.353 0.000 5.765
Group By C 0.261 1.130 0.000 0.000 0.391 1.087 0.000 2.870
Having A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Having C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Order By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Order By C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Total A 3.353 11.529 1.118 2.294 3.588 9.176 1.824 32.882
Total C 0.478 3.261 0.304 0.304 1.087 2.391 0.174 8.000
SQL Component   Set Operators      Summary: Formulation   Error Average   Response Count
Where A 0.059 Ambiguous 32.941 17
Where C 0.000 Clear 8.000 23
Union A 0.000
Union C 0.000
Intersect A 0.000
Intersect C 0.000
Minus A 0.000
Minus C 0.000
Total A 0.059
Total C 0.000
Question Seven
SQL Component   Keywords   Symbols   Logical Operators   Relational Operators   Tables   Attributes   Values   Total
View A 0.200 0.000 0.000 0.000 0.067 0.000 0.000 0.267
View C 0.000 0.133 0.000 0.000 0.000 0.000 0.000 0.133
Select A 0.533 1.200 0.000 0.000 0.333 0.600 0.000 2.667
Select C 0.733 0.800 0.000 0.000 0.133 0.533 0.000 2.200
From A 0.133 0.067 0.000 0.000 0.200 0.000 0.000 0.400
From C 0.000 0.067 0.000 0.000 0.067 0.000 0.000 0.133
Where Join A 0.200 1.667 0.067 0.133 0.133 0.133 0.000 2.333
Where Join C 0.067 1.067 0.067 0.067 0.133 0.133 0.000 1.533
Where Cond A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Where Cond C 0.000 0.067 0.067 0.067 0.067 0.067 0.067 0.400
Group By A 0.267 0.400 0.000 0.000 0.200 0.467 0.000 1.333
Group By C 0.133 0.467 0.000 0.000 0.200 0.400 0.000 1.200
Having A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Having C 0.067 0.133 0.000 0.067 0.133 0.133 0.000 0.533
Order By A 0.000 0.000 0.000 0.000 0.000 0.267 0.000 0.267
Order By C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Total A 1.333 3.333 0.067 0.133 0.933 1.467 0.000 7.267
Total C 1.000 2.733 0.133 0.200 0.733 1.267 0.067 6.133
SQL Component   Set Operators      Summary: Formulation   Error Average   Response Count
Where A 0.000 Ambiguous 7.267 15
Where C 0.000 Clear 6.133 15
Union A 0.000
Union C 0.000
Intersect A 0.000
Intersect C 0.000
Minus A 0.000
Minus C 0.000
Total A 0.000
Total C 0.000
Question Eight
SQL Component   Keywords   Symbols   Logical Operators   Relational Operators   Tables   Attributes   Values   Total
View A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
View C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Select A 0.167 0.667 0.000 0.000 0.000 0.167 0.000 1.000
Select C 0.200 0.400 0.000 0.000 0.000 0.100 0.000 0.700
From A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
From C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Where Join A 0.000 0.333 0.167 0.167 0.333 0.333 0.000 1.333
Where Join C 0.000 0.600 0.300 0.300 0.600 0.800 0.000 2.600
Where Cond A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Where Cond C 0.000 0.000 0.100 0.000 0.000 0.000 0.000 0.100
Group By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Group By C 0.100 0.400 0.000 0.000 0.100 0.400 0.000 1.000
Having A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Having C 0.400 0.600 0.000 0.100 0.200 0.600 0.100 2.000
Order By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Order By C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Total A 0.167 1.000 0.167 0.167 0.333 0.500 0.000 2.333
Total C 0.700 2.000 0.400 0.400 0.900 1.900 0.100 6.400
SQL Component   Set Operators      Summary: Formulation   Error Average   Response Count
Where A 0.000 Ambiguous 2.333 6
Where C 0.000 Clear 6.400 10
Union A 0.000
Union C 0.000
Intersect A 0.000
Intersect C 0.000
Minus A 0.000
Minus C 0.000
Total A 0.000
Total C 0.000
Question Nine
SQL Component   Keywords   Symbols   Logical Operators   Relational Operators   Tables   Attributes   Values   Total
View A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
View C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Select A 2.000 1.000 0.000 0.000 0.000 1.000 0.000 4.000
Select C 1.500 1.000 0.000 0.000 0.000 2.000 0.000 4.500
From A 0.000 0.333 0.000 0.000 0.333 0.000 0.000 0.667
From C 0.000 0.500 0.000 0.000 0.500 0.000 0.000 1.000
Where Join A 0.000 0.667 0.333 0.333 0.667 0.667 0.000 2.667
Where Join C 0.000 1.000 0.500 0.500 1.000 1.000 0.000 4.000
Where Cond A 0.000 1.333 1.333 0.667 0.000 0.667 1.333 5.333
Where Cond C 0.000 1.000 1.000 0.500 0.000 0.500 1.000 4.000
Group By A 0.333 1.333 0.000 0.000 0.333 1.333 0.000 3.333
Group By C 0.000 1.000 0.000 0.000 0.000 1.000 0.000 2.000
Having A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Having C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Order By A 1.000 0.333 0.000 0.000 0.000 0.667 0.000 2.000
Order By C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Total A 3.333 5.000 1.667 1.000 1.333 4.333 1.333 18.000
Total C 1.500 4.500 1.500 1.000 1.500 4.500 1.000 15.500
SQL Component   Set Operators      Summary: Formulation   Error Average   Response Count
Where A 0.000 Ambiguous 18.000 3
Where C 0.000 Clear 15.500 2
Union A 0.000
Union C 0.000
Intersect A 0.000
Intersect C 0.000
Minus A 0.000
Minus C 0.000
Total A 0.000
Total C 0.000
Question Ten
SQL Component   Keywords   Symbols   Logical Operators   Relational Operators   Tables   Attributes   Values   Total
View A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
View C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Select A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Select C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
From A 0.000 1.000 0.000 0.000 1.000 0.000 0.000 2.000
From C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Where Join A 0.000 2.000 1.000 1.000 2.000 2.000 0.000 8.000
Where Join C 0.000 0.500 0.250 0.250 0.500 1.000 0.000 2.500
Where Cond A 1.000 3.000 0.000 2.000 0.000 2.000 2.000 10.000
Where Cond C 0.000 0.500 0.000 0.500 0.000 0.000 0.500 1.500
Group By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Group By C 0.000 0.500 0.000 0.000 0.000 0.500 0.000 1.000
Having A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Having C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Order By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Order By C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Total A 1.000 6.000 1.000 3.000 3.000 4.000 2.000 20.000
Total C 0.000 1.500 0.250 0.750 0.500 1.500 0.500 5.000
SQL Component   Set Operators      Summary: Formulation   Error Average   Response Count
Where A 0.000 Ambiguous 20.000 1
Where C 0.000 Clear 5.000 4
Union A 0.000
Union C 0.000
Intersect A 0.000
Intersect C 0.000
Minus A 0.000
Minus C 0.000
Total A 0.000
Total C 0.000
Question Eleven
SQL Component   Keywords   Symbols   Logical Operators   Relational Operators   Tables   Attributes   Values   Total
View A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
View C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Select A 2.500 4.000 0.000 0.000 0.500 2.500 0.000 9.500
Select C 1.000 1.000 0.000 0.000 0.000 0.000 0.000 2.000
From A 0.500 0.000 0.000 0.000 0.500 0.000 0.000 1.000
From C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Where Join A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Where Join C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Where Cond A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Where Cond C 0.000 0.000 0.000 0.000 0.000 0.000 2.000 2.000
Group By A 0.500 1.000 0.000 0.000 0.500 1.000 0.000 3.000
Group By C 0.000 1.000 0.000 0.000 0.000 1.000 0.000 2.000
Having A 3.000 2.000 0.000 1.000 0.000 2.000 0.000 8.000
Having C 1.000 1.000 0.000 0.000 0.000 0.000 0.000 2.000
Order By A 0.500 0.000 0.000 0.000 0.000 0.000 0.000 0.500
Order By C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Total A 7.000 7.000 0.000 1.000 1.500 5.500 0.000 22.000
Total C 2.000 3.000 0.000 0.000 0.000 1.000 2.000 8.000
SQL Component   Set Operators      Summary: Formulation   Error Average   Response Count
Where A 0.000 Ambiguous 22.500 2
Where C 0.000 Clear 8.000 1
Union A 0.000
Union C 0.000
Intersect A 0.000
Intersect C 0.000
Minus A 0.500
Minus C 0.000
Total A 0.500
Total C 0.000
Question Twelve
SQL Component   Keywords   Symbols   Logical Operators   Relational Operators   Tables   Attributes   Values   Total
View A
View C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Select A
Select C 0.000 2.000 0.000 0.000 0.000 0.000 0.000 2.000
From A
From C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Where Join A
Where Join C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Where Cond A
Where Cond C 0.000 0.000 0.000 0.000 0.000 0.000 2.000 2.000
Group By A
Group By C 0.000 1.000 0.000 0.000 0.000 1.000 0.000 2.000
Having A
Having C 0.000 2.000 0.000 0.000 0.000 0.000 0.000 2.000
Order By A
Order By C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Total A
Total C 0.000 5.000 0.000 0.000 0.000 1.000 2.000 8.000
SQL Component   Set Operators      Summary: Formulation   Error Average   Response Count
Where A Ambiguous
Where C 0.000 Clear 8.000 1
Union A
Union C 0.000
Intersect A
Intersect C 0.000
Minus A
Minus C 0.000
Total A
Total C 0.000
Appendix J: Seven Ambiguity Types Question Assessment Ratings
This table displays the average of the ambiguity assessments provided by two independent non-researchers. The scale used to assess the presence of each type of ambiguity is:
0 = None   1 = A little   2 = Some   3 = Much   4 = A Great Deal
Question   Formulation   Lexical   Syntactical   Inflective   Pragmatic   Extraneous   Emphatic   Suggestive
1 Ambiguous 1.5 2 0.5 1 0.5 0.5 0.5
1 Clear 0.5 0.5 0 0 0 0 0
2 Ambiguous 2 1 0 1.5 0.5 1.5 0.5
2 Clear 1 0.5 0 1 0 0 0
3 Ambiguous 0.5 3.5 0 1 0 0.5 0.5
3 Clear 0.5 0 0 0 0.5 0.5 0
4 Ambiguous 1.5 1 0 3 0 0.5 0
4 Clear 0.5 2 0 2 0 0 0
5 Ambiguous 1.5 2.5 0 0.5 0 2 0
5 Clear 1 0.5 0 0 0.5 0 0
6 Ambiguous 1.5 0.5 0.5 3 3.5 1.5 2.5
6 Clear 0.5 0.5 0 0.5 0.5 0 0
7 Ambiguous 1.5 2.5 0 0.5 0 1 1
7 Clear 0.5 0.5 0 0 0 0 0
8 Ambiguous 0.5 3.5 0.5 0.5 0 0 0
8 Clear 0.5 0.5 0 1 0 0 0
9 Ambiguous 1.5 0.5 0 2.5 0 3 0
9 Clear 0.5 0.5 0 0.5 0 0 0.5
10 Ambiguous 2 0 0 2 0.5 1 1.5
10 Clear 0.5 0 0 0 0 0 0
11 Ambiguous 1.5 1 0 2 1 1 1.5
11 Clear 0.5 0 0 0 0 0 0
12 Clear 0.5 0 0 0.5 0.5 0 0
Appendix K: Ambiguity Assessment Instrument
Ambiguity Measurement Questionnaire
Type Information Request
Lexical A report of our clients for our marketing brochure mail-out.
The word "report" may have several meanings, independent of its context: a gunshot report echoing through the hillside; the lieutenant reported to the captain; I dropped the heavy report on my toe. Although the context may make the intended meaning clear, the lexical ambiguity present adds to cognitive effort and in that manner contributes to the overall ambiguity of the request.
Syntactical A report of clients in Brisbane and on our Gold list.
The natural-language "and" does not map well to its Boolean equivalent. It would be legitimate to interpret this request as asking for clients that satisfy both conditions (Brisbane-based and on the Gold list) or for clients that satisfy either condition (Brisbane-based or on the Gold list). Another formulation is "Bob hit the man with a stick": it is not clear, syntactically, whether the man who was hit was carrying a stick or whether Bob used a stick to hit him.
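In SQL, the two readings of such a request produce different result sets. A minimal sketch using sqlite3 (the clients table and its column names are invented for this illustration):

```python
import sqlite3

# In-memory database with an illustrative clients table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (name TEXT, city TEXT, list_tier TEXT)")
conn.executemany(
    "INSERT INTO clients VALUES (?, ?, ?)",
    [("Acme", "Brisbane", "Gold"),
     ("Bursar", "Brisbane", "Silver"),
     ("Cobalt", "Sydney", "Gold")],
)

# Reading 1: clients satisfying BOTH conditions.
both = conn.execute(
    "SELECT name FROM clients WHERE city = 'Brisbane' AND list_tier = 'Gold'"
).fetchall()

# Reading 2: clients satisfying EITHER condition.
either = conn.execute(
    "SELECT name FROM clients WHERE city = 'Brisbane' OR list_tier = 'Gold'"
).fetchall()

# both   -> [('Acme',)]
# either -> [('Acme',), ('Bursar',), ('Cobalt',)]
```

The same English sentence thus maps to two legitimate queries with different answers, which is precisely the end user's dilemma.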
Inflective A report that is the product of our last marketing campaign regarding sales
of our accounting software product in the last month.
Inflective ambiguity arises where the same word is used within one grammatical structure (paragraph, sentence, or phrase) with different meanings; here it derives from the use of the word "product" with two different meanings in the one information request. Natural writing tends to avoid this.
Pragmatic A report of all the clients for a department.
The ambiguity here is that the department has not been specified. It would
be legitimate to prepare a report for any department, although it is likely
that this will not address the needs of the person making the information
request. Further information is needed to resolve the ambiguity.
Extraneous A report of all clients (and their names and addresses only) for the Tax and
Business Services department. Some of those clients are our biggest
earners, you know.
The last sentence is extraneous: unlike pragmatic ambiguity, it adds information that is redundant, uninformative, or unnecessary to meet the needs of the question or task stated in the request. It is "noise" in the communication, where more words are used than are necessary to make the statement.
Emphatic A report of our good clients.
Ambiguity here could derive from the inability of the written form to convey verbal emphasis. Depending on the emphasis used, "good clients" could legitimately be interpreted as clients that pay on time, clients with the most dollar-value sales, our very best clients (a much shorter list than one based on dollar value), or even, with the correct sarcastic or ironic emphasis on the spoken word, our worst clients - those that do not pay.
Suggestive A report of the clients of this accounting practice that have lodged taxation
returns in the past five years in accordance with the requirements of the
Australian Taxation Office.
The request for information is quite clear until the phrase "in accordance
with the requirements of the Australian Taxation Office". By definition, all
taxation returns should be lodged in accordance with these requirements.
The extra phrase introduces suggestive ambiguity into the information
request by suggesting that the report will not necessarily consist of all
taxation clients.
Mark all Information Requests in Accordance with the Following Scale
0 = None   1 = A little   2 = Some   3 = Much   4 = A Great Deal
No.   Ambiguity Type   Information Request   (Scale)
1. Management wants a list of each of our suppliers with no
duplicates in the list.
Lexical
Syntactical
Inflective
Pragmatic
Extraneous
Emphatic
Suggestive
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
List the distinct suppliers of the items we stock.
Lexical
Syntactical
Inflective
Pragmatic
Extraneous
Emphatic
Suggestive
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
2. Produce a report that lists the inventory items where the quantity
on hand is much larger, on a percentage basis, than the quantity
ordered.
Lexical
Syntactical
Inflective
Pragmatic
Extraneous
Emphatic
Suggestive
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
List item number, item name, quantity on hand, quantity on order
where quantity on hand is greater than 2 * quantity ordered.
Lexical
Syntactical
Inflective
Pragmatic
Extraneous
Emphatic
Suggestive
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
3. Management wants a list of all Japanese customers and customers
with credit limits over $15,000.
Lexical
Syntactical
Inflective
Pragmatic
Extraneous
Emphatic
Suggestive
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
List customer numbers, customer names, country, and credit limit
of customers with credit limits greater than $15,000 or of
customers in Japan.
Lexical
Syntactical
Inflective
Pragmatic
Extraneous
Emphatic
Suggestive
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
4. Produce a report that statistically compares the credit limits for
customers in different countries.
Lexical
Syntactical
Inflective
Pragmatic
Extraneous
Emphatic
Suggestive
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
List country, average credit limit, and standard deviation of
customer credit limit grouped by country.
Lexical
Syntactical
Inflective
Pragmatic
Extraneous
Emphatic
Suggestive
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
5. Produce a report of clients that prefer the Speedair carrier and
addresses.
Lexical
Syntactical
Inflective
Pragmatic
Extraneous
Emphatic
Suggestive
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
List customer number, customer name, street, city, post code, and
country where the customer's preferred carrier is Speedair.
Lexical
Syntactical
Inflective
Pragmatic
Extraneous
Emphatic
Suggestive
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
6. We're wondering if some of our winemakers are using poor quality
packaging and bottles - we've had a few complaints. Can you get
us a report that gives us some sort of idea about what items we are
shipping compared to what the customers are taking delivery of?
It would probably be a good idea while you're at it to give a
comparative percentage of the stuff shipped that doesn't make it -
just so the vintners won't try and weasel their way out of it, you
understand, they're good at that.
Lexical
Syntactical
Inflective
Pragmatic
Extraneous
Emphatic
Suggestive
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
List item maker, item number, item name, and 100 * (sum of
quantity shipped less sum of quantity accepted) / (sum of quantity
shipped) where the type of alcohol is wine.
Lexical
Syntactical
Inflective
Pragmatic
Extraneous
Emphatic
Suggestive
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
7. Prepare a report that provides *all* customer's details and
indicates the number of different products they have ordered from
us.
Lexical
Syntactical
Inflective
Pragmatic
Extraneous
Emphatic
Suggestive
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
List customer number, and customer name for *all* customers,
and, if they have ordered anything, a count of unique items
ordered.
Lexical
Syntactical
Inflective
Pragmatic
Extraneous
Emphatic
Suggestive
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
8. Management wants to know which customers we've shipped goods
more than 10 times to them by the shipper that they requested.
Lexical
Syntactical
Inflective
Pragmatic
Extraneous
Emphatic
Suggestive
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
List customer number, name, and count of invoices, where the
actual carrier is the same as the customer's preferred carrier,
having more than 10 shipments.
Lexical
Syntactical
Inflective
Pragmatic
Extraneous
Emphatic
Suggestive
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
9. Produce a report, with best items first, on the gross contribution to
profitability of each inventory item for July 1999.
Lexical
Syntactical
Inflective
Pragmatic
Extraneous
Emphatic
Suggestive
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
List item number, item description, and (unit price less unit cost)
multiplied by units sold in July 1999. Sort your output by
descending gross contribution to profitability.
Lexical
Syntactical
Inflective
Pragmatic
Extraneous
Emphatic
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
Suggestive 0 1 2 3 4
10. Produce a report with the relevant customer details that gives us an
idea of how much of our business is exposed to foreign currency
fluctuations.
Lexical
Syntactical
Inflective
Pragmatic
Extraneous
Emphatic
Suggestive
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
List customer number, customer name, customer country, and a
total of the amount paid where the settlement currency code for the
invoice is not equal to the currency code for Australian dollars.
Group results by customer number.
Lexical
Syntactical
Inflective
Pragmatic
Extraneous
Emphatic
Suggestive
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
11. Management is concerned about current slow-moving inventory
items, based on shipments since 1 June 1999. Produce a report of
the items that they might be most concerned about.
Lexical
Syntactical
Inflective
Pragmatic
Extraneous
Emphatic
Suggestive
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
List inventory item number, item description, quantity on hand,
and sum(quantity shipped) with ship dates greater than 1 June
1999 that have sums of the quantity shipped less than the sums of
the quantity on hand.
Lexical
Syntactical
Inflective
Pragmatic
Extraneous
Emphatic
Suggestive
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
12. Produce a report that gives some idea about our best USA export
items where the amount since March is bigger than $5,000.
Lexical
Syntactical
Inflective
Pragmatic
Extraneous
Emphatic
Suggestive
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
List item numbers, item descriptions and the total accepted
quantity times agreed price of each item for items shipped to US
customers since 1 March 1999 and having a total accepted quantity
times agreed price greater than $5,000.
Lexical
Syntactical
Inflective
Pragmatic
Extraneous
Emphatic
Suggestive
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
Appendix L: Internal Validity of the Experiment
A full explanation of the seven recognised "threats" to the internal validity of experiments is contained in Huck et al. (1974). The comments made below are based on the discussion presented there.
History
The history threat to internal validity arises where an event outside the domain of the experiment occurs that may affect the dependent variable. As the experiment took place in a controlled setting over a two-hour period on each of two days of testing, the history threat is not considered to apply to this experiment.
Maturation
Maturation occurs where the participants mature, grow, or learn during the course of the experiment, so that the passage of time alone improves recorded end user query performance. Any maturation effect is adequately controlled for in this instance: the experiment was two hours in duration, homogeneous groups were used, and each tutorial group tested contained both Group A and Group B participants. Further, the two groups received the ambiguity treatment on alternate questions, so any residual maturation effect (such as learning the SQL experimental tool or increasing proficiency in SQL during the experiment) applies equally to the clear and ambiguous treatments.
Testing
Testing occurs where individuals score higher on later sittings of a test than on their first sitting. Within this experiment, participants may have learned more about the experimental tools and process (the SQL editor) as they progressed, so later questions (for example, question six compared to question one) might show superior performance (particularly time for completion). For the reasons cited under the maturation effect, any such testing effect applies equally to the clear and ambiguous formulations of each question. Additionally, participants who had undertaken similar experiments previously were stratified into separate classes, and Group A and Group B were homogeneous in this respect. Therefore, both within the experiment and from previous experiments, any testing effect applies equally to both treatments.
Instrumentation
Huck et al. (1974) identify instrumentation as the effect of a change in the observational technique accounting for an experimentally observed difference. This threat could arise in the current experiment through a maturation-like change in the assessors over the time taken to assess student responses: assessors could mark later participant responses differently from earlier ones.
This effect is controlled for in several ways. First, when assessing responses, assessors could identify participant responses only by student number, not by name, avoiding assessors' preconceptions about students' performance. The use of two independent assessors controlled for some differences in marking strategies, as did the use of diary notes to ensure consistency of marking over time. An exhaustive cross-checking and data-correctness procedure also mitigated this effect.
Responses were assessed student by student in no particular order. Group A and Group B participant responses were evenly distributed in the marking order, with a calculated non-parametric runs test z-statistic of 0.9924 (Newbold 1984). This weak z-statistic (two-tailed p ≈ 0.32) implies that any residual instrumentation effect, should it exist, applies evenly to either question formulation. Overall, the instrumentation threat to the experimental results is controlled for.
Statistical Regression
Statistical regression occurs where the analysis of an experiment focuses on extreme scores, which tend to regress toward the mean on subsequent tests (Huck et al. 1974). The current experiment
is not exposed to this threat to internal validity, as extreme scores are not the focus of the
experiment. Furthermore, the experimental design and assessment process used adequately
controls for this threat to internal validity, as previously described.
Mortality
Mortality occurs where participants drop out of the experiment during its course. As this experiment was short in duration (two hours), no participant mortality occurred during the experiment, and all sixty-six students enrolled in the subjects participated. The mortality effect is of some concern, however, in that incomplete participant responses were removed from the analysis: of the 506 participant responses, 425 were completed and statistically analysed.
The effect of this acknowledged experimental bias is to reduce the total number of responses examined and, in general, to remove from the analysis responses containing a significant number of errors. Because this bias runs against the direction of the hypotheses made in this paper, it strengthens any conclusions drawn, and the mortality effect is therefore a lesser internal validity concern for the current experiment.
Selection Bias
The selection process resulted in two homogeneous groups, Group A and Group B, drawn
from the entire student population of two information systems subjects. There is no evident
selection bias between Group A and Group B. In any case, both groups received the treatment effect of ambiguity on alternate questions, further mitigating concerns about the effect of any selection bias on the experimental results.