A Report on Online Assessment

Damion Young and Jon Mason, Medical Sciences Division Learning Technologies (MSDLT)




Is Rogō a viable alternative to QuestionMark Perception in the Medical Sciences Division at Oxford?

Introduction

The Medical Sciences Division (MSD) at the University of Oxford has been running online assessments using QuestionMark Perception since 2003; summative assessment began in 2004 and continues to grow. In 2010-11, we delivered 161 online assessments, of which 53 were formal University exams, to a total of c.17,000 participants.

QuestionMark Perception¹ is a commercial e-assessment system which is widely used in UK HEIs. Rogō² is an open source e-assessment system developed at the University of Nottingham (UoN) and used across its UK and international campuses.

The drivers for change

Perception has proved a very reliable and secure assessment delivery platform. It is packed with features, and new ones are continually being added. However, there are five areas in which it still does not satisfactorily meet our requirements:

Driver 1: Authoring

Until recently, assessment in medical sciences has been dominated by the extended matching question type. The prevalence of this question type (Figure 1), and a number of other standards that we have developed, mean that the Perception question creation and editing interfaces are inadequate. We have therefore developed our own question creator to, for example, alphabetise the answer options and generate the table of options.

1 QuestionMark Perception, http://www.questionmark.co.uk (accessed 5th May 2012)

2 Rogō, http://www.nottingham.ac.uk/rogo/index.aspx (accessed 5th May 2012)

Figure 1. Typical extended matching question



We have not been able to integrate this fully with Perception, so question creation involves importing QML (Perception's native format), and editing is often done by recreating and overwriting questions – a process prone to errors.

The complexity of this process has contributed to the great majority of question and assessment creation being carried out by our half-time e-learning administrator.

We need a tool which will allow non-technical users to easily create and edit all the question types that we commonly use.
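
By way of illustration, the sketch below shows, in Python, the kind of pre-processing our in-house question creator performs: alphabetising the answer options of an EMQ and generating the table of options that heads the question. The function name and the HTML layout are illustrative assumptions, not part of Perception, Rogō or our actual tool.

```python
# Illustrative sketch only: alphabetise EMQ answer options and lay them out
# as the table of options that heads an extended matching question.
# The function name and HTML layout are hypothetical.

def build_emq_option_table(options, columns=3):
    """Alphabetise answer options and render them as a simple HTML table."""
    ordered = sorted(options, key=str.lower)
    # Label the options A, B, C, ... in their alphabetised order.
    labelled = [f"{chr(ord('A') + i)}. {text}" for i, text in enumerate(ordered)]
    rows = [labelled[i:i + columns] for i in range(0, len(labelled), columns)]
    return "<table>" + "".join(
        "<tr>" + "".join(f"<td>{cell}</td>" for cell in row) + "</tr>"
        for row in rows
    ) + "</table>"

print(build_emq_option_table(["Ulnar nerve", "Median nerve", "Axillary nerve"]))
```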

Driver 2: Performance and delivery

With the move to Perception v4 in 2008, we installed a four-server, load-balanced system which did, initially at least, allow us to start 90 students (the maximum we can accommodate in one sitting) simultaneously without problems. Performance has decreased over time and we now start in groups of 20 or so, with the expectation that one or two students will need to restart after an error. We are in the process of upgrading but, although initial testing suggests that the new system is considerably faster than the existing one, our previous experience, and QuestionMark's own documentation³, suggest that it may still not deliver the performance we want.

We have never had a major interruption to an exam and have experienced no more than a handful of workstation failures. Perception’s Save As You Go (SAYG) does autosave students’ answers but does not save elapsed time. In the event of a failure or interruption, the background timer continues counting down, and can submit an assessment before a student can return to it. We need a system which will reissue the correct time remaining on resumption.
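
To make the requirement concrete, the following minimal sketch (in Python, with illustrative names only) shows timing logic that banks elapsed time server-side, so that an interrupted attempt resumes with the correct time remaining rather than a clock that kept running:

```python
# Minimal sketch: the clock counts down only while the attempt is open,
# so an interruption pauses it and resumption reissues the correct remainder.
import time

class AttemptTimer:
    def __init__(self, allowed_seconds):
        self.allowed = allowed_seconds
        self.elapsed = 0.0       # time already used, banked at each interruption
        self.opened_at = None    # None while the attempt is closed/interrupted

    def open(self):
        """Call when the student starts or resumes the attempt."""
        self.opened_at = time.monotonic()

    def interrupt(self):
        """Call on failure or logout: bank the elapsed time and stop the clock."""
        if self.opened_at is not None:
            self.elapsed += time.monotonic() - self.opened_at
            self.opened_at = None

    def remaining(self):
        """Seconds left, counting down only while the attempt is open."""
        running = time.monotonic() - self.opened_at if self.opened_at else 0.0
        return max(0.0, self.allowed - self.elapsed - running)
```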

Driver 3: Licence, hardware and maintenance costs

As well as the considerable cost of the Perception hardware and licence for our existing system, our annual support package for 2000 students with QuestionMark Secure is significant.

Ageing hardware and the software upgrade meant another large hardware investment last year. The size and complexity of Perception mean that upgrading from version to version, particularly on high-availability hardware, is far from a trouble-free process. After a dismal experience upgrading from v3 to v4, we decided to employ QuestionMark's consultants for the current upgrade. This has certainly made the process simpler but is another regular cost.

3 Best Practice: Scalability in Perception Version 5, https://www.questionmark.com/perception/help/v5/best_practice_guides/scalability/content/v5_scalability.pdf (accessed 5th May 2012)

Driver 4: Reporting

In order to make the most of the institution’s investment in question-writing and online assessment, we want to be able to provide reports for examiners which, for each question, combine:

♦ The question as seen by students

♦ Correct answers

♦ Feedback given (where appropriate)

♦ Syllabus information

♦ Assessments in which the question has been used

♦ Previous versions of questions/differences from previous versions

♦ Performance of question items on individual assessments and across assessments

♦ Highlighting of potentially problematic question items

Ideally this would also be searchable and allow filtering. Perception does not provide this out of the box. We have been able to deliver some of this using JavaScript, but the process is fairly manual and vulnerable to changes in the way that Perception questions and reports are delivered.
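
As an example of how the highlighting of potentially problematic question items might work, the sketch below computes each item's facility and point-biserial discrimination and flags items falling outside illustrative thresholds; the thresholds and function names are our assumptions, not figures from either system:

```python
# Sketch: flag items with extreme facility or low point-biserial discrimination.
from statistics import mean, pstdev

def item_statistics(item_scores, total_scores):
    """item_scores: 0/1 per student for one item; total_scores: exam totals."""
    facility = mean(item_scores)
    sd_total = pstdev(total_scores)
    if sd_total == 0 or facility in (0.0, 1.0):
        return facility, 0.0
    mean_correct = mean(t for s, t in zip(item_scores, total_scores) if s == 1)
    # Point-biserial correlation between item score and exam total.
    r_pb = ((mean_correct - mean(total_scores)) / sd_total) * (
        (facility / (1 - facility)) ** 0.5)
    return facility, r_pb

def flag_item(item_scores, total_scores):
    """Illustrative thresholds: too easy, too hard, or poorly discriminating."""
    facility, r_pb = item_statistics(item_scores, total_scores)
    return facility > 0.95 or facility < 0.2 or r_pb < 0.2
```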

Driver 5: Lack of Flexibility

Perception comes with many features to customize look and feel, delivery, etc. However, they do not address the authoring and reporting drivers outlined above. Perception also provides numerous APIs with which third-party applications can interact with the system – we use these for logging students in, for example – and these could be leveraged to address our needs to some extent. However, we don't want to have to ask users to use multiple systems, with different interfaces, in order to access and manage questions and assessments – everything should be available in one place.

Another problem is that changes to the features provided by Perception's Authoring Manager, Enterprise Manager, etc. are at the discretion of QuestionMark and/or subject to a development/consultancy fee. We have suffered a number of awkward or annoying interface issues over the years, e.g. dialogue boxes that won't resize to show long assessment names and menus that don't work in modern browsers. These are relatively minor issues to fix, but we have very little ability to get them prioritized.



Does Rogō address the drivers for change?

Driver 1: Authoring

The majority of the question types that we have used in Perception are also offered by Rogō. Rogō automatically creates a more readable, and optionally alphabetised, table of options at the head of an extended matching question (EMQ). It also has built-in support for EMQs in which a single stem can have more than one answer, whereas in Perception this requires some awkward use of the Matching question type. Rogō is also soon to have an Area question type, which will assess agreement between a pre-determined polygon on an image and one drawn by a participant – something we have been repeatedly asked for in anatomy.
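
We do not yet know how the Area question type will score agreement, but one plausible measure is intersection-over-union between the examiner's polygon and the participant's. The sketch below (using the shapely geometry library; the function names and the 70% threshold are our assumptions) illustrates the idea:

```python
# Sketch: score agreement between a pre-determined polygon and a drawn one
# as intersection-over-union (IoU). How Rogō will actually score is unknown.
from shapely.geometry import Polygon

def polygon_agreement(key_points, drawn_points):
    """Return the IoU of the examiner's and the participant's polygons."""
    key, drawn = Polygon(key_points), Polygon(drawn_points)
    if not key.is_valid or not drawn.is_valid:
        return 0.0
    union_area = key.union(drawn).area
    return key.intersection(drawn).area / union_area if union_area else 0.0

# e.g. award the mark if the regions overlap by at least 70%
mark = 1 if polygon_agreement([(0, 0), (4, 0), (4, 3), (0, 3)],
                              [(1, 0), (5, 0), (5, 3), (1, 3)]) >= 0.7 else 0
```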

One nice, but minor, feature of Rogō is that reorganising questions within an assessment is a simple drag-and-drop operation rather than Perception's clunkier delete and re-insert.

Rogō does fully address our original authoring driver for change.

Driver 2: Performance and delivery

We have yet to test Rogō with a full cohort of 90 students. In our first test, two sittings of c.40 students sat an image-rich anatomy paper in Rogō while two sittings of c.40 of their peers sat the same paper in Perception. Performance was observed rather than measured but, starting the students in groups of c.20, Rogō delivered all the papers with only a few seconds' delay. Perception (v4) exhibited very similar performance. This is remarkable as Rogō was running on a single server, while Perception was running on a four-server load-balanced/clustered setup. UoN started 3874 students simultaneously across seven locations in January 2012 using a single, albeit very well-specified, server⁴. This looks encouraging and could have implications for hardware costs as well.

Rogō currently has no timer, which brings its own issues, but it does mean that it doesn’t suffer the problem with elapsed time in case of interruptions. It does, however, have the concept of a Fire Escape button which saves the assessment and blanks the screen during an evacuation. If this could be combined with Perception-style SAYG and timing, we would have a system which exactly meets our requirements in this respect.

Driver 3: Licence, hardware and maintenance costs

Original licences for Perception with 2000 participants and QuestionMark Secure were a significant investment. The Perception Software Support Plan is then a major annual cost. This provides a technical support service, access to various connectors and free upgrades.

In contrast, Rogō requires no licence or support fee. During the life of the current project, support from UoN (beyond its own users) is targeted primarily at partner institutions, with support for other groups on a best-effort basis. In the long term, the hope is that support will become mutual within the development and user communities.

However, a paid-for software support plan provides a certain level of comfort and defensibility – it remains to be seen whether community support will be adequate or whether, if users prove unwilling to adopt a system without one, UoN will consider offering paid-for support.

Assuming that it is possible to load-balance Rogō to improve performance and reliability (not yet tested), it will have a great advantage over Perception in that extra servers will not require extra licence and support fees. This is currently the major factor limiting our ability to improve performance with Perception, as these costs, with our setup, are well over six times the cost of a typical server. However, such potential savings are, as OSS Watch⁵ admits, far from clear-cut once development, maintenance, etc. are taken into account.

4 Anthony Brown, pers. comm., 5th May 2012

5 OSS Watch: Benefits of open source code, http://www.oss-watch.ac.uk/resources/whoneedssource.xml (accessed 5th May 2012)



Driver 4: Reporting

Perception's reporting is currently more extensive than Rogō's. However, from the point of view of the reporting that we actually use and the needs that are driving change, the two systems are broadly similar, but they differ in two significant ways.

Unlike Perception, Rogō’s reporting is entirely assessment-based – this means that it is not possible to track performance of questions across assessments, except by hand.

Rogō does maintain previous versions of questions and, although these cannot currently be reported as we would like, the potential is certainly there.

Driver 5: Flexibility

Perception provides a very extensive suite of tools and functionality to allow users to customize look and feel, to integrate with third-party systems such as VLEs, and to allow developers to interrogate and manipulate the Perception database. Rogō, in contrast, because it has been primarily developed for, and used by, a single institution, has required less of this sort of functionality until now and is therefore less flexible, in this respect, than Perception.

From the point of view of a user looking to move from Perception, there are two significant examples of this problem: the importance of paper type and question locking.

The importance of paper type

In both Perception and Rogō, paper type is used to set the default settings that are applied to an assessment, such as whether feedback is shown. However, the degree to which users can then alter these settings is currently far less flexible in Rogō than in Perception. In our pilot assessment, we had to temporarily change the underlying code to allow students to see feedback on a 'Progress Test' (which was used rather than a 'Self-Assessment Test' to allow restarting in the event of a disaster).
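
The flexibility we would like can be expressed as paper-type defaults that individual assessments may override. The sketch below is purely illustrative (the setting names and paper types are our assumptions, not Rogō's):

```python
# Sketch: paper-type defaults with per-assessment overrides, rather than
# behaviour hard-coded per paper type. All keys and values are illustrative.
PAPER_TYPE_DEFAULTS = {
    "progress": {"show_feedback": False, "allow_restart": True},
    "self_assessment": {"show_feedback": True, "allow_restart": True},
    "summative": {"show_feedback": False, "allow_restart": False},
}

def effective_settings(paper_type, overrides=None):
    """Start from the paper type's defaults, then apply any overrides."""
    settings = dict(PAPER_TYPE_DEFAULTS[paper_type])
    settings.update(overrides or {})
    return settings

# Our pilot case: a Progress Test that should nevertheless show feedback.
print(effective_settings("progress", {"show_feedback": True}))
```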

Question locking

Once the scheduled assessment start date/time has passed in Rogō, any questions in that assessment, and the assessment itself, are locked. This quite reasonably prevents any changes to 'delivered' questions. However, it means that even minor edits (e.g. correcting a spelling mistake) cannot be made without copying and re-inserting the question, thus losing any link with the original. Any such change would therefore also remove the ability to track a question's performance across assessments, except by hand.
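
One way to square locking with minor corrections would be versioned questions that record their parentage, so the lineage, and hence cross-assessment performance, is preserved. The following sketch is our illustration, not Rogō's data model:

```python
# Sketch: each post-delivery edit creates a new version carrying a link to
# its parent, so performance can still be aggregated along the lineage.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class QuestionVersion:
    question_id: int                 # stable identity shared by all versions
    version: int                     # increments with each edit
    parent_version: Optional[int]    # None for the original
    text: str
    locked: bool                     # True once delivered in a scheduled paper

def correct_typo(q: QuestionVersion, new_text: str) -> QuestionVersion:
    """Create an unlocked successor version instead of editing a locked one."""
    return QuestionVersion(q.question_id, q.version + 1, q.version,
                           new_text, locked=False)
```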

Other notable differences between Rogō and Perception

What participants see

Although the questions themselves are presented in essentially the same way, there are a number of striking differences in participant experience between the two systems. Currently, the majority of MSD assessments are delivered as a single scrollable page of questions, display a countdown timer, and are automatically submitted when an individual’s time has elapsed. Rogō lacks the latter features as it has no concept of timing. This, in turn, means that the only way to achieve the safety provided by Perception’s Save As You Go feature is to split the questions into separate pages with answers being saved as screens change.

Figure 2: Rogō question creation



We asked students for feedback on their experience of the Rogō-delivered assessment. The most common response related to the absence of a timer. Opinion was evenly split on displaying the questions on multiple pages as opposed to all on a single page.

One additional feature of Rogō that was popular with students is the ability to click on multiple-choice options to strike them through. Students liked being able to visually eliminate answers they felt were incorrect.

Reliability

Reliability is a key concern in high-stakes online assessment (Gilbert et al., 2009). We gained permission to run summative online assessments on the basis of putting in place a number of contingency plans in case of hardware or software failures, including having a paper version of the exam ready in case of complete failure. In Perception, we created a template to produce this automatically, whereas the facility is available 'out of the box' in Rogō. The requirement for a paper-based contingency obviously has implications for the question types used in formal exams, such as those including audio and/or video.

In online assessment, it takes only one student to be adversely affected by a hardware, software or human failure to seriously undermine confidence in the whole system. Our Perception system was therefore designed to be resilient and able to cope with problems such as machine failures (via SAYG). We will need to put in place reliability measures of similar or greater strength before attempting summative assessment with Rogō.

Security

Like reliability, security is of paramount importance in summative assessment. Perception comes with its own secure browser, QuestionMark Secure, which, among other things, prevents users from accessing any other programs or websites during an assessment. Rogō lacks this feature, but a number of third-party secure browsers, such as Safe Exam Browser⁶, can provide the same functionality. A similar 'locked-down' state can also be achieved using Windows Group Policies.

Conclusion

Rogō addresses the hardware cost, performance and flexibility drivers identified in the introduction. On authoring, delivery and reporting, Rogō is currently no better suited to our needs than Perception and, in some areas, less well suited.

Our first steps with Rogō have been encouraging. The potential finally to deliver and report on assessments as we would like, if we and the Rogō community can deliver the functionality, is very exciting. However, although our assessment needs can probably be generalised, the extent to which other departments and institutions are able, or prepared, to contribute to the code may vary, and this may affect whether they find Rogō as appealing, despite the other benefits that OSS can provide.

However, these are early days for us and we will need to build our confidence with Rogō, assuring ourselves that it does meet our reliability and security needs in particular, before we could consider moving our summative assessment from Perception to Rogō.

References

Apampa, K. M., Wills, G. B. & Argles, D. (2009). Towards Security Goals in Summative E-Assessment Security. ICITST-2009, London, UK, 9-12 November 2009.

Case, S. M. & Swanson, D. B. (1993). Extended-matching items: a practical alternative to free response questions. Teaching and Learning in Medicine, 5, 107-115.

Gilbert, L., Gale, V., Warburton, B. & Wills, G. (2009). Report on Summative E-Assessment Quality (REAQ), JISC.

Sieber, V. & Young, D. (2008). Factors associated with the successful introduction of on-line diagnostic, formative and summative assessment in the Medical Sciences Division, University of Oxford. In Khandia, F. (Ed.), CAA 2008 International Conference, University of Loughborough, http://caaconference.com.

6 Safe Exam Browser, http://www.safeexambrowser.org/ (accessed 5th May 2012)
