Automated approaches to characterizing educational digital library usage: linking computational...

    Int J Digit Libr (2012) 13:51–64
    DOI 10.1007/s00799-012-0096-x

    Automated approaches to characterizing educational digital library usage: linking computational methods with qualitative analyses

    Keith E. Maull · Manuel Gerardo Saldivar · Tamara Sumner

    Received: 28 November 2011 / Revised: 27 June 2012 / Accepted: 3 July 2012 / Published online: 29 July 2012
    © Springer-Verlag 2012

    K. E. Maull (B) · M. G. Saldivar · T. Sumner
    Department of Computer Science, Institute of Cognitive Science, University of Colorado, Boulder, CO 80309, USA
    e-mail: [email protected]

    M. G. Saldivar
    e-mail: [email protected]

    T. Sumner
    e-mail: [email protected]

    Abstract The need for automatic methods capable of characterizing adoption and use has grown in operational digital libraries. This paper describes a computational method for producing two, inter-related, user typologies based on use diffusion. Furthermore, a case study is described that demonstrates the utility and applicability of the method: it is used to understand how middle and high school science teachers participating in an academic year-long field trial adopted and integrated digital library resources into their instructional planning and teaching. Use diffusion theory views technology adoption as a process that can lead to widely different patterns of use across a given population of potential users; these models use measures of frequency and variety to characterize and describe such usage patterns. By using computational techniques such as clickstream entropy and clustering, the method produces both coarse- and fine-grained user typologies. As a part of improving the initial coarse-grain typology, clickstream entropy improvements are described that aim at better separation of users. In addition, a fine-grained user typology is described that identifies five different types of teacher-users, including “interactive resource specialists” and “community seeker specialists.” This typology was validated through comparison with qualitative and quantitative data collected using traditional educational field research methods. Results indicate that qualitative analyses correlate with the computational results, suggesting automatic methods may prove an important tool in discovering valid usage characteristics and user types.

    Keywords Technology adoption · Diffusion of innovation · Use diffusion models · Educational digital libraries

    1 Introduction

    Educational digital libraries have evolved from being primarily research-oriented enterprises to encompass a large number of operational library sites, such as NSDL, Merlot, and DLESE in the US, SchoolNet in Europe, and the National Digital Learning Resources Network in Australia, to name just a few. As library efforts continue to mature, there is a growing need for efficient and scalable methods to characterize their uptake and adoption, their impact on teacher and student practices, and ultimately their impact on student learning. In this article, we describe a computational method for automatically identifying and characterizing different patterns of digital library adoption and use.

    This method instantiates a particular theoretical model of technology adoption, use diffusion [31], which in turn builds on prior work on the diffusion of innovation [27]. Diffusion of innovation is one of the most researched and widely employed social science models; it has been used to study the adoption of agriculture innovations such as new corn varieties [27], health innovations such as water purification and disease treatments [27], and the very rapid adoption of digital consumer products [24]. Diffusion of innovation theory provides a lens for understanding the different factors that influence a person’s decision to use, or not use, an innovation, and when in the product lifecycle they might adopt. For instance, an “early adopter” farmer might be motivated by the thought of potential harvest gains to be the first farmer in the region to try out a new corn variety, whereas a “late majority” farmer would wait until most of the surrounding farms had already adopted the new corn. Characterizing the adoption of contemporary information services, such as educational digital libraries, however, is more complex than identifying when the farmer planted the corn or when the consumer bought the digital device. Did a teacher that used NSDL or MERLOT one time “adopt” the library? In acknowledgement of this complexity, instead of focusing on when an innovation is first used, use diffusion examines both how and how much an innovation is used to identify different adopter categories. It recognizes that both the depth and breadth of usage will vary widely across different users, and that successful adoption will take many different forms.

    Building on techniques from web analytics and data mining, the proposed computational method employs two different algorithms, in a two-step process, to develop both coarse- and fine-grained views of user behavior. These algorithms rely on detailed web site usage logs, where each individual action in the interface is recorded and associated with a unique user identifier. In the first step, one algorithm uses frequency of use and variety of use to sort users into different use diffusion adopter quadrants, such as “intense use,” “limited use,” “specialized use,” etc. One challenge with operationalizing use diffusion in a computational method is modeling variety in a way that is application independent. In Maull et al. [21], variety was calculated using Shannon [30] entropy, a mathematical construct from information theory. While the first attempt using entropy resulted in a valuable step forward in understanding how to map web usage onto the use diffusion model, it did not sufficiently characterize specialized use. In this study, we extend the initial calculations to include an enhanced entropy computation that includes a penalty for low variety. These new computations and their performance are described in detail later in this article. In the second step, a clustering algorithm is employed to develop a finer-grained understanding of the different patterns of generalized and specialized use within and across these quadrants. The result of this step is a user typology or classification of users along the selected dimensions of each cluster.

    To illustrate the utility of the computational method, the resulting user typology, and how they might be applied in practice, we present a case study where we used this method to better understand how middle and high school science teachers integrated digital library resources from NSDL and DLESE into their instructional planning and teaching practice. Teachers were provided with a web-based planning tool that enabled them to customize their district’s adopted curriculum with digital library resources to better meet diverse learner needs and to share their customizations with other teachers in their district [34]. This planning tool, the Curriculum Customization Service (CCS), was deployed to all middle and high school Earth science teachers within a large urban school district in the Midwest US for a full academic year; the Service was carefully instrumented to record every user action and detailed usage logs were collected. This case offers an excellent testbed for this method for two reasons. First, the total potential user population is known and quantifiable, enabling us to more easily assess overall rates of uptake and adoption. Second, this deployment was also studied using traditional educational field research methods such as surveys, interviews, and classroom observations. Thus, we can compare the coarse- and fine-grained views of user behavior identified by the computational method with findings from the field study to better assess the accuracy and validity of the method’s output. While this case is clearly focused on understanding educational use of digital libraries, we believe the proposed method to be generalizable and applicable to a wide variety of digital libraries, information services, and learning applications.

    2 Background and related work

    This research draws on theories and computational techniques from several disciplines in order to better understand digital library adoption and use. First, as previously described, we draw on technology adoption and diffusion theories, which are historically rooted in social science, to inform the purpose and overall functioning of our computational method. Second, we discuss related research to understand user behavior and user typologies, describing how our approach compares to other efforts that are similarly focused on developing automatic methods.

    2.1 Adoption and diffusion theories

    Technology adoption occurs when an individual decides that a given technological innovation has utility and can add value to his or her activities, such as teaching, if that innovation is somehow incorporated into those activities [33]. Thus, much theoretical work to date has focused on understanding the cognitive, affective, and contextual factors that influence a potential user’s decision-making process. One prominent family of theories offers extensions or refinements of Rogers’ innovation diffusion theory [7,25,27]. To Rogers, technology adoption is fundamentally a function of the communication channels and social systems of which one is a part. This theory suggests that (1) within a social system, there are typically five different “adopter categories” describing the different characteristics that users bring to bear when considering whether to adopt an innovation and (2) these characteristics influence how innovations move through the social system in predictable ways. Another model is the concerns-based adoption model developed by Fuller [12] and Hall [13]. As its name implies, this model is focused on individuals’ concerns, which are defined as the specific reasons, situated in one’s socio-cultural context, that one might have to adopt or not adopt technology. A third major approach is the technology acceptance model, or TAM [6,36]. This was one of the first models to take into account the individual’s self-efficacy and expertise. Proponents of this model argue that individuals’ self-perception of their ability to use technology and their ability to judge whether a technology has utility for them are important factors for understanding technology adoption behaviors.

    These prior models have contributed greatly to our understanding of technology adoption; however, they all share two weaknesses. First, none of them take into account discontinuance; i.e., people often stop using a new technology after they have tried it out a few times. Second, they provide very little insight into actual use of the new technology. Since most of these methods rely on self-reported survey data, they more often predict variance in self-reported use rather than actual use [35]. Thus, more recently, theoretical attention has shifted from focusing on the decision component of adoption towards understanding adoption as a process that can lead to different patterns of use. Models that account for use fall into the category of use diffusion models. These models attempt to characterize the way, and the degree to which, people make use of the new technology. For example, once a consumer purchases a cell phone, to what degree does he or she actually use the phone? What features are used and how are these features used? Can different types of usage be recognized and compared?

    Formally proposed by Ram and Jung [26] and subsequently updated and expanded by Shih and Venkatesh [31] to accommodate more robust and predictable descriptions of usage, use diffusion extends the traditional notion of adoption diffusion by focusing on system usage patterns. The work in this paper builds on the Shih and Venkatesh use diffusion model, which suggests two dimensions to patterns of technology use. The first dimension, frequency, provides a measurement of how much a technology is used. In the web context, for example, frequency of web site use might be defined as the number of sessions a user has generated over some period of time. This frequency measure can be a very useful indicator of a user’s interest in site content. The second dimension of the use diffusion model is what the authors call variety. This dimension measures the range of use of a technology; did the consumer make use of most of the features of their new cell phone, or only two or three? In the web context, unlike frequency and number of sessions, there are no standard measures of variety. For this study, we model variety as clickstream entropy. The use diffusion model thus produces four categories of use of a technology as shown in Fig. 1. When plotted along these two dimensions, the population of users is thus segmented into these four adopter categories: “intense use,” “limited use,” “specialized use,” and “non-specialized use.” A user may move to and from adopter categories depending, for example, on the time interval considered and the granularity of the data used for analysis (e.g., an individual session vs. lifetime sessions).

    Fig. 1 Use diffusion model proposed by Shih and Venkatesh [31]
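
The segmentation into four adopter categories can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `use_diffusion_quadrants` and the input layout are hypothetical, the population means are used as cut points, and the quadrant labels follow the usual reading of the Shih and Venkatesh layout (high frequency with low variety is "specialized use"; low frequency with high variety is "non-specialized use"), since Fig. 1 itself is not reproduced here.

```python
from statistics import mean


def use_diffusion_quadrants(users):
    """Assign each user to a use diffusion quadrant.

    users: dict mapping a user id to a (frequency, variety) pair.
    Cut points are the population means (an assumption for this sketch).
    """
    freq_cut = mean(freq for freq, _ in users.values())
    var_cut = mean(var for _, var in users.values())

    labels = {}
    for user_id, (freq, var) in users.items():
        high_freq = freq >= freq_cut
        high_var = var >= var_cut
        if high_freq and high_var:
            labels[user_id] = "intense use"
        elif high_freq:
            labels[user_id] = "specialized use"      # frequent but narrow
        elif high_var:
            labels[user_id] = "non-specialized use"  # infrequent but broad
        else:
            labels[user_id] = "limited use"
    return labels


if __name__ == "__main__":
    # Made-up (sessions, variety) pairs, purely for illustration.
    demo = {"a": (10, 3.0), "b": (10, 1.0), "c": (2, 3.0), "d": (2, 1.0)}
    for user_id, label in use_diffusion_quadrants(demo).items():
        print(user_id, label)
```

Because the cut points depend on the population and on the time window analyzed, the same user can land in different quadrants for different windows, which matches the remark above about users moving between categories.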

    2.2 User typology modeling

    In general, typologies aim at classifying or categorizing common characteristics of objects. They are widely used in sociological, biological, linguistic, and psychological classifications to help both researchers and practitioners communicate efficiently about phenomena of interest. For example, in sociological typologies specific traits and features of social groups are segmented (often hierarchically) into meaningful categories. Typological determinations are often made by careful examination of patterns in research data and do not always gain widespread significance or acceptance until broad examination within a scientific community. As massive datasets of interest rapidly become available, however, typological determinations are more often being made automatically, computationally, and experimentally, as in this research.

    A great deal of recent typology research has been directed towards understanding and classifying Internet users, the common tasks they perform, and the details of their online behaviors. The Pew Research Group, for example, is developing user typologies describing the technology and Internet use patterns of Americans [16]. In their typology, distinctions are made, for example, between “Light But Satisfied” users, individuals who use some technology but for whom technology does not play a central role in day-to-day life, and “Omnivores,” who embrace technology fully and participate heavily in online activities. Typologies of media users have been extensively explored by Brandtzæg [3], and with the rise of social media systems, user typologies within this context are a growing area of research [2]. In education research, recent work by Eynon and Malmberg [11] illustrates how typologies can inform design and implementation recommendations: they are using Internet usage typologies of young students to more effectively integrate new technologies into classroom practices.

    Typologies within digital libraries and other repositories have been used to explore the background, behavior and motivations of end users to help inform decision making about library user interfaces, possible design or service enhancements, or digital resource development. For example, personas, a type of user typology, have been successfully used to understand the characteristics and needs of institutional repository users [20] and the needs of users of library services supporting scientific data curation [18]. Typically, personas are created using a labor- and expertise-intensive qualitative research method that relies on extensive interviews and observations of users. The result of this qualitative research is a typology of users that describes their needs, motivations, and contextual factors that might influence system adoption and use. Automatic computational methods for creating digital library personas have been explored by Maness et al. [20] and further by Miaskiewicz et al. [23].

    Within educational data mining, research using computational methods for identifying user typologies often uses clustering algorithms to group students into different categories based on skill sets [1] or performance on a test or assessment [9]. Xu et al. [39] use clustering to identify and classify usage types of teachers. They examine features of teacher-generated projects within the Instructional Architect tool to create a typology of users based on the kinds of projects that they produced. As these examples illustrate, clustering algorithms are generally used to assign group membership among items with common attributes or features in large datasets. The computational method presented in this article also uses clustering to identify and classify usage types of teachers. This research differs from prior efforts in that the features selected for the clustering algorithm are theoretically motivated by the use diffusion model.

    Thus, the outputs of our computational method are two, inter-related, user typologies: (1) a coarse-grained view of the user population segmented into use diffusion adopter categories and (2) a fine-grained view of the same population segmented along the same two dimensions but using more detailed measures for variety and frequency. Classifying and categorizing users into groups is a common task in user behavior analysis. The output of adoption models is often a set of adopter categories, which are a particular type of user typology. The categories produced by the use diffusion model are analogous to those produced by Rogers’ diffusion of innovation model (i.e., “early adopter” or “late majority”).

    3 Computational method

    The two-step method for this research is constructed to discover usage patterns and user typologies. The first step captures coarse-grained user categories, while the second step determines fine-grained typologies of system use. The next two sections will examine the details of each step, their inputs, processes, and outputs.

    Fig. 2 Step 1 overview

    3.1 Step 1: Use diffusion patterns (Fig. 2)

    To understand how use diffusion patterns are modeled, it is important to more fully examine the frequency and variety dimensions of the model. In this study, frequency is modeled as the number of user-initiated web sessions. While other frequency measures can be considered, this measure provides a good initial approximation of overall system use: fewer sessions imply lower system use, while more sessions imply higher system use. Variety, on the other hand, is more challenging to model because it is difficult to develop application-independent approaches to the concept. For the first step, we chose a variety metric that is based on aggregate user clickstreams. Intuitively, the clickstream of a particular user approximates their broad usage of the system. Furthermore, over time clickstreams become regular; that is, they become more predictable as users develop normal patterns of use within the system. By applying entropy, and specifically Shannon entropy [30], over the lifetime clickstream of each user, a basic notion of variety is developed that gives an approximate measurement of user behavior. Entropy has been used extensively in many systems to calculate measures of randomness and to approximate the amount of information being communicated in a system.

    In Maull et al. [21], initial entropy computations were made based on a simple, unmodified Shannon entropy calculation. To summarize the first computations from that work, a clickstream is modeled in a robust, domain independent, computationally trivial way using entropy as a model for variety. Since users generate a path of click interactions through a site, applying entropy models allows for a coarse-grained measurement of the predictability of clicks within a site. This intuition is then extended to the notion of variety: highly predictable, low-entropy models imply low variety, whereas highly unpredictable, high-entropy models imply high variety. The result of this coarse-grained step is a projection of frequency and variety patterns onto use diffusion quadrants where each user is binned into a quadrant.
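
As a rough illustration of entropy as a variety measure, the following Python sketch computes Shannon entropy over the distribution of click targets in one user's lifetime clickstream. The function name `clickstream_entropy` and the representation of a clickstream as a list of URL-like strings are assumptions made for this example; the exact event encoding used in [21] is not described here.

```python
from collections import Counter
from math import log2


def clickstream_entropy(clicks):
    """Shannon entropy (in bits) of the click targets in a clickstream.

    clicks: list of hashable click identifiers (e.g., URL strings).
    An empty clickstream is given entropy 0.0 by convention here.
    """
    total = len(clicks)
    if total == 0:
        return 0.0
    counts = Counter(clicks)
    # sum of p * log2(1/p) is the same quantity as -sum of p * log2(p)
    return sum((n / total) * log2(total / n) for n in counts.values())


if __name__ == "__main__":
    repetitive = ["/pdf/unit1"] * 8                       # highly predictable
    varied = ["/pdf/unit1", "/ir/video", "/my", "/shared"] * 2
    print(clickstream_entropy(repetitive))                # low variety
    print(clickstream_entropy(varied))                    # higher variety
```

A user who always clicks the same target gets entropy 0, while a user who spreads clicks evenly over four targets gets 2 bits, which is the sense in which low entropy maps to low variety.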

    While these calculations yielded interesting overall results, they were not reliable predictors of specialized use. To experiment with an entropy-based model further, an extension to the entropy calculation was developed. One of the primary disadvantages of the entropy calculation used previously is that there is little difference between breadth and depth of the URLs and clicks used in the calculation. For example, a large variety of URLs and clicks that broadly explore a site is no different from a large variety that deeply explore a site. This is actually not surprising, but the concern it presents in terms of mapping entropy calculations onto the variety model of use diffusion is that there are too few users in the high variety, low-frequency quadrant. This is particularly pronounced since variety and time used in the system tend to also go together; that is, the longer one uses a system, the more likely the increase in variety of use of the system.

    To approach this problem experimentally, we revise the entropy calculations by bifurcating clicks into two categories. Since our objective is to increase the number of users in the lower quadrant representing specialized use, we want to account for two kinds of variety: entropy of the clicks that are related to the interactive resources components of the system (e.g., interactive resources), and then the entropy of clicks related to the use of publisher materials (e.g., PDFs). Without getting lost within the specific nuances of the application under examination, such bifurcation should be possible generically, without affecting the goal of our initial step of generalizing variety so that it is not overly specific to any one particular application. It may also be possible to extend this method beyond two partitions to n partitions, though such extensions are beyond the scope of this paper.

    Let us consider the set of clicks $C_a = \{a_1, \ldots, a_n\}$, where $a_1, \ldots, a_n$ are clicks in the first category of application clicks, and the set of clicks $C_b = \{b_1, \ldots, b_n\}$, where $b_1, \ldots, b_n$ are the clicks in the second category of the application. For generalization, it will be left to the specification of the application to determine how these categories are determined. In the case of our application, we chose to split the application along the interactive resources and publisher components of the system.

    The new entropy calculation now considers the balance of clicks within each category of the application, so that this balance $B$ for a user $u_i$ is computed by

    $$B_{u_i} = 1 - \frac{|C^{u_i}_a - C^{u_i}_b|}{|C^{u_i}_a + C^{u_i}_b|}.$$

    This new balance calculation thus penalizes large imbalances and allows for more balanced ratios to go largely unaffected. Recall the entropy calculation $H$ from [21]

    $$H(S_{u_{ik}}) = -\sum_{i=1}^{n} p_{k_i} \log_2 p_{k_i},$$

    where $u_{ik}$ is the clickstream $k$ for a user $u_i$. Our new entropy calculation $H_B$ is given by

    $$H_B(S_{u_{ik}}) = H(S_{u_{ik}}) + \log_2(B_{u_i}).$$

    Fig. 3 Updated and re-calculated use diffusion pattern showing new frequency and variety calculations

    Fig. 4 Step 2 overview

    Figure 3 shows the results of the new calculations. Plotted against the means of the original data of Fig. 6, there are now more users in the specialized category, indicating that the penalty calculation did indeed improve the separation of users. While this new calculation is not nearly as specific as the clustering done in the next step, it is a step toward re-balancing the diffusion pattern so that it more robustly accounts for the breadth and depth of use discussed previously.
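
The penalized entropy can be sketched as follows. This is a minimal Python illustration, assuming that the balance term is computed from per-user click counts in the two categories and that a caller-supplied predicate decides which category a click belongs to; the names `balance`, `penalized_entropy`, and `in_category_a` are hypothetical. The treatment of a user with clicks in only one category (balance 0, so the logarithm is undefined) is not specified in the text above; returning negative infinity is a choice made only for this sketch.

```python
from collections import Counter
from math import log2


def shannon_entropy(clicks):
    """Shannon entropy (bits) of the click targets in a clickstream."""
    total = len(clicks)
    if total == 0:
        return 0.0
    counts = Counter(clicks)
    return sum((n / total) * log2(total / n) for n in counts.values())


def balance(n_a, n_b):
    """Balance term: 1 when counts are equal, approaching 0 when lopsided."""
    total = n_a + n_b
    if total == 0:
        raise ValueError("balance is undefined for a user with no clicks")
    return 1 - abs(n_a - n_b) / total


def penalized_entropy(clicks, in_category_a):
    """Entropy plus log2(balance); the log term is <= 0, so it acts as a penalty.

    in_category_a: predicate returning True for clicks in the first category;
    every other click is counted in the second category.
    """
    n_a = sum(1 for click in clicks if in_category_a(click))
    n_b = len(clicks) - n_a
    b = balance(n_a, n_b)
    if b == 0:
        return float("-inf")  # sketch-only convention for one-sided users
    return shannon_entropy(clicks) + log2(b)


if __name__ == "__main__":
    is_interactive = lambda click: click.startswith("ir/")  # hypothetical rule
    print(penalized_entropy(["ir/1", "ir/2", "pub/1", "pub/2"], is_interactive))
    print(penalized_entropy(["ir/1", "ir/2", "ir/3", "pub/1"], is_interactive))
```

In the second call the plain entropy is the same as in the first (four distinct targets), but the 3-to-1 imbalance gives a balance of 0.5 and therefore a penalty of one bit.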

    3.2 Step 2: User typology modeling (Fig. 4)

    While use diffusion patterns provide domain-independent quadrants of generalized usage behavior, to understand fine-grained user behavior we apply data mining algorithms, specifically clustering. Having discussed the challenges of the variety variable in step 1 above, in the second step features were selected from the clickstream data that expand variety and frequency in order to discover more detailed views of user behavior. Since clickstream data provides a good metric for computing variety through entropy, by selecting features that model variety in more detail, such as the usage of specific system components, we will develop a higher fidelity view of user behavior. These new refinements and the application of clustering expand the coarse-grained use diffusion patterns into a fine-grained user typology that continues to model frequency and variety.

    Fig. 5 The CCS offers four major capabilities: access to publisher material (IES Investigations tab), access to digital library resources (Interactive Resources tab), personalization capabilities (My Stuff tab) and community features (Shared Stuff tab). The Interactive Resources component is opened above showing the top recommended digital resources

    The first aspect of refinement requires that we expand frequency and variety by choosing more detailed features to model those variables. In the case of this study, frequency is expanded to include both sessions and total time spent within the system. Variety is expanded to include eight features that capture varying usage across system components that include publisher materials, digital library components, and user-contributed/social features of the system. While there are a number of clustering algorithms to choose from for this second step, these experiments are based on the widely used model-based expectation maximization (EM) algorithm [8]. EM works by iteratively examining the parameters of each object instance to be clustered and builds a probability distribution that best explains where each object instance should belong. After many iterations, the model settles into a set of clusters that are represented by the model parameters derived for each cluster. The resulting output is a set of clusters comprising a fine-grained user typology. The EM algorithm is used in these experiments because it is fast, robust, and typically converges quickly. Furthermore, cluster shapes (e.g., circles, ellipsoids) may vary to include more flexible cluster membership.
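
To make the EM idea concrete, here is a deliberately small Python sketch that fits a two-component Gaussian mixture to a single feature (for example, session counts). It is not the configuration used in the study, which clusters on several frequency and variety features; the function name `em_two_clusters`, the one-dimensional setting, the fixed number of clusters, and the min/max initialization are all simplifying assumptions.

```python
from math import exp, pi, sqrt


def gaussian_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return exp(-((x - mu) ** 2) / (2 * var)) / sqrt(2 * pi * var)


def em_two_clusters(xs, iterations=50):
    """Fit a two-component 1-D Gaussian mixture with EM.

    Returns (means, labels), where labels[i] is the more likely component
    for xs[i]. Initialization from min/max is a crude choice for this sketch.
    """
    mus = [float(min(xs)), float(max(xs))]
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    resp = [[0.5, 0.5] for _ in xs]

    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            likes = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, mus, variances)]
            total = sum(likes) or 1e-300  # guard against underflow to zero
            resp.append([like / total for like in likes])

        # M-step: re-estimate weight, mean, and variance per component.
        for k in range(2):
            n_k = sum(r[k] for r in resp)
            if n_k < 1e-12:
                continue  # leave an empty component unchanged
            mus[k] = sum(r[k] * x for r, x in zip(resp, xs)) / n_k
            var = sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(resp, xs)) / n_k
            variances[k] = max(var, 1e-6)  # keep the variance strictly positive
            weights[k] = n_k / len(xs)

    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return mus, labels


if __name__ == "__main__":
    sessions = [2, 3, 4, 40, 42, 45]  # made-up session counts
    means, labels = em_two_clusters(sessions)
    print(means, labels)
```

The E-step assigns each user a soft membership in every cluster and the M-step re-estimates the cluster parameters from those memberships; a multivariate version with full covariance matrices is what allows the varying cluster shapes mentioned above.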

    4 Case study

    For the remainder of this paper, we will focus on the case study that examines the use of the CCS. The CCS is a National Science Foundation funded program overseen by Digital Learning Sciences (DLS), a joint institute of the University of Colorado at Boulder and the University Corporation for Atmospheric Research. DLS began development of the CCS in early 2008 and in July 2009, the CCS was made available to all Earth science teachers in a large urban school district in the Midwest. Over 100 teachers were trained on the CCS for use in the 2009–2010 school year.

    Fig. 6 Use diffusion pattern comparison for step 1 from [21] and the new entropy with penalty calculation showing frequency and variety of the data source

    The CCS provides four major features to the end user (see Fig. 5). First, it provides users with Web-based access to digital versions of the paper-based student textbooks, teacher manuals, and curriculum guides that comprise the Earth science curricula for both Grade 6 and Grade 9. The manuals and guides outline the state standards that must be met, explain how the various units in the Earth science curricula are connected to state standards, and provide additional supplementary materials for teacher use, such as activities, teaching tips, and student assessments. These materials are all grouped under a single user interface component and are organized by key concept, which allows teachers to organize their lessons in a manner that flexibly meets the learning needs of their students. The digital versions of these curricular materials are identical to what was already available to teachers in paper form but can now be accessed from any computer with a Web connection.

    Second, the CCS integrates digitized publisher content with interactive resources available from the Digital Library for Earth System Education (DLESE), a collection of Earth science-related digital resources that are part of the National Science Digital Library. By clicking on the Interactive Resources tab, a user can see recommendations for animations, video clips, classroom activities, and other digital resources that pertain to the given key concept. The interactive resources available via the CCS have been vetted by the experts who manage the DLESE collection. Moreover, these resources are filtered by the system to ensure that they align with the Earth science curricula. Thus, when a teacher accesses a DLESE resource on, for example, volcanoes, the resource not only has been determined to have educational value by a subject matter expert but it has also been tied to the specific science concepts that must be taught as well as the science standards that must be met.

    The third major feature of the CCS is an interactive Web 2.0 capability, whereby teachers can save digital resources recommended to them via the Interactive Resources component or they can upload their own resources to an area called My Stuff, thus storing teacher-developed materials in the same space as interactive resources from DLESE or from the curriculum for easy access. Once a resource is saved to My Stuff, teachers have the option to share a copy of the resource to an area of the CCS called Shared Stuff, which is accessible to any CCS user who clicks on the Shared Stuff component associated with a given key concept.

    The final major feature enables teachers to share materials with their peers. When a digital resource is added to Shared Stuff, the teacher who originally uploaded the resource, as well as other CCS users, can add searchable tags or keywords so that any search of the CCS system for those tags will list all resources tagged with the same keyword or phrase. Finally, CCS users can add star ratings to resources so that other users can determine how their colleagues rate a given resource. A resource that many users rate highly (four or five stars) might be more likely to capture the attention of system users than a resource with a low rating. Hew and Hara [15] argue that this kind of sharing may be a catalyst for enabling improvement of practice, because such knowledge sharing tends to be tied to shared, situated instructional goals and challenges and is thus more likely to be relevant to a teacher's immediate, short-term needs. A more detailed description of the CCS, including results from the field trial, can be found in Sumner et al. [34].

    Table 1 Summary statistics for the data plotted in Fig. 6, N = 98

    Statistic    Entropy    Lifetime sessions
    Min.         48063      1.00
    1st Qu.      3.559      7.00
    Median       4.834      23.00
    Mean         4.535      37.85
    3rd Qu.      5.830      48.00
    Max.         7.336      171.00
    Std. Dev.    1.56       42.201

    4.1 Data source and step 1

    The data source for this study was clickstream data of 98 users from 9 months of interaction within the CCS. The data for these experiments were particularly interesting because the user interface environment contains many dynamic client-side components which would not ordinarily be captured in a traditional web server log. The system was therefore instrumented to extract additional information from clicks on these dynamic client-side interface components. The CCS clickstream data log, therefore, includes detailed tracking of user activity that provides rich user interaction data. Since we were able to capture this data over a relatively long period of time, we were able to analyze actual user behavior as the users worked with the system in a natural, unhindered manner. Furthermore, we were able to examine how publisher materials, which are already core to standard teacher practice, are complemented by digital library resources, giving us a broader view of how teachers integrate these digital library resources into their traditional practices.

    As a result of step 1, we obtained initial frequency and variety computations. Table 1 shows the descriptive statistics resulting from the first step of our experimental method. The data show that the mean entropy of our population is 4.54 and the mean frequency, here measured as the lifetime sessions that a user logged, is 37.85.
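    As a rough illustration of how a variety score of this kind can be computed, the sketch below calculates plain Shannon entropy over the system components a user clicked on. It is an assumption-laden simplification: it does not include the penalty term of the revised calculation, and the function name and the component labels are hypothetical.

```python
import math
from collections import Counter


def clickstream_entropy(clicks):
    """Shannon entropy (in bits) of the distribution of clicked components.

    clicks is a sequence of component labels, one per logged click.
    A user who only ever uses one component scores 0; entropy grows as
    clicks are spread more evenly over more components.
    """
    counts = Counter(clicks)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical click logs for two users (labels are illustrative only).
single_component_user = ["publisher"] * 8
broad_user = ["publisher", "interactive", "my_stuff", "shared_stuff"] * 2

low_variety = clickstream_entropy(single_component_user)
high_variety = clickstream_entropy(broad_user)
```

    In this toy case the second user, whose clicks are spread evenly over four components, receives the higher variety score.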

    When we apply the use diffusion model to the data by marking the quadrants at the means of each axis, we obtain a use diffusion pattern that shows that there is a larger (n = 40) limited use group, compared to the previous entropy calculations (n = 38). Similarly, there are fewer (n = 26) intense users (entropy > 4.54, frequency > 37.85), that is, users who exhibit both greater variety and greater frequency. This shows a 10 % decrease from the initial computation (n = 30). The plot also shows that there are now 5 users in the specialized category, where previously there was only a single point.

    Table 2 CCS cluster experiment features and descriptions

    #    Feature label                  Description
    1    Sessions                       Total lifetime sessions
    2    Hours                          Total system hours
    3    IR Activity                    Total activity within interactive resources
    4    IR Saving                      Total interactive resource saving behavior
    5    Shared Stuff Activity          Total user-contributed content activity
    6    Shared Stuff Saving            Total user-contributed content saves
    7    My Stuff Activity              Total My Stuff activity
    8    My Stuff Saving                Total My Stuff saving behavior
    9    Publisher-Teacher Materials    Total activity within publisher-teacher materials
    10   Publisher-Student Materials    Total activity within publisher-student materials

    The key improvement of the new calculation over the initial calculation is that it spreads the data out in a significant way; this change can be seen both in the standard deviation (from 1.25 up to 1.56) and in the visual spread in the graph. It is encouraging that there are now larger penalties for higher entropy, but further generalizable changes could still be made.

    As can be seen in Fig. 6, the value of the use diffusion pattern modeling is that it provides a comprehensive overview of coarse-grained behavior within the system. The dotted line divides the graph at the means of each axis, thus creating the four use diffusion quadrants. We can quickly see the distribution of basic usage patterns within the system. At the same time, however, the pattern does not produce enough information to determine the specific details of use. To do this, we must turn to the second step of the method.
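    The mean-split that creates the four quadrants can be sketched as follows. The assignment of "specialized" to high frequency with low variety, and "non-specialized" to high variety with low frequency, follows the usual use diffusion convention and is an assumption here, as are the function names and the sample data.

```python
from collections import Counter


def use_diffusion_quadrant(entropy, sessions, mean_entropy, mean_sessions):
    """Assign one user to a use diffusion quadrant by splitting at the means."""
    high_variety = entropy > mean_entropy
    high_frequency = sessions > mean_sessions
    if high_variety and high_frequency:
        return "intense"
    if high_frequency:
        return "specialized"      # frequent use, narrow range of features
    if high_variety:
        return "non-specialized"  # broad range of features, infrequent use
    return "limited"


def quadrant_counts(users):
    """Count users per quadrant; users is a list of (entropy, sessions) pairs."""
    mean_entropy = sum(e for e, _ in users) / len(users)
    mean_sessions = sum(s for _, s in users) / len(users)
    return Counter(
        use_diffusion_quadrant(e, s, mean_entropy, mean_sessions)
        for e, s in users
    )


# Hypothetical (entropy, lifetime sessions) pairs, not the study data.
sample_users = [(1.0, 2), (2.0, 5), (6.0, 90), (7.0, 120), (2.5, 100), (6.5, 10)]
counts = quadrant_counts(sample_users)
```

    Because the split uses only two summary values per user, it is application independent, which is also why it cannot reveal which specific features a user favors.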

    4.2 Feature selection and step 2

    We began the second step by choosing 10 features of our data for further analysis. The features for this step of the method represent four major functions of the system: (1) use of digital library-related system functions, (2) use of traditional publisher materials and related system functions, (3) system functions that involve personalization, and (4) user-contributed functions. Table 2 summarizes the system features used in the second step of our proposed method.

    4.2.1 Digital library features

    The CCS is specifically designed with the goal of providing access to high-quality digital library resources. Analyzing usage of the embedded digital library within the CCS should therefore provide useful data about typical teacher practices around digital resources. To capture this behavior, we selected features that detail the clickstream patterns of the embedded digital library resources within the system. These resources were presented in the user interface under four sub-categories: Top Picks, Animations, Images/Visuals, and Inquiry with Data. Each category contained resources from DLESE that were either selected as highly relevant to the subject materials and unit focus (Top Picks) or contained metadata that were of the appropriate type, scope, and topic (Animations, etc.). The items presented under each of these views were derived directly from DLESE web services [37] and appropriately presented to the user. Clickstreams into this component are tracked with the IR Activity and IR Saving features.

    4.2.2 Publisher materials features

    Publisher materials are included as a core component of the system functionality. The majority of the publisher's items represent digitized versions of paper-based materials, whether they be book chapters, supplemental materials such as handouts, assessments, etc. The features here are therefore convenient digital proxies for real-world paper-based analogs. Clickstreams into this component of the system and corresponding sub-components were tracked and organized with the Publisher-Student Materials and Publisher-Teacher Materials features. The Publisher-Student Materials include publisher materials like digital versions of the student textbook, while Publisher-Teacher Materials include supplemental publisher materials such as instructional support materials.

    4.2.3 Personalization features

    The CCS provides functionality to allow teachers to personalize the contents of their accounts; in particular, users are provided with the ability to save digital materials that they find of interest. Once saved, items may be retrieved for further review and may even be shared with others if desired. Saving is considered a personalization feature because it provides direct control over items that may fit a specific need (either at the time of save or in the future). Furthermore, saving implies an interest in the saved item, and while that interest may only last for a short time, it nonetheless acts as a marker of personalization behavior. Personalization behavior was captured with the saving behavior of the system through the My Stuff Saving feature. For example, teachers are able to save embedded digital library resources, such as animations, images, visuals, and top picks, for the units of study they may be interested in. The My Stuff Activity feature represents the total activity performed within the My Stuff features of the system.

    Table 3 Experimental cluster results showing the parameters obtained for each feature and cluster, where N indicates the number of users in each cluster

    Feature                       Cluster 1   Cluster 2   Cluster 3   Cluster 4   Cluster 5
    Sessions                      4.934       47.096      119.551     26.131      83.897
    Hours                         1.0125      26.702      52.586      8.427       24.984
    IR Activity                   2.483       74.253      95.537      26.801      17.109
    IR Saving                     0.321       4.703       26.039      1.218       0.649
    Shared Stuff Activity         3.382       25.753      81.159      17.558      82.174
    Shared Stuff Saving           0.356       4.314       18.942      1.877       3.701
    My Stuff Activity             0.123       4.864       19.116      1.106       0.860
    My Stuff Saving               0.741       18.623      11.492      0.388       4.913
    Publisher-Student Material    1.421       21.715      98.143      14.212      74.404
    Publisher-Teacher Material    4.781       28.790      86.070      14.649      92.306
    N                             31          10          8           35          14

    4.2.4 Community behavior features

    There are features of the system that promote community-centric behaviors. For example, resources and other materials that users find interesting can be shared with the community at large, in a kind of community pool of resources called Shared Stuff. The feature has many implications when considering the nature of communities of practice of K-12 educators, who are often encouraged to share materials, pedagogical strategies, and best practices amongst their peers. The community behaviors are captured with the Shared Stuff Activity and Shared Stuff Saving features.

    4.3 A user typology

    Our second step relies on feature clustering to develop a user typology. There are many clustering algorithms to choose from, but for this set of typology experiments we chose the model-based EM algorithm [8]. Elsewhere we describe other experiments using other clustering algorithms and parameters [22]. Using the Bayesian Information Criterion to determine the optimal number of clusters for a given dataset, it was determined there were five clusters to be discovered in our experiments. The details of the clusters are shown in Table 3. It should be noted that the clusters presented here show representative members of each cluster and do not necessarily represent actual users. That is, for each cluster in the table, the values present the ideal parameters that fit the object instances that belong within the cluster.
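    To illustrate how the Bayesian Information Criterion can guide the choice of cluster count, the sketch below scores several candidate models and picks the best. It assumes the common convention BIC = p ln(n) - 2 ln(L), where lower is better (some software reports the negated quantity and maximizes it). The log-likelihoods and parameter counts are invented for illustration and are not results from this study.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion; lower values indicate a better trade-off."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def select_cluster_count(candidates, n_obs):
    """Return the cluster count with the lowest BIC.

    candidates maps a cluster count k to (log_likelihood, n_params)
    for the mixture model fitted with k components.
    """
    return min(candidates, key=lambda k: bic(*candidates[k], n_obs))


# Hypothetical fits on 98 users with 10 features and diagonal covariances:
# each component has 10 means + 10 variances, plus k - 1 free mixing weights.
candidates = {
    3: (-900.0, 62),
    4: (-800.0, 83),
    5: (-700.0, 104),
    6: (-690.0, 125),
}
best_k = select_cluster_count(candidates, n_obs=98)
```

    The penalty term grows with every added cluster, so a sixth cluster is only accepted if it improves the likelihood by more than the cost of its extra parameters.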

    As can be seen in Table 3, cluster 1 characterizes the low-use pattern (low variety, low frequency) of the step 1 use diffusion pattern. The users in this cluster have produced very few hours of total use within the system. Furthermore, they do not seem to be using the full range of system features. On the other hand, the experiments revealed an intense user cluster (cluster 3) that shows robust use of the system. Indeed, this intense user category seems to have used the system in full, exercising nearly every aspect of the system and logging both large numbers of sessions and significant hours of use. Two specialized user groups emerged from the typology. While they both show about the same number of total hours within the system, cluster 5 identifies users who spend a great deal of time with the community features and publisher materials of the system, while cluster 2 shows users who spend far more time within the interactive resources and embedded digital library component of the system. These clusters are valuable to understand because they suggest that some users find the embedded digital resources as important as others find the community features. Finally, cluster 4 shows a group of users that had a session count that was above the median, but less than that of the intense and specialized users. This cluster also exhibits broad use of most of the features, with slightly more use of interactive resources. Table 4 summarizes each cluster, giving the key characteristics of the cluster and the diffusion pattern that the cluster belongs to. Furthermore, we have created fictitious typology labels to provide an easy way to remember the cluster characteristics.


    Table 4 An initial user typology derived from step 2, showing the diffusion patterns and characteristics of each usage type

    Cluster   Diffusion pattern     Typology label                    Characteristics
    1         Limited Use           Uninterested Non-Adopter          Very low overall system use
    2         Specialized Use       Interactive Resource Specialist   Heavy use of interactive resources relative to other system features. Tends to access system weekly
    3         Intense Use           Ardent Power User                 Heavy and robust overall use of all features. Tends to access system daily
    4         Non-Specialized Use   Moderate Generalist               Moderate overall system use. Shows slightly more use of interactive resources than other system features. Tends to access system several times monthly
    5         Specialized Use       Community Seeker Specialist       Makes heavy use of Shared Stuff features and Publisher materials relative to Interactive Resource Activity. Tends to use the system weekly

    4.4 Validation of results with field research findings

    It is important to note that the method which produced the user typology clusters is not simply a new kind of analysis; rather, it presents an opportunity to bridge different analytical approaches. Although the clusters that emerged from our computational approach are based on clickstream data, these clusters are not arbitrary aggregations of user behaviors. They map onto real-world usage patterns that emerged from the traditional educational research techniques used in our study of the CCS. Thus, the findings from our computational method are validated by the findings from our field research of CCS use "in the wild."

    The CCS field trial used a mixed-method research design [19,4,10]. We collected quantitative data via survey in a longitudinal fashion from all DPS Earth science teachers (n = 124) during the 2009-2010 academic year. In addition, we collected qualitative data from a large subset of teachers during the same period. See Saldivar [29] for a complete discussion of the CCS field trial; here, we summarize our data sources, sample sizes, and analysis techniques:

    Survey 1 (n = 85): A pre-survey administered at the beginning of the field trial to all Earth science teachers to establish a baseline of attitudes and behaviors regarding educational technology (including interactive digital resources), DPS Earth science curricular materials, and differentiated instruction. Data were collected via both quantitative (constrained-response) and qualitative (free-response) items. Quantitative survey items were analyzed by determining counts and frequencies of responses. Qualitative items were analyzed via content analysis. (Subsequent surveys also collected both quantitative and qualitative data, and were analyzed in the same manner.)

    Survey 2 (n = 84): A pre-survey administered at the mid-point of the field trial to all Earth science teachers to establish a baseline of attitudes and behaviors regarding general access to and attitudes towards instructional technology tools in general and the CCS in particular.

    Survey 3 (n = 81): A post-survey administered at the end of the field trial to all Earth science teachers. This final survey comprised a number of items included in the first two surveys to enable a pre- and post-analysis of changes, if any, in CCS users' attitudes and behaviors during the course of the school year.

    Adoption interviews (n = 24): These semi-structured telephone interviews regarding teachers' adoption and use of the CCS tool were administered to teachers who were regular-to-frequent CCS users and resulted in an extensive database of qualitative data (interview transcripts), which were analyzed via content analysis.

    Classroom observation cycles (n = 8): Teachers identified as frequent CCS users were each recruited to participate in several hours of interviews and classroom observations per teacher during the course of the school year, resulting in an extensive database of qualitative data (interview transcripts and field notes), which were analyzed via content analysis.

    Analysis of these field trial data, conducted independently of the computational analysis of clickstream data we have described above, suggested that teachers in our study fell onto a spectrum of system use behavior, from low-frequency users (corresponding to cluster 1) to moderate (cluster 4) and heavy (cluster 3) users. The final two clusters fell on the moderate-to-heavy side of the spectrum. Rogers' [27] theory of diffusion of innovation, discussed in our introductory section, predicts that the earliest users of a new technology ("innovators" and "early adopters") comprise approximately 16 % of all users. Moore [24], who revised and extended Rogers' work, further argues that a technology cannot become "mainstream" within a given population until it is adopted by at least half of all potential users. Our step 2 analysis indicates that about two-thirds of teachers in the district adopted the system to a significant degree with one-third of users (represented by clusters 2, 3, and 5) making heavy use of the system. Even if we confine ourselves to a theoretical framework, such as Rogers', that focuses on quantifying when different segments of a target population adopt an innovation, our findings indicate that the CCS has been strongly adopted and is heavily used by the target population.
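    The adoption proportions quoted above can be checked against the cluster sizes reported in Table 3. The short calculation below is a sketch of that arithmetic; treating every cluster except cluster 1 as having adopted the system "to a significant degree" is our reading of the text and should be taken as an assumption.

```python
# Cluster sizes (N) as reported in Table 3.
cluster_sizes = {1: 31, 2: 10, 3: 8, 4: 35, 5: 14}
total_users = sum(cluster_sizes.values())

# Heavy users: clusters 2, 3, and 5, as stated in the text.
heavy_share = sum(cluster_sizes[c] for c in (2, 3, 5)) / total_users

# Assumption: "adopted to a significant degree" means all clusters
# other than cluster 1 (the low-use cluster).
adopted_share = (total_users - cluster_sizes[1]) / total_users

print(f"total users:   {total_users}")
print(f"heavy share:   {heavy_share:.1%}")
print(f"adopted share: {adopted_share:.1%}")
```

    Under this reading, 32 of 98 users fall in the heavy-use clusters and 67 of 98 fall outside the low-use cluster, which matches the "one-third" and "about two-thirds" figures and exceeds both the 16 % early-adopter share and the one-half threshold discussed above.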

    It can be argued that identifying low, moderate, and intense user clusters is hardly a profound finding; after all, any set of system users can be divided into low and heavy user categories if the major variable of interest is frequency of system use. Recall, however, that use diffusion theory calls for an analysis that incorporates frequency of use with an evaluation of how a given technology is used. Looking through the lens of use diffusion, the most salient finding from our computational method is that two of the clusters (cluster 2, the Interactive Resource Specialist, and cluster 5, the Community Seeker Specialist) represent variety of use behaviors similar to user behaviors identified by our field research. This enables us to validate the findings of our computational method with real-world data while at the same time giving us deep insights into how teachers integrated digital library resources into their instructional practices.

    For example, Interactive Resource Specialists were moderately heavy users who spent most of their time in the Interactive Resources/Embedded Digital Library component of the CCS. Data from the final survey show that many teachers highly valued interactive resources. In response to the statement "Using interactive resources effectively is important to my personal success as a teacher," about three-fourths of respondents agreed or strongly agreed. In response to the statement "Using interactive resources in the classroom enables me to better meet the learning needs of students in my classroom," almost 90 % agreed or strongly agreed. Qualitative data provide insight into these survey findings.

    In response to an interview question that asked him to discuss why he used the CCS, teacher Corey told us: "My teaching practices have always focused on student engagement. I found that the CCS made it easier for me to find [interactive] resources with which I could capture student attention."

    Many teachers reported that they accessed digital resources to supplement textbook material. Overwhelmingly, the most popular kinds of digital resources teachers found via the CCS were graphic representations (e.g., pictures, diagrams, animations) of Earth science phenomena. In the following example, note Carlie's observations regarding students' engagement with and understanding of key concepts: "I think that since students are visual learners, they're hands-on learners, they're growing up in this technological age... Seeing the animations... Really drives home the idea or the topic... I ask them if it does help them [to see graphic representations]... If whatever they just viewed made the material make more sense, and often times, they [say] 'Wow... That totally made sense... Can we see it again?' I think they benefit from it a lot, and they're vocal about it. They love it."

    Liz's experience was similar to Carlie's in that her students seemed excited simply by the presence of digital resources, which led to increased engagement and deeper comprehension. Liz told us in an interview, "I think, in general, my students are really interested in seeing any kind of [digital resource]. [Digital resources are] really engaging for students, because it's different than what they normally see in the classroom... When they walk in, they're just super-excited when they see the projector's set up. [They say] 'Oh, yes, we get to watch a video,' or 'We get to see a PowerPoint.' They come in automatically excited to learn."

    In contrast, Community Seeker Specialists had CCS usage frequencies comparable to Interactive Resource Specialists but spent most of their time in the Shared Stuff and Publisher components of the system. Publisher materials had utility for teachers because they were directly related to the Earth science curricula used by the district, but why were Shared Stuff materials popular? Survey data suggest that the ability to see other teachers' uploaded materials gave CCS users insights into the community of Earth science educators in their district that were not possible before the system was made available. For instance, in the final survey, almost 61 % of respondents agreed or strongly agreed with the statement "The CCS has increased my awareness of other teachers' practices." Approximately half of respondents agreed or strongly agreed with the statement "The CCS has helped me become a more active member of the DPS professional learning community." Qualitative data provide a context for these survey responses.

    In response to an open-ended survey question that asked respondents to explain the value of accessing Shared Stuff, teacher Sheila stated: "Looking at the Shared Stuff uploaded by others gives me ideas about how I can present particular concepts in my [own] classroom." In response to the same question, Norma commented: "[Resources in Shared Stuff] have given me different perspectives on the different topics [in the curriculum] and thus enabled me to teach more effectively."

    Henrietta told us in an interview: "When [another Earth science] teacher from across town might have found a great website or a video clip or something that really does bring the point home and makes it more relevant to the students, that's what I'm looking for, and so far, I have found some things like that on [the CCS]." Later in the same interview, Henrietta added that she and the other Earth science teachers at her school, all CCS users, often discussed useful resources that they had found using the CCS: "We certainly confer on [resources we discover]... [We tell each other] 'This was great,' [and] 'This is something you need to [use] when you get to this [certain] point in your book.'"

    5 Discussion

    Step 1 of our method shows promise for the rapid determination of application-independent large-grain usage patterns, while Step 2 produced a meaningful typology based on clusters that emerged from refined application-dependent feature data. The computational generation of clusters that correspond with user behaviors identified by qualitative analysis of interview and survey data suggests that it is possible to build an understanding of user behaviors and motivations by quantitative examination of their behaviors within a system or application. This is only a first step, however, at bridging the gap toward understanding the complex interplay of observable behaviors that occur within an application and behaviors that occur outside of that system.

    The CCS's main purpose was to encourage teachers to differentiate instruction by incorporating high-quality digital resources into their teaching practices. Thus, the system was designed to produce a specific outcome: teacher differentiation of instruction. Our field trial data validated the in-system behaviors characterized, for example, by the Interactive Resource Specialist and Community Seeker Specialist clusters. However, we also know from field trial data that teachers with similar in-system behaviors may have very different teaching practices in the classroom. It is also possible that teachers with similar classroom practices may exhibit different in-system behaviors. Ideally, then, the most complete understanding of a system's impact on users would come from a method that takes into account both in-system and out-of-system behaviors, motivations, and characteristics.

    As previously mentioned in Sect. 2.2, existing literature on digital library use offers some guidance. Researchers in the field of human-computer interaction have long used the concept of "personas" to guide the development of information systems. Since personas are constructs devised by system developers to describe different types of system users, which most often take the form of fictional biographies of system users that include demographic characteristics and descriptions of how and why they interact with the system, developers can begin to model and build systems that more specifically address the needs of actual users. The result thus increases the chances that the system will serve the needs of its end users better and subsequently enhance the adoption of the system. Our method describes in-system behaviors, but by extending our approach to incorporate a full set of out-of-system behaviors including, for example, demographic information, teaching behaviors, and student outcomes (that is, by developing personas of CCS users), we can not only develop a richer understanding of the relationship between system use and users' practices "in the wild," but also extend the notion of personas beyond system design to actual system adoption and use over time.

    While we have obtained promising results from this study, there are several significant limitations. First, our method has only been applied to a single population. It is plausible that the results that we obtained are specific to the population under examination. A different typology may emerge as a result of larger data populations; for example, new specialized user types might emerge. Second, in order to validate that our results are robust and generalizable, it will be important to study the method with different applications and user populations. Third, while the computational method seems to support the analyses of qualitative data, there are opportunities to expand our understanding of system behavior and user characteristics to their impact on other meaningful behaviors outside the system.

    As can be seen in the case study, both of the steps of the method yielded interesting results. It appears, however, that the entropy modeling of the first step requires more discriminatory power when describing specialized users. While the new calculations do a better job of penalizing usage, future modifications may still be developed to separate the specialized group(s) further. Furthermore, work still remains to develop a precise coarse-grained model that separates every user group better. The challenges revealed by the first step could furthermore indicate that specialization does not always imply low variety, and this is clear when considering the details of the system usage when variety is expanded to include more variables, as in the extended entropy calculations and second step of the method.

    Applying automated typology discovery to inform, guide, and develop system user characteristics is one aspect of understanding within-system user behavior that may be extended to include broader behaviors and user characteristics through personas. Such an extension may prove to be a useful tool for understanding the behavioral characteristics outside the system so that an extremely rich description of system users emerges that includes in-system and out-of-system behaviors. This would in turn allow for more data to be used in making predictive and prescriptive tool (in-system), policy (out-of-system), and training (in-system and out-of-system) decisions.

    6 Broader impacts

    While this research has focused on demonstrating the value of our method for the study of digital library adoption, we believe it can help address other educational challenges, chief among which are teacher professional development, the correlation of teacher practice to student learning, and the evaluation of teaching practices.

    Extant research indicates that one of the major barriers to effective integration of technology into teaching practices is the dearth of quality professional development and training vis-à-vis technology [14]. Teachers often complain that even when technology-related training is made available, they are often confronted with training that does not meet their needs as practicing educators because it assumes that they do or do not already possess a given set of skills. By modeling teachers' system use behavior, one will be able to understand inter-user differences and target system training and professional development to users' true needs. For example, CCS users in the Community Seeker Specialist cluster might be presented with training that would help them integrate more interactive resources into their teaching because such users are known to not spend much of their CCS usage time exploring interactive resources. For teachers in the Interactive Resource Specialist cluster, such training would be redundant because they are already making extensive use of the interactive resources available via the CCS.

    While a consensus has emerged among policymakers and education researchers that effective adoption of technology can indeed have a positive impact on teacher practice [17,32], the relationship between teachers' adoption of technological systems and student achievement is still not well understood. As part of our larger study of the CCS, we are analyzing the standardized test scores of students taught by teachers who used the CCS to determine what impact the teachers' use of the CCS had on student outcomes. Our analysis of student test score data for both our field trial year and the prior school year [28] shows increased overall achievement among Earth science students after the introduction of the CCS; due to the limitations of the data available to us, however, we cannot make strong causal claims. We continue our efforts to understand what link, if any, exists between system use and student outcomes. In the long term, the ability to correlate system use behaviors with students' academic achievement will make it possible to focus on teacher use behaviors that most benefit students; in turn, these behaviors could be taught to other teachers.

    Evaluating teachers' instructional practices is a very difficult task that requires a massive commitment of human and financial resources [5]. For example, most evaluation systems rely on administrators to observe teachers in the classroom, a very labor-intensive approach that can result in teachers being awarded or denied tenure or pay raises based on the short period of time during which they were observed. Further, variance in evaluators' adherence to evaluation rubrics and bias in the evaluation instruments themselves can make it difficult to assess the validity of the evaluation process [38]. Alternatively, asking teachers to self-report their teaching practices might provide evaluators with additional information, but such data would be subject to all the limitations that come with asking individuals to report on their own behaviors. Neutral third-party data, such as the clickstream data to which we applied our computational method, can help bridge the gap between what evaluators observe during the relatively brief periods they visit a teacher's classroom and the teacher's activities when he or she is not being formally evaluated. A teacher evaluation process that combines traditional observational and self-reported data with usage data produced by systems like the CCS would give administrators, policymakers, and teachers themselves deeper insights into instructional practices; further, such a hybrid system would produce a rich new understanding of teaching-related best practices that then could be shared with other educators.

    7 Conclusion

    By applying models of adoption and use diffusion alongside data mining techniques, we have developed a two-step method for discovering patterns within clickstream data that reveal both general and specific typologies of digital library user behavior. The application of this method to data from an academic year-long field trial in a large urban school district provided a rich opportunity to study the adoption and usage behavior of embedded digital library resources by middle and high school science teachers. The method shows considerable promise for extracting useful behavioral patterns in the wild; the resulting fine-grained user typology maps well onto results emerging from traditional educational field research methods. The proposed method, while requiring more data for further validation, provides a valuable contribution towards our effort to develop automatic methods for studying digital library adoption and use. When fully realized, this method also has the potential to be extended to other applications and areas, such as informing teacher professional development, understanding the impact of digital library applications on student learning, and developing new approaches to the evaluation of teacher practice.
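
As an illustration of the frequency and variety measures on which use diffusion relies, the following minimal Python sketch computes a click count and a normalized Shannon entropy [30] over the categories of events in one user's clickstream. The category labels, the function names, and the normalization by the logarithm of the number of categories are assumptions made for this example only; they are not the feature set or the exact entropy formulation used in the case study.

```python
import math
from collections import Counter


def clickstream_entropy(events, categories):
    """Normalized Shannon entropy (0..1) of one user's event categories.

    `events` is a list of category labels, one per click; `categories`
    is the full set of labels a click could carry (both hypothetical).
    """
    if not events or len(categories) < 2:
        return 0.0
    counts = Counter(events)
    total = len(events)
    # H = sum(p * log2(1/p)) over the categories the user actually visited.
    entropy = sum((n / total) * math.log2(total / n) for n in counts.values())
    # Divide by the maximum possible entropy so users are comparable.
    return entropy / math.log2(len(categories))


def use_profile(events, categories):
    """Return (frequency, variety) for one user: click count and entropy."""
    return len(events), clickstream_entropy(events, categories)


# Hypothetical category labels, chosen only for illustration.
CATEGORIES = ["library", "publisher", "personalization", "community"]

focused = ["publisher"] * 8
balanced = ["library", "publisher", "personalization", "community"] * 2
print(use_profile(focused, CATEGORIES))   # same frequency, no variety
print(use_profile(balanced, CATEGORIES))  # same frequency, maximal variety
```

Two users with the same number of clicks can thus sit at opposite ends of the variety axis, which is the kind of separation a coarse-grained typology needs before any clustering is attempted.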

    Acknowledgments This material is based upon work supported in part by the NSDL program in the National Science Foundation under Awards #0734875 and #0840744 and by the University of Colorado at Boulder.

    References

    1. Ayers, E., Nugent, R., Dean, N.: Skill Set Profile Clustering Based on Student Capability Vectors Computed From Online Tutoring Data. In: Proceedings of the 1st International Conference on Educational Data Mining, Montreal, Canada, pp. 210–217 (2008)

    2. Benevenuto, F., Rodrigues, T., Cha, M., Almeida, V.: Characterizing user behavior in online social networks. In: Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement Conference, ACM, Chicago, Illinois, USA, pp. 49–62 (2009). doi:10.1145/1644893.1644900

    3. Brandtzæg, P.B.: Towards a unified Media-User typology (MUT): a meta-analysis and review of the research literature on media-user typologies. Comput. Hum. Behav. 26(5), 940–956 (2010). doi:10.1016/j.chb.2010.02.008. http://www.sciencedirect.com/science/article/B6VDC-4YJSW8D-1/%2/011453cc70c0a6bdc29d3fde3b8a9304

    4. Creswell, J.: Educational Research: Planning, Conducting, and Evaluating Quantitative and Qualitative Research. Pearson Education, Upper Saddle River, NJ (2008)

    5. Danielson, C., McGreal, T.L.: Teacher Evaluation To Enhance Professional Practice. Association for Supervision and Curriculum Development, Alexandria, VA, USA (2000)

    6. Davis, F.D.: Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quart. 13(3), 319–340 (1989)

    7. Deffuant, G., Huet, S., Amblard, F.: An individual-based model of innovation diffusion mixing social value and individual benefit. Am. J. Sociol. 110(4), 1041–1069 (2005)

    8. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodological) 39(1), 1–38 (1977)

    9. Dominguez, A.K., Yacef, K., Curran, J.R.: Data mining for generating hints in a python tutor. In: Proceedings of the 3rd International Conference on Educational Data Mining, Pittsburgh, PA, pp. 91–100 (2010)

    10. Dzurec, L., Abraham, I.: The nature of inquiry: linking quantitative and qualitative research. Adv. Nurs. Sci. 16, 73–79 (1993)

    11. Eynon, R., Malmberg, L.: A typology of young people's internet use: implications for education. Comput. Educ. 56(3), 585–595 (2011). doi:10.1016/j.compedu.2010.09.020. http://www.sciencedirect.com/science/article/B6VCJ-517J24P-1/%2/5c5d02fb284c227d3a9ef1cc4f3e1bf6

    12. Fuller, F.F.: Concerns of teachers: a developmental conceptualization. Am. Educ. Res. J. 6(2), 207–226 (1969)

    13. Hall, G.E.: The concerns-based approach to facilitating change. Educ. Horizons 57(4), 202–208 (1979)

    14. Hanson, K., Carlson, B.: Effective Access: Teachers' Use of Digital Resources in STEM Teaching. Gender, Diversity, and Technology Institute. Education Development Center, Inc., Newton (2005)

    15. Hew, K.F., Hara, N.: Empirical study of motivators and barriers of teacher online knowledge sharing. Educ. Technol. Res. Dev. 55(6), 573–595 (2007)

    16. Horrigan, J.: A typology of information and communication technology users. Research report, Pew Internet & American Life Project (2007)

    17. Kelly, M.G., McAnear, A.: National Educational Technology Standards for Teachers: Preparing Teachers to Use Technology. International Society for Technology in Education (ISTE), Eugene (2002)

    18. Lage, K., Maness, J., Losoff, B.: Receptivity to library involvement in scientific data curation: a case study at the University of Colorado Boulder. Portal Libr. Acad. 11(4), 915–937 (2011)

    19. Leech, N., Onwuegbuzie, A.: A typology of mixed methods research designs. Qual. Quant. 43(2), 265–275 (2009)

    20. Maness, J., Miaskiewicz, T., Sumner, T.: Using personas to understand the needs and goals of institutional repository users. D-Lib Mag. 14(9/10), 1082–9873 (2008)

    21. Maull, K., Saldivar, M., Sumner, T.: Understanding digital library adoption: a use diffusion approach. In: Proceeding of the 11th Annual International ACM/IEEE Joint Conference on Digital libraries, ACM, pp. 259–268 (2011)

    22. Maull, K.E., Saldivar, M.G., Sumner, T.: Online curriculum planning behavior of teachers. In: Proceedings of the 3rd International Conference on Educational Data Mining, Pittsburgh, PA, pp. 121–130 (2010)

    23. Miaskiewicz, T., Sumner, T., Kozar, K.: A latent semantic analysis methodology for the identification and creation of personas. In: Proceeding of the Twenty-Sixth Annual SIGCHI Conference on Human Factors in Computing Systems, ACM, pp. 1501–1510 (2008)

    24. Moore, G.A.: Crossing the Chasm: Marketing and Selling Technology Products to Mainstream Customers. HarperCollins, New York (2006)

    25. Pennington, M.C.: Cycles of innovation in the adoption of information technology: a view for language teaching. Comput. Assist. Lang. Learn. 17(1), 7–33 (2004)

    26. Ram, S., Jung, H.: The conceptualization and measurement of product usage. J. Acad. Mark. Sci. 18(1), 67–76 (1990). doi:10.1007/BF02729763. http://www.springerlink.com/content/9kjl7574145320mv/

    27. Rogers, E.M.: Diffusion of Innovations, 5th edn. The Free Press, New York (2003)

    28. Saldivar, M.: Teacher integration of digital resources into instructional practice. CCS Report No. 4. Digital Learning Sciences, Boulder (2011)

    29. Saldivar, M.G.: Teacher adoption of a Web-based instructional planning system. Doctoral dissertation, University of Colorado. Boulder, CO (2012)

    30. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 (1948)

    31. Shih, C., Venkatesh, A.: Beyond adoption: development and application of a use-diffusion model. J. Mark. 68(1), 59–72 (2004). http://www.jstor.org/stable/30161975

    32. Smerdon, B.: Teachers' Tools for the 21st Century: A Report on Teachers' Use of Technology. US Dept. of Education, Office of Educational Research and Improvement, Washington, DC (2000)

    33. Straub, E.T.: Understanding technology adoption: theory and future directions for informal learning. Rev. Educ. Res. 79(2), 625–649 (2009)

    34. Sumner, T., Team, C.: Customizing science instruction with educational digital libraries. In: Proceedings of the 10th Annual Joint Conference on Digital libraries, ACM JCDL '10, New York, NY, USA, pp. 353–356 (2010). doi:10.1145/1816123.1816178

    35. Turner, M., Kitchenham, B., Brereton, P., Charters, S., Budgen, D.: Does the technology acceptance model predict actual use? A systematic literature review. Inform. Software Technol. 52(5), 463–479 (2010)

    36. Venkatraman, M.P.: The impact of innovativeness and innovation type on adoption. J. Retail. 67(1), 51–67 (1991)

    37. Weatherley, J.: A web service framework for embedding discovery services in distributed library interfaces. In: Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital libraries (JCDL '05), ACM, New York, NY, USA, pp. 42–43 (2005). doi:10.1145/1065385.1065394

    38. Wilson, B., Wood, J.A.: Teacher evaluation: a national dilemma. J. Person. Eval. Educ. 10(1), 75–82 (1996)

    39. Xu, B., Recker, M., Hsi, S.: Data deluge: opportunities for research in educational digital libraries. In: Cassie M. Edwards (ed) Internet Issues: Blogging, the Digital Divide and Digital Libraries. Nova Science Pub Inc., New York (2010)
