Copyright © 2002 Kaplan DeVries Inc.

All rights reserved.

'Leadership Versatility Index – User's Guide' may not, in whole or in part, be copied, photocopied, reproduced, translated, or converted to any electronic or machine-readable form without prior written consent of the publisher, Kaplan DeVries Inc.

Leadership Versatility Index – User's Guide

Includes bibliographical references.

Authors: Kaiser, Robert B. and Kaplan, Robert E.

Published By:

Kaplan DeVries Inc.

1903-G Ashwood Ct.

Greensboro, NC 27455

Tel: (336) 288 8200

Fax: (336) 282 6878

E-mail: [email protected]

Website: www.kaplandevries.com

The Leadership Versatility Index™ 360-degree feedback survey, report, and curvilinear rating scale are patent pending.


Table of Contents

INTRODUCTION
The Curvilinear Nature of Performance
A Viewpoint on Versatility
Two Dualities Tower Above the Rest
Lopsided is the Rule
Implications for Development
The when and how of versatile behavior
Confronting distorted beliefs and attitudes
Working both sides

ADMINISTRATION
For Whom is the LVI Intended?
Who Should Complete the LVI?
Rater Confidentiality
Professional Interpretation and Assistance

DESCRIPTION OF THE INSTRUMENT
Item Content
Response Scale and Instructions
Overall Effectiveness
Open-ended Questions

INTERPRETING RESULTS
Overview and Orientation
The Forceful and Enabling (and Strategic and Operational) Sections
Definitions
Number of Respondents Who Indicated Too Little and Too Much
Average Ratings
Overall Scores (Versatility and Leadership Patterns)
Open-ended Comments
Overall Effectiveness

VALIDITY AND RELIABILITY
History of the LVI's Development
Current Items
Reliability
Validity
Future Research
Research References


INTRODUCTION

It feels obligatory, almost like a cliché, to begin a discussion of leadership by noting the complexity in the modern world of business. Nevertheless, today's managers do indeed work in a complex world. Every decision can come as a trade-off in an economy fraught with seemingly paradoxical demands: Companies we compete with in one arena may be our partners in another. Maximizing profits today can conflict with investing in tomorrow. The need to produce can clash with concern for people and human limits. To be up to the task, managers must be equally complex in their leadership.

What does it mean to be a complex leader? It boils down to the ability to play multiple roles, even contradictory roles, without emphasizing some at the expense of others. Versatility, flexibility, agility – call it what you will. Effective leaders can wear many hats. They need a full range of motion, a repertoire wide enough to cover the broad array of challenges that come their way.

The Leadership Versatility Index™ (LVI) is designed to help managers understand how versatile they are and how they can become even more versatile leaders. Based on over 20 years of research and consulting to senior managers, the patent-pending LVI is a step forward in assessment technology. For one, it employs an innovative rating scale that improves on the typical "more is better" type of scale. For another, the instrument is built on a conception of leadership versatility that reflects the tensions and trade-offs in the world of management. This view lights up the developmental path and makes the direction for growth and expanded capability crystal clear.

The Curvilinear Nature of Performance and a Scale to Capture it

When managers lack versatility, often the problem is they're deficient – at thinking strategically or setting expectations or empowering their people. They have a flat spot. But versatility isn't just possessing the needed skill. It is also the dexterity to draw on it to the right degree for the situation at hand. How do managers misapply an ability? They overdo it. They push hard for results – to the point of burning people out. They generate new ideas – indiscriminately. They are considerate – to the point of holding in their disappointment when someone drops the ball. Many a managerial weakness, the experts tell us, is a strength taken to the extreme.

We all recognize these two basic types of performance problems: deficiencies (doing too little of an important behavior) and excesses (doing too much of it). But traditional assessment instruments don't reflect the reality that performance problems come in these two flavors. The rating scale used in the typical 360 survey or performance appraisal assumes that a high score is always best, that "more is better." Such scales don't tease apart the two different reasons for ineffective performance: overdoing it and underdoing it.

To remedy this limitation, we developed a new rating scale for the LVI. Here the best score is in the middle, 0, surrounded by deficiency to the left and excess to the right. Degrees of underdoing it are represented by minus scores (-1 to -4); degrees of overdoing it are represented by plus scores (+1 to +4).

A Viewpoint on Versatility

Beyond a single capability, versatility takes a more complex form when we pair behaviors that look contradictory but are in fact complementary. Planning for the future versus taking action now is one example. The most effective leaders can use opposing approaches with equal ease: for instance, stepping up to take charge and empowering others to lead; performing tasks and attending to relationships; understanding the big picture and zooming in to grasp the details. On opposing pairs like these, managers are often not versatile but lopsided. They lean to one side, sometimes decidedly, overdoing one side and underdoing the other.

We call this a duality-based view of versatility. It's based on the essentially two-sided nature of leadership: for every truth about leadership, there is an equal and opposing truth. Like the Taoist notion of yin and yang, the basic idea is that it takes two complementary elements to form a whole. Neither element is complete without the other. Most cherished truths in leadership, it turns out, are only half-truths. Yes, it's important for leaders to have a strong, visible presence. It's also important for leaders to recognize others and put them in a strong position. Yes, leaders need to set direction and put the ship on course. They also need to make sure the sail is hoisted and the deck is swabbed. That leadership is essentially two-sided is evident in how we all talk about striking balances or, conversely, in how we use terms like "one-sided" to describe deficient leaders.

Part of the power of this view of versatility is that it conforms well to the realities of the management job, a veritable balancing act of an occupation if ever there was one. Another benefit of this approach is that it is clear in its developmental prescription. Whereas it is vague and behaviorally ambiguous to get feedback about being "a more flexible communicator," it is compellingly straightforward to hear that you need to "listen more and talk less."

Two Dualities Tower Above the Rest

Although there are countless pairs of complementary behaviors out there, the LVI assesses the two dualities that cover the lion's share of the leadership landscape.


How You Lead: Forceful Leadership versus Enabling Leadership – considers who is at the focus of action and energy, the leader or other people.

What You Lead: Strategic Leadership versus Operational Leadership – refers to the kinds of organizational issues one chooses to focus upon.

These four dimensions are more specifically defined as:

FORCEFUL LEADERSHIP: taking charge, being direct, pushing people to perform, holding them accountable, making the tough calls, and so on.

ENABLING LEADERSHIP: empowering, being open to input, involving people in decisions, being supportive, showing appreciation, and so forth.

STRATEGIC LEADERSHIP: setting long-term direction, thinking broadly about the organization, seeking ways to grow and expand the business, aligning people with the vision and strategy, and the like.

OPERATIONAL LEADERSHIP: focusing on short-term results, getting involved in the details, being grounded in the realities of executing, using disciplined processes to keep people on track, and so on.

Each of these aspects of leadership has been in the spotlight at one time or another. For instance, the "Great Person" theories of the early 1900s and the "John Wayne" ideal of mid-century held that leaders must be strong, independent personalities to enlist others to follow their lead. The post-World War II push for democracy and the humanistic movement of the 1960s made a compelling case for participative management. (Remember McGregor's Theory Y?) In the last two decades, transformational and neo-charismatic theories of leadership have emphasized the visionary and strategic elements of the role. And perhaps as a backlash, recent authors have turned their focus to execution and the art of getting things done.

Regardless of what happens to be in vogue, sustained high performance depends on all four of these leadership roles. At a minimum, every manager needs to understand the value of each one. In the best case, managers become well-rounded leaders by developing skill at using each and the wisdom to recognize when they are called for.

Lopsided is the Rule, Versatility the Exception

But rare is the manager who can draw from these four modes with equal aplomb. For one reason or another, managers tend to lack the versatility to freely cover both sides of each duality. Our research, as well as similar research reviewed by Mike Lombardo and Bob Eichinger (in The Leadership Machine), suggests that fewer than 20% of managers qualify as truly versatile. No wonder, then, that notions like flexibility, versatility, and agility are the latest to top the shortlist of leadership competencies in organizations.

Why do so many managers lack versatility? A simple answer may be that leadership is like any other skilled performance: it takes years of practice, hard work, and concentrated study to master. So the question that naturally comes next is, how can managers develop skill at using both sides of these dualities and expert judgment about when each is called for?

Implications for Development

Growth and development start with an accurate assessment. But after taking stock of results on the LVI, the job of development has only begun. Becoming a more versatile leader will require concentrated efforts at experimenting with new behaviors, reevaluating cherished beliefs and assumptions about effective leadership, and exploring the emotional roots of one's current way of leading.

The when and how of versatile behavior. Part of the challenge for lopsided leaders is learning what effective behavior in each of the four modes looks like. Another part is learning to recognize when each mode is needed. There is a substantial body of research-based guidance on when and how Forceful and Enabling approaches are called for. Much of the work conducted under the rubric of "contingency theory" is directly relevant to versatility on this duality.

• The mechanics of effective decision-making – when to make the decision oneself, consult others for input, or include them in making the call – have been explained beautifully by Victor Vroom and colleagues.

The New Leadership: Managing Participation in Organizations, V.H. Vroom and A.G. Jago, Englewood Cliffs, NJ: Prentice Hall, 1988.

• Paul Hersey and Ken Blanchard's widely popular Situational Leadership Theory explains when various levels of task-oriented directive and people-oriented supportive behaviors are likely to work best.

"So You Want to Know Your Leadership Style?", P. Hersey and K.H. Blanchard, Training and Development Journal, pp. 1-15, February 1974.

Management of Organizational Behavior: Utilizing Human Resources (6th ed.), P. Hersey and K.H. Blanchard, Englewood Cliffs, NJ: Prentice Hall, 1993.

• And when efforts at changing an individual's behavior aren't getting the desired result, Fred Fiedler has provided a useful model for changing the situation to better fit the manager's style.

Improving Leadership Effectiveness: The Leader Match Concept (2nd ed.), F.E. Fiedler and M.M. Chemers, New York: John Wiley & Sons, 1984.


The terrain of the strategic-operational duality has been covered far less thoroughly. Nonetheless, several useful resources do exist for helping managers improve on this duality.

• John Kotter provided one of the first clear descriptions of the two very different and yet complementary functions of management and leadership. His distinction is fundamentally the same as ours, with management corresponding to operational and leadership corresponding to strategic.

A Force for Change: How Leadership Differs from Management, J.P. Kotter, New York: Free Press, 1990.

• George Steiner is widely regarded as the key figure in the development of the field of strategic planning. His book is considered the bible on the subject and covers material as relevant today as it was when it was first published.

Strategic Planning, G. Steiner, New York: Free Press, 1979.

• A recent book by Honeywell International chairman Larry Bossidy, written with Ram Charan, provides a rich treatment of how managers can drive execution. The book is full of specific, concrete ways managers can build the processes and systems that get results, as well as the attitude this requires.

Execution: The Discipline of Getting Things Done, L. Bossidy and R. Charan, New York: Crown Business, 2002.

Confronting distorted beliefs and attitudes. Managers are in a stronger position to grow and improve if they work on the drivers of their behavior and do not confine developmental work to the behavioral level alone. If a manager's leadership is off, it only makes sense that she must examine what throws it off. Here we are talking about the personal, inner work of adult development and learning, in which basic beliefs and emotional investments are reconsidered.

A duality-based view of versatility is useful because it naturally calls attention to the beliefs and values that underlie a manager's style. A lopsided leadership pattern almost always rests on distorted beliefs about leading. Talk to a lopsided manager and it won't be long before it emerges that he or she has an "attitude" about the neglected side. This attitude is not so much a statement about the reality of the other side as it is a polemic on the vices of that side when taken to excess. An overly forceful manager might come out with a statement like, "The problem with empowerment is that there is no accountability." Or "You can't win in the marketplace if you are obsessed with pleasing everybody." Beliefs (or should we say prejudices?) like this underpin biases in a manager's behavior.

Just below the level of distorted beliefs usually lurk fears that also drive the individual's leadership posture and have to be overcome if the individual is to gain balance on a duality. For overly forceful individuals, for example, it is often the fear of not being powerful enough that produces the excess. Overly enabling people, for their part, fear becoming a caricature of forcefulness, as if moving in that direction means being obnoxious. Perhaps because strategic planning requires a certain intellectual discipline, some senior managers shy away from that kind of work out of a fear of failure stemming from a perceived sense of not being "bright enough." And many a well-intended micro-manager worries, well below the observable surface, that not being deeply involved in the unit's work amounts to shirking one's responsibilities. A duality-based view of leadership can be a door into the inner world of skewed beliefs, controlling assumptions, and sensitivities that throw off a manager's form.

We might also add an insight we have gained from consulting to executives on leadership over the years. The tendency to overdo or underdo something sometimes stems from a failure to fully accept one's strengths, whether a particular skill or one's capability in general. Underrating one's ability often compels a manager to try too hard. Equally, it can prompt an individual to gravitate away from a task or managerial function for fear of not doing it well. Positive feedback, then, can serve as a vehicle for personal development. It can pay dividends to attend to feedback about strengths with as much interest as is typically given to limitations.

Working both sides. A duality-based view of leadership suggests that developmental efforts should be allocated to both sides of the equation. That is, lopsided leaders must work both on stepping up the deficient side and on moderating the excessive side. There are two major challenges here.

First, on taming the overdone, it is important to recognize how overdoing it stems from overvaluing it. A lopsided executive may initially have a hard time understanding how he could possibly overdo something he believes, in his heart of hearts, to be the "right way" to lead. A concentrated dose of feedback around the costs of going overboard with a valued approach can help managers understand the inherently curvilinear nature of any behavior. Moreover, this kind of feedback helps to demonstrate how emphasizing one thing to the neglect of the other can, ironically, corrupt what is most valued.

A central idea in coming to terms with a gift taken to the extreme is selectivity. When managers contemplate toning down something they value, they often worry that it means giving it up. Versatility is not about taking something out of the toolkit and putting it up on the shelf; it's about adding to the mix and being more selective about when and how to use each tool. As one individual said to us, "Now I realize I don't have to give up my fastball. I just don't have to throw it all the time." He was freed up to work on his off-speed pitches, so to speak.

Second, because underdoing it stems from undervaluing it, lopsided managers must come to appreciate that the neglected side of their leadership style has intrinsic value and serves legitimate functions. For instance, one gifted operational executive who made a career out of turning around flailing businesses referred to strategy meetings as "fluff time." And some managers tipping to the strategic side look upon the disciplined work of implementation as "grunt work" that is beneath them. One such person called it "operational gruel." Undervaluing these roles not only leads managers to avoid them personally; it can also keep managers from staffing their teams with people who have the talent and inclination to fill those roles.

The key word in understanding underdoing is avoidance. When a manager is deficient, it is often the case that he or she is avoiding that part of the job. And avoiding is a close cousin to being inhibited, which is the next-door neighbor of being afraid. Not only do managers who undervalue a particular leadership role have a distorted view of that role, they also are often subject to anxiety-infused assumptions that behaving that way will have disastrous consequences. This is the psychological root of aversion.

For instance, an overly forceful manager worried that if she were responsive to her direct reports' desires for autonomy and gave them more latitude, then the quality of her unit's performance would plummet, making her a failure. An overly enabling executive fretted that asking his people to raise the bar on their performance would send a message that he doesn't appreciate their effort, which would make him an "unlikable SOB."

An important task in development is to help the person see these kinds of worried expectations for what they are: assumptions that may or may not be realistic. Working both sides, though, can be difficult because it asks a manager to ease up on what, in that person's mind, has been a key to success and make use of what, to that person, is unproven. And therein lies the courageous challenge to those who would become versatile leaders.


ADMINISTRATION

The Leadership Versatility Index™ (LVI) is a 360-degree feedback survey administered and scored over the World Wide Web in partnership with Performance Assessment Network (www.pantesting.com). It contains 65 items and takes each rater approximately 15 minutes to complete.

For Whom is the LVI Intended?

The LVI was developed for use with managers and executives – individuals with formal responsibility for a team, a function, a business unit, or an entire organization. Thus, it is appropriate to use it with first-line supervisors, middle managers, general managers, all the way up to very senior executives.

Who Should Complete the LVI?

Feedback on the LVI is designed to be collected from direct reports, peers, and superiors as well as the target manager himself or herself. This instrument is not designed to provide feedback from external parties like suppliers or customers (but it can be customized to do so).

The target manager should nominate at least three direct reports and at least three peers to provide ratings, although nominating as many as six in each group is recommended. At least three members of the direct report or peer rating group must provide ratings for feedback from that group to be broken out separately.

The target manager's immediate superior should also be nominated to provide ratings. Other superiors may also be asked to provide ratings if such feedback is desired. All superiors' responses will be averaged together in the feedback report. If only one superior is used, then that individual's ratings will be identified.

Up to a total of 16 individuals can provide feedback on the LVI. It is a good idea for the target manager to nominate a variety of coworkers to provide ratings. If only those with whom the target has a very favorable relationship are chosen, the results will likely be inflated and of questionable value in development. It is also best to ask people who have a good deal of experience working with the target manager, since they will have seen the leader in action and have an informed perception of his or her leadership. That said, adequate exposure to the manager will do.
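The reporting rules in this section reduce to a few thresholds. As a minimal sketch, the following Python shows how they fit together; the group names and the `reporting_plan` function are hypothetical illustrations, not part of the LVI or its web platform. The guide does not say what happens to a direct-report or peer group with fewer than three responses, so the sketch only marks such a group as not reported separately. It also assumes the 16-rater cap applies to coworkers' ratings, since the guide does not say whether the self-rating counts toward it.

```python
# Hypothetical sketch of the rater-group reporting rules described above.
# Group names and the function name are illustrative, not part of the LVI.

MIN_GROUP_SIZE = 3  # direct reports and peers need at least three raters
MAX_RATERS = 16     # "up to a total of 16 individuals can provide feedback"


def reporting_plan(responses: dict[str, int]) -> dict[str, str]:
    """Describe how each rater group's results would appear in the report.

    `responses` maps "direct_reports", "peers", and "superiors" to the
    number of raters in that group who actually provided ratings.
    Assumption: the self-rating is not counted toward MAX_RATERS.
    """
    total = sum(responses.values())
    if total > MAX_RATERS:
        raise ValueError(f"{total} raters exceeds the limit of {MAX_RATERS}")

    plan = {}
    for group in ("direct_reports", "peers"):
        n = responses.get(group, 0)
        # Anonymity: a group is broken out only if it has at least three raters.
        if n >= MIN_GROUP_SIZE:
            plan[group] = "reported separately"
        else:
            plan[group] = "not reported separately"

    superiors = responses.get("superiors", 0)
    if superiors == 0:
        plan["superiors"] = "no ratings"
    elif superiors == 1:
        # A lone superior's ratings are shown on their own, so they are identifiable.
        plan["superiors"] = "reported separately, not anonymous"
    else:
        # Multiple superiors are averaged together.
        plan["superiors"] = "averaged together"
    return plan


print(reporting_plan({"direct_reports": 4, "peers": 2, "superiors": 1}))
```

In this usage example the direct reports would be broken out, the two peers would not, and the single superior's ratings would appear on their own, which is why the guide asks that a lone superior be told his or her ratings will not be anonymous.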

The individual seeking feedback should tell the people he or she has nominated that they will be asked to provide their input and thank them in advance for their cooperation. It is also a good idea to inform them of the purpose of the assessment. A key thing to point out is how the ratings will be used. In most cases, this will be strictly to provide developmental feedback to the target manager. But in other cases it may also be part of a formal performance appraisal process.

Participation of all coworkers in the feedback process should be completely voluntary.

Rater Confidentiality

It is important that direct report and peer ratings be anonymous. First, ratings tend to be inflated when individuals believe that the person being rated will know who provided what specific ratings. Second, it is problematic when the evaluations provided by an individual can be identified by a person who has some degree of authority over him or her. Therefore, LVI results are aggregated within rating groups and presented to target managers only in summary form. This is also why at least three direct reports and at least three peers are required to provide feedback separately for these rating groups.

Note that only one superior is required to provide ratings. Of course, more than one is acceptable, but if only one superior participates, results from that individual will still be fed back separately. Thus it is important that a single superior providing ratings is informed that his or her results will not be anonymous if no other superiors provide ratings.

Professional Interpretation and Assistance

We strongly recommend that managers be provided with qualified professional assistance when receiving the results of their feedback. By qualified professional assistance we mean an individual with training or education in providing feedback in general and expertise with this instrument specifically. There are two reasons for our position. First, there are potential downsides to avoid. It is our experience that having data "dumped in your lap" with no assistance usually leads to confusion, misinterpretation, inattention, or otherwise getting nothing of value out of the process. Second, there are upsides to professional assistance that might otherwise be missed. Having an objective sounding board helps to keep things in perspective, for instance, when dealing with the inevitable criticisms that come with feedback. This kind of help is also invaluable in deciding how best to act on the results.


DESCRIPTION OF THE INSTRUMENT

The Leadership Versatility Index™ (LVI) includes a total of 65 items. There are sixteen items for each of the four domains: Forceful, Enabling, Strategic, and Operational leadership. There is also one item about overall effectiveness. The effectiveness item is rated on a scale from 1 to 10, unlike the other items, which are rated with our new underdo/overdo scale. These are the components of the standard version of the instrument. A deluxe version is also available, in which the ratings on the standard instrument are supplemented with answers to five behavioral interview questions that go with the sections on Forceful, Enabling, Strategic, and Operational leadership and overall effectiveness.

Item Content

The items for each domain are grouped together in the survey so that the respondent rates each domain separately. But the items were generated in complementary pairs: each Forceful item has an Enabling item to go with it, and each Strategic item is paired with an Operational item. The results for each pair are presented side by side in the feedback report, because it is more powerful for managers to see the two results juxtaposed rather than singly. This presentation is also in keeping with the duality-oriented conception of versatility. It can be eye-opening, for example, for an overly forceful manager to see that her peers rated her as overdoing "declares where she stands" and underdoing "receptive to their ideas."

Each item on the LVI describes a discrete, observable behavior. To provide the fairest and highest quality feedback, no items ask about attitudes, private thoughts, or other subject matter that is not observable.

Items were written to describe behaviors that could be underdone, overdone, or performed effectively. For instance, an item like "effectively makes his/her point to a resistant audience" wouldn't work with the rating scale, because it doesn't make sense to "overdo" making a point effectively.

The content of the items came from one of two basic sources. Most of the item pairs were based on coworkers' descriptions of our senior-management clients, or on our own observations of those clients, collected over the last two decades. The items were worded carefully to reflect the language managers naturally use to describe their own behavior and the behavior of others. The remaining minority of items were based on descriptions of managerial behavior found in the scholarly and popular literature. Even in these cases, we tried to couch the concepts in vivid managerial language.

We went through several stages of revision and refinement in drafting the items. At each stage, we asked for feedback from seasoned management consultants and academics as well as senior managers themselves. We typically asked whether (a) each item was clear in its intent or ambiguous, (b) each item described one and only one discrete behavior, (c) the items used phrasing common in descriptions of managerial behavior, and (d) the items in a pair were in fact complementary. We also sought feedback on whether we had adequately represented each of the forceful, enabling, strategic, and operational domains of leadership behavior.

Response Scale and Instructions

The response scale on the LVI represents an innovation in performance assessment. Unlike traditional Likert-type rating scales, where a higher score usually is the "best" score, this scale represents the inherently curvilinear nature of performance. On this scale, the best score is in the middle of the scale, a value of "0."

The idea behind this scale is that there are two basic kinds of managerial performance problems. The first kind, the one measured by the typical scale, occurs when managers don't do enough of something: they are deficient at the given competency and "underdo" it. The second kind, hard to detect with typical measures and the usual scale, occurs when managers go overboard and do too much of something: they are "overdeveloped" in a certain area and "overdo it."

The actual instructions and rating scale from the LVI are presented below.


Ratings meant to indicate that the manager does too little of a given item are negative numbers. Positive numbers indicate that the manager does too much of the behavior in question.

For example, a score of -1 means that the manager does "barely too little" of the behavior. A score of -4 means that the manager does "far too little" of the behavior. Thus, underdo ratings range from -1 to -4.

The same logic applies for responses in the +1 to +4 direction, the overdo range. For example, a score of +1 means that the manager does "barely too much" of the behavior. A score of +4 means that the manager does "far too much" of the behavior.
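Because the scale encodes both the direction and the degree of a rating, it can be modeled as a signed integer from -4 to +4. The following Python sketch is illustrative only: `classify_rating` and `ANCHOR_LABELS` are hypothetical names, not part of the LVI, and only the anchor labels quoted in this guide are filled in, since the wording for -3, -2, +2, and +3 is not given here.

```python
# Hypothetical sketch of the LVI underdo/overdo scale; names are illustrative only.
UNDERDO_MIN, OVERDO_MAX = -4, 4  # -4 = far too little, +4 = far too much

# Only the labels quoted in this guide; intermediate wording is not given here.
ANCHOR_LABELS = {
    -4: "far too little",
    -1: "barely too little",
    0: "the right amount",
    1: "barely too much",
    4: "far too much",
}


def classify_rating(score: int) -> tuple[str, int]:
    """Return (category, degree) for a single rating.

    category is one of the three qualitatively distinct evaluations;
    degree is the distance from the ideal score of 0.
    """
    if not UNDERDO_MIN <= score <= OVERDO_MAX:
        raise ValueError(f"rating {score} is outside -4..+4")
    if score < 0:
        return "underdoing it", -score
    if score > 0:
        return "overdoing it", score
    return "the right amount", 0


print(classify_rating(-3))  # ('underdoing it', 3)
print(classify_rating(2))   # ('overdoing it', 2)
```

Returning the category and the degree separately mirrors the two things the scale is meant to convey: which kind of problem, if any, a rater sees, and how far from "the right amount" the behavior lies.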

The current incarnation of the rating scale is the result of several stages of refinement, and it is based on current research on the cognitive processes that underlie the rating process. Specifically, we used negative numbers with minus signs to indicate deficiency and positive numbers with plus signs to indicate excess because research has shown that raters interpret scales ranging from negative to positive values as bipolar (that is, the two ends measure opposite things, like too little and too much). Also, the arrows above the scale reinforce the idea that there are degrees in both directions. Finally, the boxes around the numerical values indicate that "underdoing it," "the right amount," and "overdoing it" are three qualitatively distinct types of evaluations.

Overall Effectiveness

The LVI also contains an item concerning overall effectiveness at the end of the survey. It reads, "Please rate this individual's overall effectiveness as a manager on a ten-point scale where 5 is adequate and 10 is outstanding."

There are three reasons for including this item. First, it helps the individual receiving feedback gauge where he or she stands overall. After reviewing results across 64 items and four domains, it is helpful to know your overall standing. Second, it helps to calibrate the significance of the other ratings. We sometimes hear managers ask, "So, I am too forceful and not enough enabling. How bad are these results?" When the overall effectiveness ratings are relatively low (below 6.5 or so), the answer is that there are some serious issues to work on. When the overall effectiveness ratings are relatively high (above 8.5 or so), the situation is less urgent, although there is still room to move the needle. Tying ratings on specific behaviors to overall effectiveness can help motivate action in the areas most in need of development. Finally, the effectiveness rating is used in ongoing research to ensure that the other items in the LVI do in fact measure behaviors that are central to effectiveness.
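The calibration rule of thumb above can be sketched as a small lookup. The 6.5 and 8.5 cut points are the guide's approximate ("or so") values rather than hard thresholds. The wording for the middle band is an assumption, since the text addresses only the two ends, and the function name is hypothetical.

```python
def effectiveness_reading(mean_rating: float) -> str:
    """Rough reading of an average overall-effectiveness rating (1 to 10 scale).

    Cut points 6.5 and 8.5 follow the guide's approximate rule of thumb;
    the middle-band wording is an illustrative assumption.
    """
    if not 1 <= mean_rating <= 10:
        raise ValueError("overall effectiveness is rated from 1 to 10")
    if mean_rating < 6.5:
        return "serious issues to work on"
    if mean_rating > 8.5:
        return "less urgent, with room to move the needle"
    return "between the two rules of thumb"


print(effectiveness_reading(6.2))
print(effectiveness_reading(8.9))
```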

The overall effectiveness item is rated on a typical "more is better" scale because people are familiar with how to interpret scores on such scales, and that familiarity suits the calibrating role the item plays in interpreting results.

Single-item measures are sometimes criticized as unreliable. However, we have used this measure in previous research and found it to have sufficient reliability and validity as a measure of perceived effectiveness. Specifically, we found this rating to be comparable to multi-item scales in terms of inter-rater reliability and inter-rater agreement within rating sources, and in terms of convergent validity (the correlations between rating sources). Finally, the item is worded very similarly to the items that tend to load highest in factor analyses of multi-item scales measuring overall effectiveness.

Open-ended Questions

The deluxe version of the LVI includes an electronically administered set of five open-ended questions to supplement the numerical ratings. After rating the sixteen items in each dimension, the respondent is asked to provide write-in comments about the person's performance on that aspect of leadership: In what ways is he/she effective at it? How could he/she be more effective? And after completing the 1-to-10 rating of overall effectiveness, the respondent is asked to explain what specifically about the person is effective, and also what he/she would have to do to get a higher rating.

Thus, the behavioral interview consists of five questions that complement the rating portion of the standard instrument.

The value of supplementing the quantitative data with these verbatim quotes is tremendous. The numbers provide a skeletal outline of the individual's leadership, but it is the verbatim descriptions that flesh out the skeleton and bring it to life. There is an immediacy to these comments that feedback recipients find compelling. After having seen the numbers, it is not uncommon for those receiving feedback to focus on the quotes from their coworkers when drawing conclusions.


INTERPRETING RESULTS

The Leadership Versatility Index™ (LVI) feedback report summarizes the individual ratings by presenting the results across all coworkers in aggregate as well as presenting results separately for the self, superior, peer, and direct report rating groups. There are four primary sections to the report: an overview and orientation, Forceful and Enabling Leadership, Strategic and Operational Leadership, and Overall Effectiveness.

The following pages describe each section of the feedback report and what aspect of the results it presents. Explanations are also provided for how the scores in each section are computed and how they can be interpreted, including rules of thumb for distinguishing good, not-so-good, and seriously problematic scores.

Overview and Orientation

Page one of the report is intended to provide the feedback recipient with the key ideas behind the instrument: first, that the tool measures versatility as opposed to lopsidedness on the two dualities; second, the unique underdo/overdo rating scale and the rationale behind it; and third, how this view of versatility is built on the idea of capability on opposites, or dualities: opposing behaviors that are in fact complementary.

The Forceful and Enabling (and Strategic and Operational) Sections

Results for each of the two dualities – Forceful and Enabling Leadership, and Strategic and Operational Leadership – are presented in separate sections. The basic format is the same for each duality, with only the item content changing. Within each section, there are four different subsections that present alternative looks at the data. In the deluxe version of the LVI, there is a fifth subsection containing the write-in responses to the electronic behavioral interview questions.

Definitions. The first page of each section provides a definition of the two dimensions that make up the duality in question. It includes a statement about what it means to be versatile versus lopsided on that duality. This page also presents a count of how many raters there were overall and how many were in each of the rating groups (superiors, peers, and direct reports).

Number of Respondents Who Indicated Too Little and Too Much. The first page following the subsection containing definitions provides a count of the number of ratings from coworkers on the too little and too much sides of "the right amount." The bold, middle two columns under Forceful and Enabling (or Strategic and Operational) present the count across all coworkers. Flanking these columns are the counts for each rating group separately. Self-ratings are not presented in this subsection; they are juxtaposed against the results from coworkers in a later section.

At the bottom of this table, the counts are summed across all sixteen items in the row labeled Totals. The last row, Average per Item, is the Total count divided by the number of items, 16. The Average-per-Item values allow you to quickly assess the overall pattern: too much or too little of one side of the duality, and too much or too little of the other.
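As an illustration of how the Totals and Average per Item rows relate to the per-item counts, here is a small sketch. The function names and the sample counts are hypothetical. The right-amount helper reproduces the 9-rater example discussed below, since those counts are not printed on the page.

```python
from typing import Dict, List


def summarize_counts(too_little: List[int], too_much: List[int]) -> Dict[str, float]:
    """Compute the Totals and Average per Item rows for one dimension.

    Each list holds one count per item (16 items per dimension on the LVI).
    """
    if len(too_little) != len(too_much) or not too_little:
        raise ValueError("both columns must cover the same, non-empty set of items")
    n_items = len(too_little)
    return {
        "total_too_little": sum(too_little),
        "total_too_much": sum(too_much),
        "avg_too_little_per_item": sum(too_little) / n_items,
        "avg_too_much_per_item": sum(too_much) / n_items,
    }


def right_amount_count(n_raters: int, n_too_little: int, n_too_much: int) -> int:
    """Ratings of the right amount are whatever is left over (not shown in the report)."""
    return n_raters - n_too_little - n_too_much


# Hypothetical counts for 16 Forceful items.
forceful = summarize_counts(too_little=[0] * 12 + [1] * 4, too_much=[3] * 16)
print(forceful["avg_too_little_per_item"], forceful["avg_too_much_per_item"])  # 0.25 3.0
print(right_amount_count(9, 1, 0))  # 8
```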

Of course, the specific behaviors described by the items are of as much interest as the overall pattern, if not more. Scanning the too little/too much columns under each dimension and noting the items with the most counts gives a quick sense of the specific behaviors that deserve the most attention. Also important are the areas of strength: the items with the fewest ratings in the too little or too much areas. These are the behaviors with which the individual is most effective. For instance, if there are 9 raters and there is only 1 too little rating and no too much ratings for a particular item, there were 8 ratings of the right amount. (To simplify the data on this page, the counts for ratings of the right amount are not presented.)

The items are arrayed on this page to allow two different kinds of looks at the data. Analysis can focus on the behaviors within one dimension by examining, for instance, the results across all Forceful items first, then the results across all Enabling items. Alternatively, the degree of versatility versus lopsidedness on complementary pairs of behaviors from each dimension can be assessed by comparing results for items in the same row. That is, items 1f. and 1e. are a pair, as are items 2f. and 2e., and so on.

This subsection of the report also provides information about the degree of agreement among the various raters. For instance, when the count on the too much side is large relative to the total number of raters and there are no counts on the too little side, there is consensus about a problem of overdoing on that item. When there are approximately equal numbers of counts in the too little and too much columns, the jury is split. In these cases, closer inspection of the counts within the different rater groups is warranted. It may be that two groups, say superiors and direct reports, are seeing two very different things: perhaps one sees clear overdoing, and the other sees clear underdoing. These kinds of disconnects can be important topics for follow-up discussions about different expectations from different constituents, tendencies to behave differently depending on the audience, and so forth.

A question naturally arises on this page: how many counts is a lot? The answer depends on the total number of raters. For instance, a count of four overdo ratings for the direct report group is much more noteworthy when there are four direct reports than when there are eight. Generally, a count is noteworthy when more than half of the raters in a group provide ratings on the too little or too much side. The most important questions to ask on this page are:

• What is the overall pattern: too much of one dimension, too little of the other? Too little of both?

• Which items have the highest number of counts in the too little or too much direction?

• Which items have a very low number of counts in the too little or too much direction?

• Which items show a large disparity in terms of overdoing versus underdoing across rater groups? Within rater groups?
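The "more than half" rule of thumb for reading these counts reduces to a one-line check. This is a hypothetical helper for illustration, not part of the report software.

```python
def is_noteworthy(count: int, n_raters: int) -> bool:
    """True when more than half of a rating group chose one side of the scale."""
    if n_raters <= 0:
        raise ValueError("a rating group needs at least one rater")
    if not 0 <= count <= n_raters:
        raise ValueError("count must lie between 0 and the number of raters")
    return count > n_raters / 2


# Four overdo ratings: notable from four direct reports, not (by this rule) from eight.
print(is_noteworthy(4, 4), is_noteworthy(4, 8))  # True False
```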

Average Ratings. The next four pages present a detailed look at the data based on average, or mean, ratings. Each pair of complementary items is presented separately, and the average rating for all coworkers and for each rating group is presented in both visual and numerical form. This is because some individuals prefer tabular data, whereas others have a greater facility with images. Of course, the two modes also complement one another: the visual representations allow at-a-glance identification of noteworthy results, and the numerical presentations allow precise quantification of the results.

Self-ratings are also presented in this section, providing a calibration of the accuracy of self-views vis-à-vis how coworkers see things.

The visual presentation of the data consists of plotting the average scores for each item separately for each rating group. A letter code is used to plot the results for each rater group (Y = you or self-ratings, S = superiors, P = peers, and D = direct reports). The results computed across all coworkers are presented as thick bars that begin at "0" and extend in the appropriate direction. These results are plotted in a field defined by the response scale values representing too little (< 0), the right amount (0), and too much (> 0). Note that for the visual presentation, the scale ranges only from -3 to +3, whereas the values on the survey range from -4 to +4. The reason is twofold. First, the extreme points of the scale are very rarely used (ratings of -4 and +4 occur less than 1% of the time). Second, homing in on the range from -3 to +3 makes noteworthy scores easier to see with the naked eye. It is like magnifying the results to pick up significant disparities and deviations between rating groups, between ratings on complementary items, and between ratings on different item pairs. In the few instances where an average score is less than -3 or exceeds +3, it is plotted as -3 or +3, respectively. However, the corresponding numerical value in the table is the actual average.
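The display rule amounts to clamping each plotted average to the -3 to +3 field while leaving the tabled value untouched. A minimal sketch (the function name is hypothetical):

```python
def plotted_value(average: float, limit: float = 3.0) -> float:
    """Clamp an average item score to the plotting range [-limit, +limit].

    Only the graphic is clamped; the numerical table keeps the actual average.
    """
    return max(-limit, min(limit, average))


for avg in (-3.4, 1.25, 3.1):
    print(avg, "->", plotted_value(avg))
```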

The numerical item scores for each rating group are calculated as the average rating across raters in that particular group. For instance, if one direct report rated an item a +1, another rated it a +2, and a third rated it a +1, then the average direct report rating would be +1.33. For the all coworkers score, the average rating is computed across all non-self raters: one rater, one vote (as opposed, say, to the average of the superiors' average, the peers' average, and the direct reports' average: one group, one vote). The values in the self-rating column are those provided by the individual receiving feedback; therefore, they will equal one of the nine scale values (-4 to +4).
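To make the difference between "one rater, one vote" and "one group, one vote" concrete, the sketch below computes both from the same ratings. The group labels and the superior and peer ratings are hypothetical; the three direct-report ratings are the example from the text.

```python
from statistics import mean
from typing import Dict, List


def group_means(ratings: Dict[str, List[int]]) -> Dict[str, float]:
    """Average rating within each rating group (self included, as its own column)."""
    return {group: mean(values) for group, values in ratings.items()}


def all_coworkers_mean(ratings: Dict[str, List[int]]) -> float:
    """One rater, one vote: pool every non-self rating, then average."""
    pooled = [r for group, values in ratings.items() if group != "self" for r in values]
    return mean(pooled)


def one_group_one_vote(ratings: Dict[str, List[int]]) -> float:
    """The alternative the text contrasts with: average the group averages."""
    return mean(mean(values) for group, values in ratings.items() if group != "self")


item = {"self": [0], "superiors": [2], "peers": [0, 0, 0], "direct_reports": [1, 2, 1]}
print(round(group_means(item)["direct_reports"], 2))  # 1.33
print(round(all_coworkers_mean(item), 2))             # 0.86
print(round(one_group_one_vote(item), 2))             # 1.11
```

With a large peer group rating 0, pooling pulls the all coworkers score toward the peers, whereas averaging group averages would give the single superior as much weight as all three peers.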

When interpreting scores, it is important to note the values flagged with an asterisk (*). Like objects in your rear-view mirror that are closer than they appear, the scores denoted with an * appear closer to optimal than they actually are. Specifically, they are averages based on a mix of too little (negative) and too much (positive) ratings. For instance, the average of a rating of -1 and a rating of +1 would be "0," suggesting that the individual is optimally effective in that area, which was clearly not the intent of the raters. When the average score is "diluted" by scores on both sides of the right amount, it is flagged with an asterisk (*). In these cases, there are several things to do to aid interpretation:


• First, refer to the summary page to check the number of counts on both sides of optimal. It may be that there is a very small dissenting minority, which dilutes the majority opinion by a negligible amount.

• Second, consider the average score. If its absolute value is large, say greater than 0.50, that supports the interpretation that the majority consensus is worth attending to and that the flag merely indicates a small dissenting minority.

• Third, if the average score is very close to "0" (say, between -0.25 and +0.25), the discrepant ratings may have "cancelled each other out," and the average rating may be a false read. This means that coworkers likely see issues with the behavior, but for different reasons. It is difficult to act on this kind of feedback, except to note that some constituents hold very different expectations and standards or have exposure to very different samples of that behavior.

• Finally, flagged scores are rare. In our previous research with ratings of executive-coaching clients, fewer than 1 in 8 average scores was flagged. Therefore, it may be more productive to focus attention on those items where there is consensus.
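A minimal sketch of the asterisk rule and the reading heuristics in the bullets above. The flag fires whenever a single item's ratings fall on both sides of 0. The 0.50 and 0.25 thresholds come from the text; the wording for the band in between (0.25 to 0.50), which the guide does not address, is an assumption, as is the function name.

```python
from statistics import mean
from typing import List, Tuple


def flag_and_read(ratings: List[int]) -> Tuple[float, bool, str]:
    """Return (average, asterisk flag, reading) for one item's coworker ratings."""
    if not ratings:
        raise ValueError("need at least one rating")
    avg = float(mean(ratings))
    # The asterisk: ratings on both sides of "the right amount".
    flagged = any(r < 0 for r in ratings) and any(r > 0 for r in ratings)
    if not flagged:
        reading = "no mix of too little and too much ratings"
    elif abs(avg) > 0.50:
        reading = "majority view stands; small dissenting minority"
    elif abs(avg) <= 0.25:
        reading = "possible false read: opposing ratings may cancel out"
    else:
        # Assumed band: the guide gives no rule for 0.25 < |avg| <= 0.50.
        reading = "ambiguous (assumed band): check the counts on the summary page"
    return avg, flagged, reading


print(flag_and_read([-1, 1]))
print(flag_and_read([2, 2, 1, -1]))
```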

A natural question: what are the cutoffs separating "good scores" from "not-so-good scores"? When interpreting the average rating across several people, it is misleading to use the original scale anchors to give meaning to the average score (e.g., +/- 1 = "barely" too much/too little). Measurement experts offer several reasons for this. First, raters rarely rely on the literal meaning of scale anchors to assign ratings; most raters have a somewhat idiosyncratic, personal interpretation of what a 1 is, what a 2 is, and so on. Second, because of the statistical phenomenon known as regression to the mean, average scores drift away from the extreme anchors on the original response scale and thus "shrink" the effective range that average scores actually cover. Finally, raw scores by themselves are of little value without some type of normative information against which to interpret them. Some items and tests are "easy": everybody gets high scores, so what may appear in a vacuum to be a high score may actually be a low score relative to other test-takers. Similarly, some items and tests are "hard" and suffer the opposite interpretive dilemma. Thus, although the original LVI response scale value of +1 is anchored with "barely too much," an average item score of +1.00 doesn't necessarily mean "no big deal."

We have developed interpretive guidelines based on a study of ratings from 888 coworkers of a sample including 55 executives and 52 managers. In this sample, the average standard deviation (SD) of average ratings for each LVI item was about .55. A statistical convention for a "moderate effect" is a difference of .6 SDs. Thus, .6 of .55 is .33, the critical value separating a small deviation from "the right amount" (0) from a noteworthy deviation. The boundary between "potential issues" and "serious issues" was calculated as two of these critical values further out (i.e., .33 + 2 × .33 = .99 ≈ 1.00).
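The two cut points follow directly from the figures in the paragraph above. Restated as a worked equation, with no values beyond those in the text:

$$
\begin{aligned}
\text{critical value} &= 0.6 \times \overline{SD} = 0.6 \times 0.55 = 0.33 \\
\text{potential/serious boundary} &= 0.33 + 2 \times 0.33 = 0.99 \approx 1.00
\end{aligned}
$$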

These interpretive guides were validated by another analysis of the data. We computed regression analyses that modeled the curvilinear relationship between scores on the forceful and enabling dimensions and ratings of overall effectiveness. In both cases, the inverted "U" shape indicated that the highest levels of effectiveness corresponded to LVI average scores near 0. The curve was relatively flat at the peak, but around the values of -0.33 and +0.33, the legs of the inverted "U" started to become steep. This indicates that average ratings between -0.33 and +0.33 corresponded, on average, to about equally high effectiveness ratings. However, LVI scores near -1.00 and +1.00 are linked to dramatically lower effectiveness ratings than LVI scores near -0.33 and +0.33, respectively. The graph below summarizes the relationship between average scores on the LVI and overall effectiveness.

Taken together, these two methods lead to the following interpretive guides for average item ratings:

Serious underdoing            < -1.00
Potential underdoing issue    -0.99 to -0.33
Effective performance         -0.33 to +0.33
Potential overdoing issue     +0.33 to +0.99
Serious overdoing             > +1.00
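The interpretive guides can be applied mechanically with a sketch like the one below. Because the published ranges share their endpoints (-0.33 and +0.33) and leave the step from 0.99 to 1.00 unspecified, the boundary handling here (|score| <= 0.33 is effective, |score| >= 1.00 is serious) is an assumption. The function name is hypothetical.

```python
def interpret_average(score: float) -> str:
    """Map an average item rating to the guide's five interpretive bands.

    Boundary handling is an assumption: |score| <= 0.33 counts as effective,
    and |score| >= 1.00 counts as serious.
    """
    magnitude = abs(score)
    if magnitude <= 0.33:
        return "Effective performance"
    side = "overdoing" if score > 0 else "underdoing"
    if magnitude < 1.00:
        return f"Potential {side} issue"
    return f"Serious {side}"


for s in (-1.20, -0.50, 0.10, 0.75, 1.40):
    print(f"{s:+.2f}: {interpret_average(s)}")
```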

Of course, a cut score is merely a heuristic and necessarily somewhat arbitrary. Strictly speaking, the further a score deviates from the ideal score of 0, the more of an issue it is. Nonetheless, a rule of thumb is helpful in conveying what the feedback results mean.

What is probably most important for the individual to focus on is his or her most extreme scores, relative to his or her other scores. In other words, the handful of items with the greatest departure from optimal (0), regardless of how large that departure is, deserve the greatest developmental attention of all the behaviors covered in the LVI. Similarly, the items rated very near 0 deserve attention; they are the individual's standout strengths.

Overall Scores: Versatility Scores and Leadership Patterns. The two pages following the subsection on average item ratings provide a wrap-up of the results for the duality under consideration. On the first page is a general statement of how versatility is portrayed on the LVI.

[Figure: Overall Effectiveness (vertical axis, 4 to 10) plotted against LVI Average Score (horizontal axis, -1.5 to +1.5).]


Its inverse, lopsidedness, is also discussed. The following page presents two separate types of data:

• A Versatility Score, the degree of versatility on the duality as rated by each source.

• The percentage of too little and too much ratings for the two dimensions that define the duality in question.

The Versatility Score is computed by considering how close ratings are to 0 – the right amount – on both sides of a duality. It represents the degree to which the individual is rated as moving freely in and out of opposing modes; that is, the extent to which he or she makes optimal use of both sides of the forceful and enabling duality or the strategic and operational duality. Higher scores indicate greater versatility. The Versatility Score is computed using a mathematical formula that may seem complicated at first but makes sense after thinking it through.

Just as the Greek philosopher Aristotle defined virtue as the point between "deficiency" (underdo) and "excess" (overdo), we turned to Pythagoras, the Greek mathematician, to quantify versatility as a "double virtue." Mathematically, the Versatility Score represents how close the individual's ratings are to "0" (the right amount) on both items in a pair, across all pairs in the duality.

The Versatility Score can range from 0% (the most extreme possible scores, -4 and +4) to 100% (perfect scores of 0 and 0). A given individual's Versatility Score is calculated using the Pythagorean theorem and the arithmetic shown below.

[Figure: Versatility Score. An item pair is plotted on Forceful (Too Little to Too Much) and Enabling (Too Little to Too Much) axes. "Your Distance" is the distance from "The Right Amount" at the center, and "Max. Distance" is the maximum possible distance from it.]

$$\text{Versatility Score} = \frac{\text{Max. Distance} - \text{Your Distance}}{\text{Max. Distance}}$$

Using the formula depicted above, a Versatility Score is computed for each of the sixteen item pairs in the Forceful and Enabling duality (or Strategic and Operational duality). This value is computed separately for each rater. Then, the average score across these sixteen values is computed, yielding the overall Versatility Score for that rater. Finally, the average Versatility Score across all raters in a given group is computed. Thus, the versatility index reflects how close the client's scores are to the "Right Amount" on both sides of the duality overall, across all raters and all pairs of items. This is the value presented in the report.
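The three-step aggregation (per pair, then per rater, then per rating group) can be sketched as follows. This is an illustrative reimplementation, assuming each rater supplies 16 (Forceful, Enabling) rating pairs on the -4 to +4 scale; the function names are hypothetical, not those of the LVI scoring software.

```python
import math
from statistics import mean
from typing import Sequence, Tuple

# Distance from (0, 0) to the farthest corner of the -4..+4 square, about 5.657.
MAX_DISTANCE = math.hypot(4, 4)

Pair = Tuple[float, float]  # (Forceful rating, Enabling rating) for one item pair


def pair_versatility(forceful: float, enabling: float) -> float:
    """(Max. Distance - Your Distance) / Max. Distance for a single item pair."""
    your_distance = math.hypot(forceful, enabling)  # Pythagorean distance from 0
    return (MAX_DISTANCE - your_distance) / MAX_DISTANCE


def rater_versatility(pairs: Sequence[Pair]) -> float:
    """Average the per-pair scores across one rater's item pairs."""
    return mean(pair_versatility(f, e) for f, e in pairs)


def group_versatility(raters: Sequence[Sequence[Pair]]) -> float:
    """Average the per-rater scores across all raters in a rating group."""
    return mean(rater_versatility(pairs) for pairs in raters)


perfect = [(0, 0)] * 16   # every pair rated "the right amount"
extreme = [(4, -4)] * 16  # every pair at the most extreme corner
print(group_versatility([perfect, extreme]))  # 0.5
print(round(pair_versatility(2, -1), 3))      # 0.605
```

The last line reproduces the single-pair example that follows (+2.00 Forceful, -1.00 Enabling), which comes to about 60.5%.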

Below is an example of how the Versatility Score is computed for one item pair. In this example, the individual received a Forceful score of +2.00 and an Enabling score of -1.00.

Client's distance from optimal:

$$c^2 = a^2 + b^2 = (+2)^2 + (-1)^2 = 5, \qquad c = \sqrt{5} \approx 2.236$$

Maximum distance from optimal:

$$c^2 = a^2 + b^2 = (+4)^2 + (-4)^2 = 32, \qquad c = \sqrt{32} \approx 5.657$$

Versatility Score:

$$\frac{\text{Max. Distance} - \text{Client's Distance}}{\text{Max. Distance}} = \frac{5.657 - 2.236}{5.657} \approx .605$$

Versatility Score = 60.5%

How can one tell whether a Versatility Score is "good" or "not so good"? Our research with the earlier and current revised versions of the Forceful and Enabling scales indicated that the average Versatility Score is about 83%, with values ranging from a low of 54% to a high of 98%. Thus, the versatility index can be interpreted much like grades in school, with the following rules of thumb (the values in parentheses represent the percentage of managers who received each grade):

"A" = 90% to 100% (about 20%)
"B" = 80% to 89% (about 54%)
"C" = 70% to 79% (about 20%)
"D" = 60% to 69% (about 5%)
"F" = 59% and below (about 1%)

The second type of summary data presented on this page is the percentage of ratings in the too little and too much direction for Forceful and Enabling (or Strategic and Operational). These values clarify the overall pattern in the ratings: is there a preponderance of overdoing one thing and underdoing the other? Underdoing both? Or very little overdoing and underdoing?

[Figure: Degree of Versatility for the worked example. The client's ratings are plotted on Forceful (Too Little to Too Much) and Enabling (Too Little to Too Much) axes, with legs a and b and distance c from 0. The client's Degree of Versatility is 60.5% (39.5% short of 100%), shown on a grade scale (F, D, C, B, A) running from 60 to 100%.]
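The letter-grade rules of thumb for the Versatility Score can be applied with a simple lookup. In this sketch the bands are treated as lower bounds (90, 80, 70, 60), an assumption for scores that fall between the published whole-number ranges; the function name is hypothetical.

```python
def versatility_grade(score_pct: float) -> str:
    """Letter grade for a Versatility Score given in percent (0 to 100).

    The bands are treated as lower bounds (90, 80, 70, 60), an assumption
    for scores that fall between the published whole-number ranges.
    """
    if not 0 <= score_pct <= 100:
        raise ValueError("Versatility Scores range from 0% to 100%")
    for cutoff, grade in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score_pct >= cutoff:
            return grade
    return "F"


print(versatility_grade(60.5), versatility_grade(83))  # D B
```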


The values here are computed in a straightforward manner. For example, the number of ratings in the too little region across all items within a dimension (say, the sixteen Forceful items) is summed across all raters in a given rating group. This sum is then divided by 16 (the number of items) times the number of raters. The resulting ratio is converted to a percentage that reflects how often, overall, the individual was rated as underdoing Forceful leadership.

For instance, say there are 10 raters, and suppose these 10 raters provided a total of 40 ratings in the too much region across the 16 Enabling items. Then the percentage of too much Enabling would be: 40 too much ratings / (10 raters × 16 items) = 40/160 = .25 = 25%
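The same calculation as a small function (hypothetical name), reproducing the 10-rater Enabling example:

```python
def side_percentage(count: int, n_raters: int, n_items: int = 16) -> float:
    """Percentage of all possible ratings (raters x items) that fell on one side."""
    possible = n_raters * n_items
    if possible <= 0:
        raise ValueError("need at least one rater and one item")
    if not 0 <= count <= possible:
        raise ValueError("count cannot exceed raters x items")
    return 100 * count / possible


# 40 too much Enabling ratings from 10 raters over 16 items.
print(side_percentage(40, n_raters=10))  # 25.0
```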

These values help to define the overall, or predominant, leadership pattern. In the case of lopsidedness, they reveal the side to which the manager leans. There are four basic patterns for each duality. Below is how the four patterns are defined for the Forceful and Enabling duality; an analogous set of patterns is defined for the Strategic and Operational duality.

Too Forceful/Not Enabling Enough: percentage of too much Forceful > percentage of too little Forceful, and percentage of too little Enabling > percentage of too much Enabling

Too Enabling/Not Forceful Enough: percentage of too much Enabling > percentage of too little Enabling, and percentage of too little Forceful > percentage of too much Forceful

Disengaged: percentage of too little Forceful > percentage of too much Forceful, and percentage of too little Enabling > percentage of too much Enabling

Versatile: percentages of too little and too much Forceful are both relatively low, percentages of too little and too much Enabling are both relatively low, and the Versatility Score is > 90%

Theoretically, a fifth pattern is possible: the percentage of too much ratings is greater than the percentage of too little ratings for both Forceful and Enabling. However, this is an empirical null set. We have yet to encounter a single instance of the too much/too much pattern in data from three samples including over 300 managers. This isn't unexpected: the underlying theory holds that individuals tend to be lopsided on dualities like these because they value one side at the expense of the other.

Open-ended Comments. (This subsection is only included in the deluxe version of the LVI).

Responses to the open-ended questions for the two dimensions in each duality are presented on the last page of the section covering that duality. After completing the ratings on the LVI survey, respondents are asked to comment on this aspect of the individual's leadership, "In what ways is he/she effective at it? How could he/she be more effective?"

Responses are presented exactly as provided on the original survey; in other words, they are not edited, modified, or otherwise altered. These responses are presented in random order so that, for instance, the first response under Direct Reports for Forceful leadership is not necessarily from the same individual who provided the first Direct Report response presented under Enabling leadership.

Overall Effectiveness

The final section of the LVI report presents the results for the ratings of overall effectiveness. Respondents were asked to rate the individual's "overall effectiveness as a manager on a ten-point scale where 5 is adequate and 10 is outstanding." Raters can respond with any number between 1 and 10, with either whole numbers or numbers that include up to two decimal places (e.g., 7.67). The average rating is presented for All Coworkers taken together as well as separately for the Self, Superior, Peer, and Direct Report rating groups. The range of ratings is also presented for each rating group – the lowest and the highest rating.

The following table presents normative information for the effectiveness ratings.

Rating Group      Mean   SD     Range
Self              7.37   1.24   4.00 to 10.00
All Coworkers     7.67    .85   5.41 to 9.42
Superiors         7.66   1.18   4.00 to 10.00
Peers             7.40   1.00   4.13 to 9.50
Direct Reports    7.82    .94   4.50 to 9.50

Note: Norms are based on ratings of 120 senior managers from over 1,300 raters. Ratings from selves and peers are statistically significantly lower than ratings from superiors and direct reports.

Thus, the average effectiveness rating is in the mid-sevens. Ratings in the mid-eights and higher are considered very high. Ratings in the mid-sixes and lower are quite low.

Open-ended comments are captured with an interview question about overall effectiveness on the deluxe version of the LVI. Respondents are asked to "Explain briefly what specifically about this person is effective, and also what he/she would have to do to get a higher rating." The verbatim responses collected here are presented on the page following the numerical ratings of overall effectiveness. The numerical rating given by each rater and his or her comments are presented together, in random order within the appropriate rating group.


VALIDITY AND RELIABILITY

The science of measurement applies as much to performance assessment as to any other human attribute that is quantified. Thus, the quality of a 360-degree feedback tool is directly tied to the rigor with which it was designed: how well the instrument functions according to its underlying conception, the standards employed in the statistical evaluation of items and scales, and, most importantly, the strength of the association between scores on the instrument and organizational outcomes of consequence such as unit effectiveness, culture and climate, turnover, and so on.

The Leadership Versatility Index™ (LVI) in its present form emerged gradually from two decades of consulting to senior managers and research on senior leadership. The measure itself has been the focus of ongoing research and development since 1994. This section provides an overview of the iterative process that led to the current instrument and of the evidence attesting to its validity. At the end is a list of references that document the research in detail.

History of the LVI's Development

Bob Kaplan began developing the LVI and the idea of the curvilinear rating scale with the Forceful and Enabling scales, the first duality to grab his attention (see Kaplan, 1996). After we had used the instrument in our firm's practice for about three years, Rob Kaiser joined in. We took our first statistical look at the data once we had ratings for about 65 executives. It became apparent that writing items to work with the unique new rating scale and the underlying duality-based notion of versatility was a tricky task. Of the first 11 item pairs, only about half worked as intended (see Kaiser & Kaplan, 2000): the distributions of some items didn't extend far enough into the "too much" region, some items didn't show the expected inverse relationship with their complementary item (as one is overdone, the other tends to be underdone), and so on.

We extracted several lessons that amounted to design principles for this new kind of measure and went back to the drawing board. We drafted a new set of item pairs and, at this stage, also began measuring the strategic-operational duality. After collecting a new sample of data, we took a very close statistical look at the new items, using ratings of a little over 100 executives by over 1,100 coworkers. Our analyses focused primarily on the forceful-enabling items.

First, using half of the sample, we conducted factor analyses to ensure that the items did in fact measure two distinct dimensions corresponding to forceful and enabling leadership. Along the way, some items were dropped because they obscured a clear two-factor structure. Once we arrived at a set of items that seemed to work well, we conducted a confirmatory factor analysis on the remaining holdout sample. Here we tested how well the two-factor model fit the data, both for self-ratings and separately for superior, peer, and direct report ratings. We also used analyses based on Item Response Theory to determine whether the items were equivalent measures across the different rating groups. The result of these analyses was five pairs of items that met stringent psychometric standards. The details can be found in Kaiser and Craig (2001).

We next performed a series of analyses to understand the reliability and validity of 360-degree ratings on these five pairs of items. The following key results were found:

• The two five-item scales were internally consistent – alpha reliabilities were .80 for Forceful and .76 for Enabling.

• Although forceful and enabling were determined to be separate factors, they were inversely correlated, r = -.50. This reflects the tendency for managers to overdo one side and underdo the other.

• Raters within each source showed remarkable convergence in their ratings. Despite the subjective nature of the underdo/overdo scale, inter-rater reliability and inter-rater agreement were substantially higher than is typically reported for performance measures (in the .70s and .90s, respectively). In other words, raters agree to a great extent about who does too little and too much of which behaviors.

• Ratings from different sources showed a fair degree of similarity but also a useful degree of difference. On average, the correlation among superior, peer, and direct report ratings was r = .53. This means there is enough convergence across sources to make overall roll-up scores meaningful, and also enough uniqueness between sources to look at them separately.

• There were four types of leadership patterns: versatile (ratings near the right amount on both Forceful and Enabling), lopsided forceful, lopsided enabling, and disengaged (too little on both scales). In no instance was an individual rated as doing too much on both scales.

• The relationship between the four types and effectiveness ratings was as expected: the versatile group received the highest ratings and the disengaged group the lowest. The two lopsided groups fell in between and did not differ significantly from one another.

• There was a large correlation between Versatility Scores and effectiveness ratings. Within rating sources, these values ranged from r = .53 to r = .68. This is a strong association and demonstrates that versatility on the forceful-enabling duality is a key driver of overall leadership effectiveness.
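The four patterns in the list above can be expressed as a simple decision rule on a manager's overall Forceful and Enabling scores. The sketch below assumes that scores are centered so that 0 means "the right amount," negative values mean too little, and positive values mean too much. The tolerance band of ±0.5 and the function name are hypothetical choices for illustration, not the cut-offs used in the LVI research.

```python
# Hypothetical tolerance around the "right amount" (0 = right amount,
# negative = too little, positive = too much). Not the LVI's published rule.
TOLERANCE = 0.5


def classify_pattern(forceful: float, enabling: float, tol: float = TOLERANCE) -> str:
    """Assign one of the leadership patterns described in the text."""
    forceful_ok = abs(forceful) <= tol
    enabling_ok = abs(enabling) <= tol
    if forceful_ok and enabling_ok:
        return "versatile"
    if forceful < -tol and enabling < -tol:
        return "disengaged"
    if forceful > tol and enabling > tol:
        # Not observed in the study sample, but handled for completeness.
        return "too much on both"
    # Otherwise the profile leans toward whichever side is rated higher.
    return "lopsided forceful" if forceful > enabling else "lopsided enabling"


if __name__ == "__main__":
    for f, e in [(0.1, -0.2), (-0.8, -0.9), (0.9, -0.7), (-0.6, 0.8)]:
        print(f, e, classify_pattern(f, e))
```

Because one side is in band and the other is not, or one is high and the other low, the final comparison never ties, so every profile receives exactly one label.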

The pilot study with the first generation of strategic-operational items was less fruitful, yet promising. As might be expected, these items didn't work as well as the second-generation forceful-enabling items. But we learned a lot about this duality in this stage of research and were ready to move to the next generation of development.

Taken together, the solid statistical results obtained for the Forceful and Enabling items convinced us that we had figured out how to create a powerful measure of performance around the new scale and the duality-oriented conception of versatility (see Kaplan & Kaiser, in press). Drawing on the lessons learned from the first two iterations of experimentation, as well as on a close reading of the research on similar distinctions such as initiating structure versus consideration and autocratic versus participative leadership, we expanded the content coverage of the Forceful and Enabling domains with more items. We also applied what we had learned to improving our measure of Strategic and Operational leadership, which included consulting the scholarly and popular literatures on similar dimensions of leadership.

The Current Items

The process we used to map out the Forceful, Enabling, Strategic, and Operational domains and then craft items for each is described in the preceding section, Description of the Instrument – Item Content (p. 12). We have conducted a preliminary analysis of the current LVI Forceful and Enabling items. The revised Strategic and Operational items were developed later; consequently, we are still collecting a sample large enough to analyze them.

The following statistical results are based on ratings of 55 senior executives and 52 lower-level managers, made by a total of 888 coworkers. To simplify the presentation, all analyses are based on the average rating across all coworkers.

Reliability. The 16 items in each of the Forceful and Enabling domains are highly internally consistent: alpha reliabilities are .93 and .96, respectively. Inter-rater reliability and agreement are similarly high, easily exceeding the conventional minimum standard of .70. This means that different raters of the same individual show a substantial degree of consensus about what that individual does too little, just right, and too much.
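Coefficient (Cronbach's) alpha, the internal-consistency statistic reported here, compares the sum of the individual item variances with the variance of the total scale score: for k items, alpha = k / (k - 1) * (1 - sum of item variances / variance of total score). The sketch below computes it from a table of item ratings with one row per rated manager and one column per item. The data layout and the function name are assumptions for illustration.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Coefficient alpha for a rows-by-items table of ratings.

    Each row is one rated manager (e.g., the coworker-average rating on
    each item); each column is one item of the scale.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across managers.
    item_vars = [variance(column) for column in zip(*scores)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Two items that move in lockstep are perfectly consistent (alpha = 1).
    print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))
    # Two unrelated items share no variance (alpha = 0).
    print(cronbach_alpha([[1, 1], [1, -1], [-1, 1], [-1, -1]]))
```

Values near .93 and .96, as reported for the Forceful and Enabling scales, indicate that the items within each scale rise and fall together across managers.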

The overall Versatility Score is also very reliable: the coefficient alpha for this score was .97 (see p. 21, under Interpreting Results – Overall Scores (Versatility and Leadership Patterns), for details on how this score is calculated).

Validity. Two pieces of evidence attest to the construct validity of this measure of the forceful-enabling duality. The first pertains to internal structure: how the two dimensions relate to each other. The underlying theory posits a strong negative correlation, because managers who overdo one side of the duality are thought to underdo the other side. This is in fact what we find: overall scores on the two sides show a strong negative correlation, r = -.64.
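A negative correlation of this kind is an ordinary Pearson correlation computed across managers between their overall Forceful and overall Enabling scores. The sketch below shows the calculation on a handful of made-up score pairs (hypothetical data, not LVI results) in which managers rated as overdoing one side tend to be rated as underdoing the other.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length (n >= 2)")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical overall scores (0 = right amount); not LVI data.
    forceful = [0.8, 0.4, 0.1, -0.2, -0.6, 0.0]
    enabling = [-0.7, -0.3, 0.2, 0.1, 0.5, -0.4]
    print(round(pearson_r(forceful, enabling), 2))
```

A value near -1 would mean that every increase in one side is matched by a decrease in the other; the reported -.64 indicates a strong but imperfect trade-off.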

[Figure: scatterplot of Overall Forceful scores (vertical axis, -1.5 to +1.5) against Overall Enabling scores (horizontal axis, -1.5 to +1.5).]


The second type of validity evidence concerns how scores on the Forceful and Enabling measures relate to other variables. Here we found strong links between overall scores on each dimension and overall effectiveness. The curvilinear correlation between Forceful and overall effectiveness is R = .63; for Enabling it is R = .57. These relationships indicate that both approaches are indeed very important to effective leadership, and about equally so. They are displayed below.
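One standard way to obtain a curvilinear correlation is to regress effectiveness on a score and its square and take the square root of the resulting R-squared; the source does not say exactly how its R values were computed, so treat this as an assumption. The quadratic term matters because, on a too-little/too-much scale, effectiveness is expected to peak near the right amount and fall off on both sides, a pattern that an ordinary linear correlation can miss entirely. The sketch below fits the polynomial by ordinary least squares in pure Python; the function names and the example data are hypothetical.

```python
import math


def _solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - tail) / m[r][r]
    return coef


def polynomial_r(xs: list[float], ys: list[float], degree: int = 2) -> float:
    """Multiple R from regressing ys on x, x**2, ..., x**degree (OLS)."""
    size = degree + 1
    rows = [[x ** p for p in range(size)] for x in xs]
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(size)] for i in range(size)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(size)]
    coef = _solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return math.sqrt(max(0.0, 1.0 - ss_res / ss_tot))


if __name__ == "__main__":
    # Hypothetical inverted-U: effectiveness peaks at 0 ("the right amount").
    scores = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
    effectiveness = [9.0 - 2.0 * s * s for s in scores]
    print("linear R:     ", round(polynomial_r(scores, effectiveness, 1), 3))
    print("curvilinear R:", round(polynomial_r(scores, effectiveness, 2), 3))
```

On this symmetric inverted-U, the straight-line fit explains nothing (R = 0) while the quadratic fit explains everything (R = 1), which is why a curvilinear index suits a scale where both "too little" and "too much" are undesirable.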

The link to overall effectiveness is even more pronounced when scores on Forceful and Enabling are considered simultaneously. The correlation between the Versatility Score and overall effectiveness is an astonishingly high r = .81. To some degree this correlation is inflated, because both variables are ratings and come from the same raters. What is remarkable, though, is that the Versatility Score is based on ratings of just two dimensions, Forceful and Enabling leadership, yet it is so strongly related to the overall evaluation of effectiveness. No doubt versatility in these terms is fundamental to what people mean by effective leadership.

[Figure: Overall Effectiveness (vertical axis, 4 to 10) plotted against Overall Forceful (horizontal axis, -1.5 to +1.5) and against Overall Enabling (horizontal axis, -1.5 to +1.5).]

[Figure: Overall Effectiveness (All Coworkers' average, vertical axis) plotted against Forceful & Enabling Versatility (horizontal axis, 50% to 100%), with reference lines at the 10th, 50th, and 90th percentiles of effectiveness.]


The figure above presents this relationship graphically. A correlation coefficient is a somewhat abstract quantity to grasp, so here is a more concrete way to interpret the strength of this relationship: 9 of the 23 managers (39%) who got an "A" on versatility (90% or higher) are in the top 10 percent in overall effectiveness. Only 2 of the 59 managers (3.4%) who got a "B" on versatility (80-89%) were in the top 10 percent. In other words, those who get an A on versatility are over 10 times more likely to be at the top of the class in overall effectiveness. And 24 of the 26 individuals (92%) who got "Cs" and lower on versatility (below 80%) were in the bottom half of the class in overall effectiveness.
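The "more concrete" comparison in the preceding paragraph is simple conditional-rate arithmetic. The sketch below recomputes it from the reported counts: the share of each versatility grade group that reached the top (or bottom) of the effectiveness distribution, and the ratio between the A and B rates. The letter-grade cut-offs follow the text (A = 90% or higher, B = 80-89%, C or lower = below 80%); treating them as half-open intervals on a 0-1 proportion, and the function names, are our assumptions.

```python
def versatility_grade(score: float) -> str:
    """Letter grade for a Versatility Score expressed as a proportion (0-1)."""
    if score >= 0.90:
        return "A"
    if score >= 0.80:
        return "B"
    return "C or lower"


def rate(hits: int, total: int) -> float:
    """Proportion of a group meeting a condition."""
    return hits / total


# Counts reported in the text.
a_top = rate(9, 23)       # A group reaching the top 10% of effectiveness
b_top = rate(2, 59)       # B group reaching the top 10% of effectiveness
c_bottom = rate(24, 26)   # C-or-lower group in the bottom half

if __name__ == "__main__":
    print(f"A in top 10%: {a_top:.1%}")                  # 39.1%
    print(f"B in top 10%: {b_top:.1%}")                  # 3.4%
    print(f"A vs. B likelihood: {a_top / b_top:.1f}x")   # 11.5x
    print(f"C or lower in bottom half: {c_bottom:.1%}")  # 92.3%
```

The ratio of roughly 11.5 is what supports the statement that A-rated managers are "over 10 times more likely" to be at the top of the class.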

Taken together, these statistical results offer strong support for the idea that the LVI provides feedback about behaviors that are crucial to managerial success.

Future Research

Of course, validation research is a never-ending process. There is always more to be learned about how a measure works, what it is and isn't related to, and how best it can be used. We are very open to collaborating on research with the LVI, for example, on studies of the links between personality, tenure, and job experience and the LVI; on using the LVI to predict a range of hard and soft criteria in organizations, such as unit productivity and performance, role ambiguity and role conflict, turnover, job satisfaction, organizational climate and culture, and customer service; on the links between the LVI and other measures of leadership and their relative contributions to predicting organizational outcomes; on the impact of developmental interventions on versatility; and so on. For more information on collaborative research, contact Rob Kaiser at [email protected] or 336.288.8200.

Research References

Kaplan, R.E. & Kaiser, R.B. (2003). Developing versatile leadership. MIT Sloan Management Review, 44(4), 19-26.

Kaplan, R.E. & Kaiser, R.B. (2003). Rethinking a classic distinction in leadership: Implications for the assessment and development of executives. Consulting Psychology Journal: Practice and Research, 55(1), 15-25.

Kaiser, R.B. & Kaplan, R.E. (2001, March/April). Leadership effectiveness hangs in the balance. Leadership in Action, 21(1), 12-13.

Kaiser, R.B. & Craig, S.B. (2001). Leadership Versatility Index™ technical report: Item selection and validation of Forceful-Enabling scales. Greensboro, NC: Kaplan DeVries Inc.

Kaiser, R.B. & Kaplan, R.E. (2000, April). Getting at leadership versatility: Theory and measurement of the forceful and enabling polarity. Paper presented at the 15th annual meeting of the Society for Industrial and Organizational Psychology, New Orleans, LA.

Kaplan, R.E. (1996). Forceful leadership and enabling leadership: You can do both. Greensboro, NC: Center for Creative Leadership.