
Empirical Evaluation of Cognitive Features

This chapter presents empirical studies of Argo/UML usage. The following five sections describe three laboratory usability studies, observations of classroom usage, and a summary of feedback from internet users.

My feature generation approach produces fairly independent features. I have demonstrated these features in the context of the Argo/UML tool, but they can be applied to other design domains and tools. To evaluate individual features, I have used controlled laboratory studies that focus on specific design tasks. To evaluate the overall tool, I have gathered and analyzed feedback from actual users.

Pilot User Study

Goals

The goals for the pilot user study were to practice the skills needed to conduct user studies and to resolve any serious usability problems that might interfere with further studies.

Setting

In June 1998, I conducted a pilot laboratory study on one of the first released versions of Argo/UML. This study was approved by the UCI human subjects research committee as study HS98*224. Two subjects were asked to use Argo/UML to complete the task shown in the figure "Task for pilot study". Subjects were given brief demonstrations and instruction on using Argo/UML and then proceeded to use the tool as best they could. Subjects were carefully observed, and the problems they encountered were noted. The tool was updated to address problems identified with the first subject before the second subject was observed.

Results

This study resulted in progress on both of its goals: my understanding of tool evaluation improved, and the tool itself was improved to help clear the way for further evaluation.

The biggest change in my evaluation plans resulted from the way subjects dealt with the problem statement. The problem statement given was meant to be somewhat complex and open-ended so as to give subjects a reason to make design decisions. However, it also provided enough concrete facts that subjects felt they needed to enter those facts into their designs before proceeding with any creative design work. Since there were many facts to enter, the entire testing session was taken up by transcription rather than design decisions. In fact, subjects used the problem description sheet to check off design elements as they entered them. As a result, later laboratory studies focused on smaller tasks involving individual design features rather than large design tasks involving the whole tool.

The second change in evaluation plans resulted from the failure of subjects to effectively follow the think-aloud protocol. Initially, I had hoped that subjects would report their own thought processes and that I could use that data to measure the perceived complexity of their task and identify any difficulties. As it turned out, subjects lapsed into silence whenever they encountered difficulty.

The third major realization about laboratory user testing was that Argo/UML simply was not ready to be evaluated as a complete tool. At that time, too many missing features, basic usability problems, and outright defects made testing for subtle advantages impossible. All of the problems detected in the pilot study have now been addressed, but the emphasis remains on evaluating individual features in laboratory studies and evaluating the overall tool via interactions with actual users.

Two major improvements to Argo/UML resulted from the pilot user study. First, the need for direct text editing in the diagram pane was seen to be of key importance. At the time of the pilot study, direct text editing was not supported: subjects needed to select design elements from the navigator pane and edit their properties in the "Properties" tab. Now, Argo/UML users may edit names, attributes, operations, and state transitions directly in the diagram pane. Second, the need for clarifiers became obvious because subjects worked through an hour of design construction without ever switching mental modes to reflect on the design. The theory of reflection-in-action indicates that designers will periodically switch between reflection and construction, but there is no reason to believe that they will do so unaided within a one hour laboratory session. Clarifiers appear directly in the diagram editing pane and visually prompt designers to consider feedback from critics during design construction.
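To make the clarifier idea concrete, it can be pictured as a small marker painted over the affected design element in the diagram pane whenever a critic has pending feedback for that element. The sketch below is a minimal, hypothetical Java Swing illustration of that idea; the class and method names (ClarifierDecorator, drawNode, and so on) are invented for this example and do not reflect the actual Argo/UML or GEF implementation.

```java
import java.awt.Color;
import java.awt.Graphics;
import java.awt.Rectangle;

/**
 * Minimal sketch of a "clarifier": a small wake-up icon painted on top of a
 * diagram node whenever a critic has pending feedback for that node.
 * All names here are hypothetical; Argo/UML's real implementation differs.
 */
class ClarifierDecorator {

    /** Paints the node, then overlays a clarifier marker if feedback is pending. */
    void drawNode(Graphics g, Rectangle bounds, String name, boolean hasPendingFeedback) {
        // Draw the plain class node.
        g.setColor(Color.WHITE);
        g.fillRect(bounds.x, bounds.y, bounds.width, bounds.height);
        g.setColor(Color.BLACK);
        g.drawRect(bounds.x, bounds.y, bounds.width, bounds.height);
        g.drawString(name, bounds.x + 4, bounds.y + 14);

        // The clarifier: a small marker in the corner of the node that invites
        // the designer to switch from construction to reflection.
        if (hasPendingFeedback) {
            g.setColor(Color.RED);
            g.fillOval(bounds.x + bounds.width - 10, bounds.y + 2, 8, 8);
        }
    }
}
```

The point of the sketch is that the prompt lives in the same pane where construction happens, so reflection can be invited without forcing a mode switch.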

Broom User Study

Goals

The goal of this second laboratory study was to evaluate the ergonomic and cognitive impact of the broom alignment tool. This study also served to test the usefulness of a technique for measuring short-term memory load.

Setting

In January 1999, Michael Kantor and I conducted a laboratory study that compared the broom alignment tool with standard alignment commands. This study was approved by the UCI human subjects research committee as study HS98*552. We later published a conference paper that reported the results of this study (Robbins, Kantor, and Redmiles, 1999).

In this study, subjects were asked to position diagram nodes into visual groups to reflect various semantic groupings. Nodes were initially placed near the top of the diagram in no particular order or grouping. For each diagramming task, subjects worked once with the broom and once with the standard alignment tools, in random order. This allowed us to compare, for each subject, whether the broom or standard commands were better for the task. Ten subjects each repeated this with three separate diagrams. One of these diagrams is shown in the figure "Desired groupings of diagram elements".

On the second and third diagrams, we tested the short-term memory of our subjects to see if the memory load was greater for one tool than for the other. Before each diagramming task, subjects memorized a set of six random, two-digit numbers, and at the beginning and end of each task they were asked to recall the numbers. The expectation was that more numbers would be forgotten when using a tool that requires more short-term memory to use. This technique for measuring short-term memory load was inspired by a recently published experiment (Byrne and Bovair, 1997).
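The memory-load probe itself is easy to reproduce. The following sketch, with invented names, generates a set of six random two-digit numbers and scores a recall attempt by counting how many memorized numbers the subject reproduces; the order-insensitive scoring rule is an assumption for illustration, not necessarily the rule used in the study.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

/**
 * Sketch of the short-term memory load probe: subjects memorize six random
 * two-digit numbers before a task and recall them afterwards.
 * Names and the scoring rule are assumptions for illustration only.
 */
class MemoryLoadProbe {

    private static final Random RANDOM = new Random();

    /** Generates six distinct random two-digit numbers (10-99). */
    static Set<Integer> generateStimulus() {
        Set<Integer> numbers = new HashSet<>();
        while (numbers.size() < 6) {
            numbers.add(10 + RANDOM.nextInt(90));
        }
        return numbers;
    }

    /** Counts how many memorized numbers appear in the recall attempt. */
    static int score(Set<Integer> memorized, Set<Integer> recalled) {
        int correct = 0;
        for (int n : recalled) {
            if (memorized.contains(n)) {
                correct++;
            }
        }
        return correct;
    }
}
```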

Results

In all thirty trials, the mouse was moved a greater distance when using the standard tools than when using the broom. On average, the mouse was moved 86% farther when using the standard tools. This was largely due to movement to a toolbar of alignment buttons at the top of the drawing area. In contrast, control-drag was used to invoke the broom. This difference would be reduced if keystrokes were assigned to each alignment command; however, that would require eight new keystroke bindings and may force users to move their hands between the mouse and keyboard more.

The figure "Mouse dragging with the broom or standard alignment tools" charts the distance that subjects dragged the mouse. Since the broom involves dragging the mouse, and dragging can be relatively difficult, we were concerned that the broom might be more physically tiring. However, over all trials, subjects dragged an average of 12,592 pixels while using the standard tools and only 10,809 pixels while using the broom; the standard tools thus required about 16% more dragging. Using a paired t-test, we found the difference to be significant with p < 0.003. A large part of the dragging needed for the standard tools was done while dragging out selection rectangles. The shorter dragging distance for the broom resulted largely from the fact that objects do not need to be explicitly selected before they are aligned with the broom.
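For readers who want to reproduce the analysis, the paired t-test simply examines the per-trial differences in dragging distance. The sketch below computes the paired t statistic from two arrays of per-trial pixel counts; the variable names are mine, and the resulting statistic would still be compared against a t distribution with n-1 degrees of freedom to obtain the p-value.

```java
/**
 * Sketch of the paired t-test used to compare dragging distances.
 * t = mean(d) / (sd(d) / sqrt(n)), where d[i] is the per-trial difference
 * between the standard-tool distance and the broom distance.
 */
class PairedTTest {

    /** Returns the paired t statistic for two equal-length samples. */
    static double tStatistic(double[] standard, double[] broom) {
        int n = standard.length;
        double[] diff = new double[n];
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            diff[i] = standard[i] - broom[i];
            sum += diff[i];
        }
        double mean = sum / n;

        double sumSq = 0.0;
        for (int i = 0; i < n; i++) {
            sumSq += (diff[i] - mean) * (diff[i] - mean);
        }
        double sd = Math.sqrt(sumSq / (n - 1));   // sample standard deviation of the differences

        return mean / (sd / Math.sqrt(n));        // compare against t(n-1)
    }
}
```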

Achieving layouts that show grouping and correspondence requires planning: performing alignments in the wrong order can force users to undo previous work. Since using the broom involves fewer planned steps, we expected a lower short-term memory load when using the broom. In fact, the majority of subjects indicated that they found the broom more "natural." However, we found no significant difference in the short-term memory effects of the tools compared. We believe that our test for short-term memory load was not sensitive enough to detect the differences between the tools. In fact, subjects recalled all numbers perfectly in twenty-six out of forty tasks. This led to a refined version of the short-term memory load test in the next study.

Construction User Study

Goals

The goal of this user study was to evaluate the support provided by Argo/UML's selection-action buttons. In particular, this study focused on measuring the match between designers' diagram construction tasks and the user interface affordances provided by selection-action buttons.

Setting

This study was approved by the UCI human subjects research committee as study HS99*1210 and was carried out in August 1999. The study consisted of five subjects performing prespecified diagram construction tasks under two conditions. Under one condition, subjects used standard diagram construction toolbar buttons. Under the other condition, subjects were encouraged to use the selection-action buttons for most of the construction. As in the earlier broom study, each subject was asked to do each task twice: once under each condition, in random order. Short-term memory load was also measured using a refinement of the technique used in the broom study.

Results

The subjects in this study were all able to accomplish the diagramming tasks with either the selection-action buttons or the standard toolbar. Only one subject had a difficult time with the selection-action buttons. The primary difficulty was in making the buttons appear rather than actually using them. Surprisingly, many of the subjects formed mistaken assumptions about the action needed to get the selection-action buttons to appear. Also, all but one of the subjects used each of the three aspects of the selection-action buttons. Since the task was a transcription task rather than a design task, subjects tended to work systematically from upper-right to lower-left rather than expanding on logical clusters.

Subjects made more mistakes on the unconventional diagramming task than on the conventional one. For example, several subjects mistakenly used association links rather than horizontal generalizations. No subject, however, made the same mistake more than once, and several said that they understood that the directions of the lines were "misleading" on this task. Bent edges caused users to click on the first vertex of the edge, resulting in placement of a new node. Subjects also made several mistakes when using the standard toolbar buttons, including using the wrong type of edge and forgetting the current mode. Some users double-clicked in the toolbar to lock in a node creation mode and then accidentally created extra nodes while attempting to reposition existing nodes.

The study results suggest three possible refinements to the selection-action button feature. First, the buttons should appear and stay visible whenever a node is selected, even if the node has been slightly moved during selection or if the mouse has been moved away from the node. Second, when dragging to specify the location of a new node, the mouse coordinates should be used as a corner or edge midpoint of the new node rather than its center. Third, the dragging behavior of selection-action buttons might be modified to allow the creation of polygonal edges by clicking in empty space to add a vertex and double-clicking in empty space to create a new destination node.
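The first refinement amounts to tying button visibility to selection state rather than to mouse hover. The Swing sketch below illustrates that idea with invented class names; it is a sketch of the proposed behavior, not Argo/UML's actual implementation.

```java
import javax.swing.JButton;
import javax.swing.JComponent;

/**
 * Sketch of the proposed refinement: selection-action buttons stay visible
 * for as long as a node is selected, independent of hover or small moves.
 * All names are hypothetical and do not reflect Argo/UML's real classes.
 */
class SelectionActionButtons {

    private final JButton[] buttons = {
        new JButton("generalization"), new JButton("association"), new JButton("dependency")
    };

    SelectionActionButtons(JComponent diagramPane) {
        for (JButton b : buttons) {
            b.setVisible(false);      // hidden until a node is selected
            diagramPane.add(b);
        }
    }

    /** Called by the diagram pane whenever the node selection changes. */
    void onSelectionChanged(boolean nodeSelected) {
        // Visibility is driven only by selection state, not by mouse position.
        for (JButton b : buttons) {
            b.setVisible(nodeSelected);
        }
    }
}
```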

Classroom Usage

Goals

The goal of this empirical evaluation is to gather anecdotal experience from classroom usage at UCI and other universities. One of my reasons for developing Argo/UML was to produce a freely available tool that could be used in university classrooms and that would help teach good design. Actually aiding the teaching of software design skills requires much more than simply using an educational license for a standard commercial UML tool. Argo/UML's strong basic usability and cognitive support features can help address the needs of users who are new to UML and object-oriented design. In addition to actually using Argo/UML, several students have made contributions to Argo/UML's development as part of project courses, independent study courses, or research projects.

Setting

Argo/UML, Argo/C2, and GEF have been used in several courses at UCI and at other universities. The table "Known classroom usage of the Argo family" summarizes classroom usage of these tools. The uses outside of UC Irvine were found by searching for "Argo/UML" and "GEF" on popular web search engines and by reviewing email messages sent from university users of GEF and Argo/UML.

Known classroom usage of the Argo family

School                     | Date        | Course            | Usage
UC Irvine                  | Summer 1999 |                   | Student research project
UC Irvine                  | Spring 1999 | ICS 227           | Integrated Argo/C2, Argo/UML, and other tools
UC Irvine                  | Spring 1999 | ICS 125           | Enhanced Argo/UML
UC Irvine                  | Spring 1999 | ICS 121           | Class used Argo/UML
UC Irvine                  | Fall 1998   | Independent study | Enhanced Argo/UML
UC Irvine                  | Fall 1998   | ICS 125           | Class used Argo/UML
UC Irvine                  | Winter 1998 | ICS 125           | Enhanced GEF
UC Irvine                  | Winter 1998 | ICS 125           | Enhanced Argo/UML
UC Irvine                  | Fall 1996   | ICS 125           | Developed initial version of GEF
U. Twente, The Netherlands | 1999        |                   | Student research project
U. Waterloo, Canada        | 1999        |                   | Student research project
U. Mulhouse, France        | 1999        |                   | Survey of UML tools
U. Frieberg, Germany       | 1999        |                   | Student research project
U. Vrije, Brussels         | 1999        |                   | Master's thesis
U. Bologna, Italy          | 1999        |                   | Master's thesis
Oregon State               | Spring 1999 | CS 562            | Grad seminar discussion topic
Syracuse U.                | Spring 1999 |                   | GEF used in research project, mentioned in paper at Supercomputing '98
UC Berkeley                | 1999        |                   | Research project inspired partly by GEF
U. Macow, Macow            | 1998        |                   | Student project used GEF
UCLA                       | Winter 1997 |                   | GEF used in research project
Duke U.                    | Winter 1997 |                   | Student project used GEF
Syracuse U.                | Fall 1997   | CSP714            | GEF used in project course
Purdue U.                  | Summer 1997 |                   | GEF used in student research project
North-Eastern U.           | Summer 1997 |                   | GEF used in student research project
CMU                        | Summer 1997 |                   | GEF evaluated for research project

Results

Classroom usage of Argo/UML at UC Irvine has been a practical success in that students have been able to complete their assignments. In the Fall 1998 offering of ICS 125, individual students reported lost data, slow performance on machines in their team offices, and a few basic usability problems. None of the students reported dissatisfaction with the cognitive support features, which probably indicates that they were not frequently used. However, one of the goals of cognitive support is not to interfere with basic tool usage, and this goal seems to have been achieved.

In the Spring 1999 ICS 121 course that used Argo/UML, students were able to begin using the tool rapidly and complete their assignments. Very few problems of any kind were reported, and those that were reported mainly concerned requests for new functionality (e.g., a special kind of cut and paste) or environmental difficulties (e.g., printing in the CS third-floor lab).

Student projects that enhanced GEF and Argo/UML have generally been successful. Three ICS 125 projects have enhanced Argo/UML with new diagram types, and each of these projects has produced some code that is worthy of incorporation into the distributed version. Students involved with the projects have generally reported a feeling of satisfaction that they were contributing to a project that other students would continue to evolve and use. Undergraduate independent study courses have produced several contributions, including a key part of Argo/UML's support for the XMI file format. Research projects by UCI graduate students have built on code in the Argo family to produce ArchStudio and Argus-I.

Internet Usage

Goals

One of the goals of my research is to have an impact on what CASE tool users expect from their tools and what CASE tool vendors provide. Research on design critiquing systems and other forms of cognitive support has been carried out for over fifteen years, yet most users of desktop applications have never encountered a system with critics. I have tried to transfer my ideas from the research environment to industry by demonstrating them in the context of a useful tool. As discussed below, measurable progress on this goal has been made, but it is far from complete.

Setting

This section presents anecdotal and statistical data on usage of Argo/UML by internet users. Here, "internet users" refers to people who downloaded Argo/UML and who are not educational users. These users were self-selected and their feedback was voluntary. Most of these users found the Argo/UML web site by searching the internet for the terms "UML" or "CASE", while others learned of the site from other users.

Whenever anyone downloads Argo/UML, they enter registration information that includes their email address. From July 1998 until January 1999, I followed up on each registration by sending an email message that stated "Thank you for your interest in Argo/UML," and asked "What is your interest in CASE tools?" After initial contact, I continued to receive feedback from many users. Since Argo/UML is not yet fully developed, users have often encountered difficulty that prompted them to ask questions, offer comments, or report bugs. This data is also voluntary and comes from self-selected subjects.

Results

The first and most surprising result is that Argo/UML's registered users include thousands of people from all around the world. This indicates that Argo/UML is at least reaching many of the potential CASE tool users whose expectations I seek to raise. The figure "Number of new Argo/UML registered users by month in 1999" shows the number of new registered users each month.

One reason that so many users found the Argo/UML web site is that it is listed in many search engine databases and CASE tool index web pages. Searching for "Argo/UML" on leading internet search engines yields over one hundred hits on pages outside of UCI. These sites are hosted in well over a dozen countries, and they typically offer links to the Argo/UML home page and brief descriptions of the tool in English, German, French, or Japanese.

Between July 1998 and August 1999, I received a total of one hundred twenty-five bug reports on Argo/UML. The number of bug reports helps to define a lower bound on the number of people who have actually used Argo/UML. Typically, a given person submitted only one, two, or three bug reports, and a total of seventy-four distinct people have submitted bug reports. Of course, many more people have used Argo/UML without submitting any bug reports.

Some quotes from Argo/UML users

My company bought a CASE tool called ... COOL. Of course, users had less polite names for it. I'm not sold on CASE since I feel you spend more time working the tool than doing design, but I'm still open.

I was showing Argo/UML to my boss, and we were both impressed with the design guidance features. The idea of the checklists and critiques, especially with the future possibility of tailoring them for a user's specific strengths and weaknesses, seems to add to potential reliability of the produced models.

The selection-action buttons are really fantastic time-savers. I can't believe that no one thought of them before.

I have used Object Domain, PepperSeed,... I have never found a really good UML tool, so I am always on the lookout for... ease-of-use and speed.

I've used ObjectTeam, COOL, Rose, P-Plus. I think they are all detractive to UML. I'll try Argo/UML, as it seems to have a nice look and feel!

Argo/UML looks like a very promising product. I'm a software engineer currently using Rose. I am exploring alternative tools because it is too expensive to have a copy at home, and using it can be frustrating.

I especially love the critiques that pop up when the mouse hovers over a diagram element. Table views are an extremely nice touch.

The list "Some quotes from Argo/UML users" above presents excerpts from email messages sent by Argo/UML users. Feedback from users has been uniformly positive about Argo/UML's cognitive support features. I have received very few negative email messages. One negative message complained about the need to provide one's name and email address during registration. Some other negative messages basically said that Argo/UML was not ready for use in their organization. I believe that the latter type of comment occurred because Argo/UML's overall impression of commercial quality can lead users to forget that it is a research project.

The following observations are drawn from 783 email messages I received from 602 distinct Argo/UML users:

  • As with classroom usage, the majority of user complaints and bug reports identified missing functionality or outright implementation defects. No one has ever reported that design critics or any other cognitive support feature has interfered with their work.
  • Many comments came from sophisticated CASE tool users who had experience with several commercial tools, but more came from first-time users who were starting to learn UML and object-oriented modeling. Comments from experienced CASE tool users often focused on the perceived difficulty of using tools like Rational Rose. Usability and subjective satisfaction seem to be key factors in whether CASE tools become shelfware. Developers new to modeling often stated that they were the first person in their development organization to use modeling and that they saw Argo/UML as potentially helpful in learning UML.
  • The zero cost of Argo/UML generated more enthusiasm than did the cognitive support features. This emphasis on cost is somewhat unexpected, since CASE tool users typically do not pay for the tools themselves. Also, most software designers are highly paid, so marginal increases in productivity could yield savings much greater than the initial cost of the tool. Furthermore, much of the cost of CASE tool adoption lies in training rather than tool price (Huff, 1992). All of these reasons should make CASE tool purchasers insensitive to price, and that assumption is reflected in the state of the CASE tool market, where tools typically cost $2,000 to $6,000 per seat. However, given the other user comments on dissatisfaction with CASE tool usability, it seems that price sensitivity may stem from a broader perception that CASE tools are not worth using.

One very interesting segment of the Argo/UML user population is made up of people who are employees of CASE tool vendors. I have received email from employees of Rational (makers of Rose), Togethersoft (makers of Together/J), MicroGold (makers of WithClass), Object Insight (makers of JVision), and ObjectDomain (makers of ObjectDomain). In four out of five cases, these email contacts have been with lead designers or company presidents. For the most part, the email messages have briefly expressed interest in Argo/UML without stating any specific views on its cognitive support features. However, there have been two cases where Argo/UML features have influenced the features of commercial tools.