This paper explores methods for analysts to work with automated tracing tools to build a requirements traceability matrix (RTM). Two experiments were run, using four and six simulated analysts respectively, each of which evaluated candidate links returned by an automated tracing tool on a publicly available dataset. The first experiment measured the number of candidate links the analyst had to examine before the traceset reached 88.9% precision; the number of links observed serves as a measure of the effort required of the analyst. The second experiment fixed the number of candidate links the analyst would observe and measured the accuracy of the final traceset.
The first four analysts observed links in "local" or "global" order. In local order, the analyst examines all candidate links for one high-level element before moving on to the next high-level element; in global order, the analyst examines candidate links in order of their relevance scores. Local order analysts also stopped evaluating links for a given high-level element after determining that the element was satisfied. The four analysts in the first experiment used local order with feedback, local order without feedback, global order with feedback, and global order without feedback, where "feedback" means that each of the analyst's decisions is fed back to the automated tool as it is made. The second experiment used the same four analysts plus two random analysts: the local order random analyst randomly selects 7-8 links from the set of candidate links for each high-level element, while the global order random analyst selects links at random from the full set of candidate links.
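The two orderings described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the tuple representation of a candidate link as (high-level element, low-level element, relevance score) and the function names are assumptions made for the example.

```python
def global_order(candidates):
    """Global order: rank every candidate link across the whole
    dataset by relevance score, highest first."""
    return sorted(candidates, key=lambda link: link[2], reverse=True)

def local_order(candidates):
    """Local order: group candidate links by high-level element,
    then rank by relevance score within each group, so the analyst
    finishes one high-level element before starting the next."""
    by_high = {}
    for high, low, score in candidates:
        by_high.setdefault(high, []).append((high, low, score))
    ordered = []
    for high in sorted(by_high):
        ordered.extend(sorted(by_high[high], key=lambda link: link[2], reverse=True))
    return ordered
```

In a fuller simulation, a local order analyst would also stop consuming a group's links once its stopping criterion decides the high-level element is satisfied, which is what bounds its effort.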
Analysis of the results leads to several observations. First, it is important to know when to stop: the local order analysts were programmed to recognize when a high-level element was satisfied. In the first experiment, local order analysts achieved 20% precision while observing only about 3% of the total candidate links, whereas the global order analysts achieved 6% precision while observing 10% of the total candidate links (less accuracy with more effort). The local order analysts also had the best results in the second experiment. Second, incorporating feedback reduced the analyst's required effort by 7-13%. Finally, it is important to have a system: in the second experiment, the analysts that evaluated candidate links in random order performed very poorly (4% precision and 18% recall).
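The precision and recall figures above follow the standard definitions for traceability: precision is the fraction of links in the final traceset that are true links, and recall is the fraction of true links that made it into the traceset. A hypothetical helper computing both (the function name and the pair representation of links are illustrative assumptions) might look like:

```python
def precision_recall(accepted_links, true_links):
    """Compare an analyst's final traceset against the answer set.
    precision = |accepted ∩ true| / |accepted|
    recall    = |accepted ∩ true| / |true|"""
    accepted = set(accepted_links)
    true = set(true_links)
    correct = accepted & true
    precision = len(correct) / len(accepted) if accepted else 0.0
    recall = len(correct) / len(true) if true else 0.0
    return precision, recall
```

For example, a traceset with half of its accepted links correct, covering half of the true links, scores 50% on both measures, which makes the random analysts' 4% precision and 18% recall easy to interpret.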