Exploratory Data Analysis Used to Improve Software Reliability Models
(Excerpted from Scientific Computing and Automation, September 1997)
For statistician Wendell Jones, the behavior of telephones and other communications devices cannot be taken for granted. Jones works in the R&D department of Northern Telecom, Ltd. (Nortel), a global provider of network solutions that include both hardware and software. He is based in the company facilities at Research Park, North Carolina.
"People think about telephone service as being as accessible as tap water," Jones says. "For us, reliability is the most important quality issue with customers."
[He] concentrates on the software Nortel uses in the proprietary systems it develops for public carriers. His daily task is to anticipate outages that could potentially be caused by the software. He produces statistical models that describe possible failures and the conditions under which they might occur.
The data Jones uses to derive Nortel's predictive models report on such seemingly disparate factors as aspects of the software development process, the system's complexity, characteristics of the people who use it, and performance of related products. The dataset Jones extracts is what he must analyze to pinpoint the variables and identify the parameters he will use in his model.
With data in hand, he turns to Data Desk, [Data Description's visual statistical program]. He typically starts by creating a single plot, for instance, a scatterplot. This gives him the primary story, a display of basic patterns between any two variables. [He] then isolates primary variables in one or two scatterplots and creates bar charts and histograms for secondary variables. The relationships between the secondary variables are "subplots" that are not depicted in the scatterplot, but he suspects them of exerting some influence on the patterns that do show up there. He fixes these charts along the edge of the scatterplot, and when he clicks on points in one of the charts, these same points are highlighted in the scatterplot.
Displaying the primary scatterplot in combination with the related charts allows Jones to begin further testing to tease out the relationships among numerous factors in the software's behavior. He says the links between plots make it possible to sort through the complex factors that operate in software behavior.
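The linking between plots described above can be pictured as a shared selection: picking a bar in a secondary chart yields a set of cases, and the scatterplot marks those same cases. The following is only a minimal sketch of that idea in Python, not Data Desk's actual mechanism or interface. The field names (size_kloc, failures, team_experience_yrs), the function names, and every value in the table are invented for illustration and are not Nortel data.

```python
# Illustrative sketch of linked selection between a secondary chart and a
# primary scatterplot. All names and values are hypothetical.

# Each record stands for one made-up software module.
RECORDS = [
    {"size_kloc": 12.0, "failures": 1, "team_experience_yrs": 8},
    {"size_kloc": 45.0, "failures": 6, "team_experience_yrs": 2},
    {"size_kloc": 30.0, "failures": 2, "team_experience_yrs": 7},
    {"size_kloc": 52.0, "failures": 7, "team_experience_yrs": 1},
    {"size_kloc": 20.0, "failures": 3, "team_experience_yrs": 3},
]


def select_bar(records, field, low, high):
    """Return the indices of records whose `field` lies in [low, high).

    This mimics clicking one bar of a histogram of a secondary variable.
    """
    return {i for i, r in enumerate(records) if low <= r[field] < high}


def scatter_with_highlight(records, x_field, y_field, selected):
    """Return (x, y, highlighted) triples for the primary scatterplot.

    A point is highlighted when its record index is in the shared selection.
    """
    return [
        (r[x_field], r[y_field], i in selected)
        for i, r in enumerate(records)
    ]


# "Click" the histogram bar covering 0 to 4 years of team experience.
selected = select_bar(RECORDS, "team_experience_yrs", 0, 4)

# The same cases are now flagged in the size-versus-failures scatterplot.
points = scatter_with_highlight(RECORDS, "size_kloc", "failures", selected)

for x, y, highlighted in points:
    marker = "*" if highlighted else " "
    print(f"{marker} size={x:5.1f} failures={y}")
```

Keeping the selection as a set of case indices, separate from any one plot, is what lets several charts stay in step: each chart only needs to ask whether a given case is currently selected.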
Exploratory analyses of his data and their mathematical expressions give Jones some of the key elements for his predictive models. They also suggest the distribution of the risk of failure in various software performance scenarios.
Name: Wendell Jones
Location: Research Park, North Carolina