Today’s professionals in business, engineering, and science work in complex, often overwhelmingly complex, environments. To be effective, they must understand masses of data from a variety of sources. Traditional data mining tools employ black-box algorithms that generate complex predictive models. The models can be useful for prediction, but they provide little or no insight into the data. Exploratory data analysis (EDA), the underlying premise of Data Desk software, is a statistical approach that allows the decision maker not only to see patterns and relationships in a dataset but also to get at the causes and effects behind those relationships. EDA facilitates a sophisticated understanding of what’s really going on in a body of data.
Most data arise as a byproduct of other activities. A business person may have data in a spreadsheet intended for tracking sales, data in a database for human resource management, or data that have been published by a government or trade organization. A researcher may collect data to sift through a variety of alternatives, may want to look in a new way at data originally collected for a different purpose, or may want to check experimental data for errors or unexpected patterns. That is why the process of analyzing data needs to be wide open to possibility.
About a hundred years ago, in its early days, statistics concentrated on the analysis of data, considering effective ways to describe patterns, trends, and relationships. In the middle decades of the 20th century, attention moved to developing a solid mathematical foundation and establishing the properties of various estimators in order to find the best methods. In 1962 Dr. John Tukey warned that mathematical statistics was ignoring real-world data analysis and called for a return to scientific statistics, in which the value of the statistical description of the data was paramount. In subsequent work, Tukey defined Exploratory Data Analysis, a philosophy that returned to the original goals of statistics but used modern methods.
Traditional inferential statistics starts from a hypothesis, performs an experiment, and then tests the hypothesis. EDA starts instead from the data and asks what patterns, relationships, or trends they might hold. In recent years EDA has gained wider acceptance. A large part of this growth is due to the availability of desktop computers and the explosion of data for which traditional statistics is just not suitable. Desktop computers have also made it possible to develop new graphical methods that support the EDA philosophy in strikingly effective fashion.
Because EDA relies heavily on data display, makes few assumptions about the structure of the data, and emphasizes identifying and describing patterns, it is useful to a wide range of professionals who can recognize important patterns easily but may not wish to work with complex statistical techniques.
Data Description’s graphical analytical tools start from the EDA philosophy. They empower people who have data and want to discover the patterns hiding within.