This template analyzes a 2x2 contingency table directly from the counts entered into the cells n11, n12, n21, and n22. There is no need to enter 1's and 0's for every case or to use the Replicate command.

Matthew C. Hutcheson
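
As a rough illustration of working from the four cell counts in the 2x2 contingency table template above, the following Python sketch computes a Pearson chi-square statistic and its p-value. The function name is hypothetical, and the choice of the uncorrected Pearson statistic is an assumption; the template itself may report other statistics.

```python
import math

def chi_square_2x2(n11, n12, n21, n22):
    """Pearson chi-square statistic and p-value (1 df) from four cell counts."""
    n = n11 + n12 + n21 + n22
    row1, row2 = n11 + n12, n21 + n22
    col1, col2 = n11 + n21, n12 + n22
    # Shortcut formula for a 2x2 table, with no continuity correction.
    stat = n * (n11 * n22 - n12 * n21) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper tail equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

stat, p_value = chi_square_2x2(10, 5, 5, 10)
```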

This template demonstrates the random nature of confidence intervals. It generates random samples from a variable and plots confidence intervals for the variable mean generated by those samples.

Chris Noble
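
The idea behind the confidence interval template above can be sketched in Python as follows. This is only an illustration under stated assumptions: the population is taken to be normal, the 1.96 critical value is a large-sample approximation, and all names and default values are hypothetical.

```python
import random
import statistics

def simulate_intervals(n_samples=100, sample_size=30, mu=50.0, sigma=10.0, seed=1):
    """Draw repeated samples; return (low, high, covers_mu) for each one."""
    rng = random.Random(seed)
    z = 1.96  # approximate 95% critical value (assumption)
    results = []
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        mean = statistics.fmean(sample)
        half_width = z * statistics.stdev(sample) / sample_size ** 0.5
        covers = abs(mean - mu) <= half_width
        results.append((mean - half_width, mean + half_width, covers))
    return results

intervals = simulate_intervals()
# Fraction of intervals that contain the true mean.
coverage = sum(covers for _, _, covers in intervals) / len(intervals)
```

Each run with a different seed gives a different set of intervals, and only some of them contain the true mean, which is the random behavior the template is meant to show.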

Cronbach's Alpha is a statistic that measures the reliability of tests, observations, experiments or measurements by estimating the extent to which they provide the same results on repeated trials. Cronbach's Alpha is usually a value between 0 and 1. Values near 0 indicate low reliability. Values near 1 indicate high reliability. (See Cronbach, 1951 and Carmines and Zeller, 1979).

Data Description, Inc.
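
For the Cronbach's Alpha template above, a minimal Python sketch of the usual formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), might look like this. The function name and data layout (one list of scores per item) are assumptions made for the example.

```python
import statistics

def cronbach_alpha(items):
    """items: one list of scores per item, all covering the same respondents."""
    k = len(items)
    item_variance_sum = sum(statistics.variance(item) for item in items)
    # Total score for each respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    total_variance = statistics.variance(totals)
    return k / (k - 1) * (1 - item_variance_sum / total_variance)

# Three items that agree perfectly give the maximum value.
alpha = cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])
```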

The density plot assigns a grid to the scatterplot and counts the number of points in each cell of that grid, then displays them three-dimensionally with the count as the third dimension. Adding lines to this three-dimensional plot creates a fabric-like mesh across the plot revealing the density of "overstrikes" (points piled on top of one another) and clusters in the data. The user has control over the underlying scatterplot grid. Use finer grids on larger datasets to separate out the smallest clusters within the data. The plot is also colored so peaks turn red while valleys are green or blue. Combined with the ability to rotate, this graphical display is wonderful for presentations and revealing patterns in the data.

Matthew C. Hutcheson
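
The counting step behind the density plot above can be sketched in Python as below. This covers only the grid counts, not the three-dimensional display; the function name and the equal-width square grid are assumptions, and the sketch requires that x and y each take more than one distinct value.

```python
def grid_counts(xs, ys, bins=10):
    """Count the points falling in each cell of a bins-by-bins grid."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    counts = [[0] * bins for _ in range(bins)]
    for x, y in zip(xs, ys):
        # Scale to a cell index; the maximum value goes into the last cell.
        i = min(int((x - x_min) / (x_max - x_min) * bins), bins - 1)
        j = min(int((y - y_min) / (y_max - y_min) * bins), bins - 1)
        counts[j][i] += 1
    return counts

counts = grid_counts([0, 0, 1, 1, 0], [0, 0, 1, 1, 0], bins=2)
```

A larger value of bins corresponds to the finer grids suggested for larger datasets.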

Computes the Durbin-Watson statistic for serial correlation.

Matthew C. Hutcheson
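
For reference, the Durbin-Watson statistic mentioned above is the sum of squared successive differences of the residuals divided by the sum of squared residuals. A short Python sketch, with a hypothetical function name:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of regression residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

d = durbin_watson([1, -1, 1, -1])
```

Values near 2 suggest little serial correlation, values toward 0 suggest positive correlation, and values toward 4 suggest negative correlation.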

Exponentially-weighted moving average (EWMA) solved using iteration. Given scalar 'alpha' (between 0 and 1) and vector 'X', this template solves for L, such that: L(t) = alpha*X(t) + (1-alpha)*L(t-1), with L(1) = X(1). It solves this by looping through all the cases and appending the results.

Paul Pratt
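
The looping approach described for the EWMA template above can be sketched in Python as follows. The function name is hypothetical; the recurrence is the one stated in the description.

```python
def ewma(x, alpha):
    """L(1) = X(1); L(t) = alpha*X(t) + (1 - alpha)*L(t-1) for later cases."""
    smoothed = [x[0]]
    for value in x[1:]:
        # Append each new result, using the previous smoothed value.
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

result = ewma([1, 3, 5], alpha=0.5)
```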

Exponentially-weighted moving average (EWMA) solved using iteration. Given scalar 'alpha' (between 0 and 1) and vector 'X', this template solves for L, such that: L(t) = alpha*X(t) + (1-alpha)*L(t-1), with L(1) = X(1). It uses iteration if alpha is closer to 1.

Paul Pratt

This template performs the standard F-test for equality of population variances. Given a sample s1 of size n from population p1 and another sample s2 of size m from population p2, the ratio (svar1/svar2)/(popvar1/popvar2) has an F distribution. This template computes the sample variances and then compares their ratio to the appropriate F distribution (specifically, one with n-1 and m-1 degrees of freedom). It then reports the p-value: the probability, given that the chosen null hypothesis is true, of obtaining a result as extreme or more extreme than the one observed.

Data Description, Inc.
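
The first step of the F-test above, forming the variance ratio and its degrees of freedom, can be sketched in Python as below. The function name is hypothetical. The sketch stops short of the p-value, because the Python standard library has no F distribution function; that step would need a statistics package.

```python
import statistics

def variance_ratio(s1, s2):
    """Return (F ratio, numerator df, denominator df) for two samples."""
    ratio = statistics.variance(s1) / statistics.variance(s2)
    return ratio, len(s1) - 1, len(s2) - 1

f_ratio, df1, df2 = variance_ratio([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```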

Use this template to compute a Fourier Transform of any size real data sequence.

Matthew C. Hutcheson & Paul Pratt
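
As an illustration of a transform that accepts a sequence of any length, here is a direct discrete Fourier transform in Python. This is a sketch under the assumption that a plain O(n^2) sum is acceptable; the Fourier Transform template above may well use a faster algorithm, and the function name is hypothetical.

```python
import cmath

def dft(x):
    """Discrete Fourier transform of a real sequence of any length."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

spectrum = dft([1, 0, 0, 0])  # a unit impulse has a flat spectrum
```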

The Globe 3D template takes latitude and longitude (in degrees) as input and creates a three-dimensional rotating globe. This plot provides an exciting look at geographic data. Furthermore, you can add colors by variables of interest to reveal interesting patterns in the globe.

Authored by: Unknown
Modified by: Matthew C. Hutcheson
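
Placing latitude and longitude on a globe, as the Globe 3D template does, requires converting degrees to three-dimensional coordinates. A minimal Python sketch of the standard spherical conversion follows; the function name and axis orientation are assumptions and may differ from the template's own.

```python
import math

def to_xyz(lat_deg, lon_deg, radius=1.0):
    """Convert latitude and longitude in degrees to (x, y, z) on a sphere."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = radius * math.cos(lat) * math.cos(lon)
    y = radius * math.cos(lat) * math.sin(lon)
    z = radius * math.sin(lat)  # z points toward the north pole
    return x, y, z

equator_point = to_xyz(0, 0)
```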

The index plot is often used when plotting distance measures such as Mahalanobis distance or Hadi's distance. Each observation has a line drawn from its value to zero, creating a series of vertical lines. With a time series on the x-axis, dips, gaps, or peaks in the sequence of vertical lines reveal places where something interesting may be happening in the series. Experiment with this plot using the residuals as y and the predicted values as x from a regression or linear model. You may be surprised by the patterns revealed.

Matthew C. Hutcheson

Use this template to perform the popular Kolmogorov Test for normality on a sample.

Data Description, Inc.
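
The test statistic behind a Kolmogorov-type normality test is the largest gap between the sample's empirical distribution and a normal distribution. The Python sketch below computes that gap, assuming the normal's mean and standard deviation are estimated from the sample; the template above may handle the parameters and the p-value differently, and the function name is hypothetical.

```python
import statistics

def ks_normal_statistic(sample):
    """Largest distance between the empirical CDF and a fitted normal CDF."""
    xs = sorted(sample)
    n = len(xs)
    # Assumption: the normal is fitted using the sample mean and std deviation.
    fitted = statistics.NormalDist(statistics.fmean(xs), statistics.stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = fitted.cdf(x)
        # Check the gap just after and just before each step of the ECDF.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d

d = ks_normal_statistic([-1, 0, 1])
```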

Use this template to perform the nonparametric Kruskal-Wallis test.

Data Description, Inc.
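
For orientation, the Kruskal-Wallis H statistic ranks all observations together and compares the rank sums of the groups. The following Python sketch uses average ranks for ties but omits the tie correction factor, which is a simplification; the function name is hypothetical and the template above may compute more.

```python
def kruskal_wallis_h(groups):
    """H statistic from a list of groups (no tie correction applied)."""
    pooled = sorted(v for group in groups for v in group)
    n_total = len(pooled)
    # Tied values share the mean of the ranks they occupy.
    rank_of = {}
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    return 12 / (n_total * (n_total + 1)) * term - 3 * (n_total + 1)

h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```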

The Levene Test tests for nonconstant variance across groups. This template computes, for each measured value, the absolute value of the difference between the measured value and the group median.

Data Description, Inc.
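
The computation described for the Levene Test template above, absolute deviations from each group's median, can be sketched in Python like this. The function name and the dictionary layout are assumptions made for the example.

```python
import statistics

def abs_deviations_from_median(groups):
    """groups: dict of label -> values. Returns label -> |value - group median|."""
    deviations = {}
    for label, values in groups.items():
        median = statistics.median(values)
        deviations[label] = [abs(v - median) for v in values]
    return deviations

devs = abs_deviations_from_median({"a": [1, 2, 3], "b": [10, 20, 40]})
```

Comparing the mean deviation across groups, for example with a one-way analysis of variance, then indicates whether the spread differs between groups.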

This template provides a complete generalized linear model for binary dependent variables. It expects grouped data, meaning one case for each covariate pattern. This template uses the Iteratively Reweighted Least Squares (IRLS) method.

Walter Linde-Zwirble

The 'logistic regression - ungrouped' template is a re-tooling of the 'logistic regression - grouped' template. In this template the data do not have to be grouped. It requires a binary dependent variable and can analyze any number of factors. This template offers a selection of link functions and computes a wide range of statistics including model likelihood, deviance, dispersion, Hosmer-Lemeshow and ROC.

Data Description, Inc.

A fun and interesting build of the famous Mandelbrot Set fractal.

Walter Linde-Zwirble
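
The Mandelbrot Set mentioned above is built by iterating z = z*z + c from z = 0 and checking whether z stays bounded. A minimal Python sketch of that escape-time test, with a hypothetical function name and an arbitrary iteration limit:

```python
def escape_time(c, max_iter=50):
    """Iterations until |z| exceeds 2, or max_iter if it never does."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2:
            return n
        z = z * z + c
    return max_iter

inside = escape_time(0j)       # stays bounded, so treated as in the set
outside = escape_time(2 + 2j)  # escapes almost immediately
```

Coloring each point of the complex plane by its escape time produces the familiar fractal image.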

The Mantel-Haenszel template computes the Mantel-Haenszel statistic and the corresponding Chi-Square based p-value with one degree of freedom. The MH test is used for combining several 2x2 contingency tables to obtain one test statistic and one p-value. For example, the MH test is often used to combine several clinical trial results together into one larger test. For the technical folks, the user has control over the correction factor and the variance component and thus can obtain results based on Mantel and Haenszel (1959), Cochran (1954) or Grizzle (1967).

Matthew C. Hutcheson
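
The following Python sketch shows one way the Mantel-Haenszel statistic above can be assembled from several 2x2 tables, with the correction factor and the variance form exposed as options. The parameter names are hypothetical. The defaults (a 0.5 correction with the hypergeometric variance) follow the form usually attributed to Mantel and Haenszel, and setting the correction to 0 with the alternative variance gives the form usually attributed to Cochran; how the template maps its settings to the cited papers is not stated in the description.

```python
import math

def mantel_haenszel(tables, correction=0.5, hypergeometric=True):
    """tables: list of (a, b, c, d) counts, one 2x2 table per stratum."""
    observed = expected = variance = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        row1, row2, col1, col2 = a + b, c + d, a + c, b + d
        observed += a
        expected += row1 * col1 / n
        # Variance of the upper-left cell under the chosen model.
        denom = n * n * (n - 1) if hypergeometric else n ** 3
        variance += row1 * row2 * col1 * col2 / denom
    stat = (abs(observed - expected) - correction) ** 2 / variance
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square upper tail, 1 df
    return stat, p_value

stat, p_value = mantel_haenszel([(10, 5, 5, 10), (8, 6, 4, 9)])
```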

This template plots up to six dependent variables as functions of a single independent variable in one graph, and allows for easy rescaling of the vertical axis for each variable. This extends the multiple line plots in several ways. Relationships between variables measured in entirely different units or scales can be examined without creating derived variables, and the independent variable can have uneven intervals (it is not just the case number), and, in fact, need not even be sorted. It is meant to be useful in biological time-series data which often have measurements unevenly spaced in time.

Chris Noble

The parallel coordinate plot is designed to view multidimensional data. Each variable (up to 8 here) is standardized to [0,1] and then plotted as several side-by-side dotplots. Lines connect each case in one variable (or dotplot) to its corresponding case in every other variable. In the past few years, this plot has become quite popular. Fascinating spirograph-like patterns appear if you plot two sorted random normals (one sorted ascending and one descending) or Cauchy variables. Note: You can generate a Cauchy variable by dividing one Normal(0, 1) by another independent Normal(0, 1).

Matthew C. Hutcheson
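
Two steps from the parallel coordinate plot description above lend themselves to a short Python sketch: rescaling a variable to [0,1], and generating Cauchy values as a ratio of two standard normals. Both function names are hypothetical, and min-max rescaling is an assumption about how the standardization is done.

```python
import random

def scale_to_unit(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def cauchy_sample(n, seed=0):
    """Cauchy values formed as one Normal(0, 1) divided by another."""
    rng = random.Random(seed)
    return [rng.gauss(0, 1) / rng.gauss(0, 1) for _ in range(n)]

scaled = scale_to_unit([2, 4, 6])
heavy_tailed = scale_to_unit(sorted(cauchy_sample(200)))
```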

The Quadwise plot is designed (in an attempt) to view four-dimensional data. This template plots two y-variables and two x-variables. y1 vs. x1 is plotted on the left-hand side of the "quadwise" scatterplot, and y2 vs. x2 is plotted on the right-hand side. Lines connect each case in the left-hand scatterplot with its corresponding case in the right-hand scatterplot.

Matthew C. Hutcheson

This template demonstrates regression confidence intervals. One somewhat difficult topic in teaching regression is explaining to students why confidence intervals for regression lines are hyperbolic in shape. This template allows one to visualize the process. The user has control over the sample size, the number of samples, the amount of error variance and heteroskedasticity (non-constant variance). Adjusting the error heteroskedasticity reveals hyperbolic shapes with narrow cones on one end and fat cones on the other end. Increase the sample size to get tighter intervals.

Matthew C. Hutcheson

This template demonstrates the empirical sampling distribution of a sample mean. This can be used to demonstrate the Central Limit Theorem without the restriction of sampling from a uniform population. This template also demonstrates the empirical sampling distribution of the difference between means. It can be used to test the hypothesis of equality of means through resampling rather than parametric methods.

Chris Noble
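
One common way to test equality of means through resampling, as mentioned in the sampling distribution template above, is a permutation test: shuffle the pooled data, split it into two groups of the original sizes, and see how often the difference in means is at least as large as the observed one. The Python sketch below assumes that approach; the template's own resampling scheme is not described, and all names are hypothetical.

```python
import random
import statistics

def permutation_p_value(a, b, n_resamples=2000, seed=0):
    """Two-sided permutation p-value for a difference between two means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        left, right = pooled[:len(a)], pooled[len(a):]
        if abs(statistics.fmean(left) - statistics.fmean(right)) >= observed:
            extreme += 1
    return extreme / n_resamples

p_value = permutation_p_value([1, 2, 3, 4, 5, 6], [101, 102, 103, 104, 105, 106])
```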

This template draws prediction and confidence interval bands for a simple regression of Y vs. X on a scatterplot of the data. It also calculates the exact endpoints of these intervals for a user-defined X-value. The user has control over the confidence level, the X-value for the calculated interval, and the color of the lines on the interval plot.

John H. Walker
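
The interval endpoints at a chosen X-value, as computed by the template above, follow the standard simple-regression formulas. The Python sketch below is an illustration under the assumption that the caller supplies the t critical value for the chosen confidence level and n-2 degrees of freedom, since the Python standard library has no t distribution; the function name is hypothetical.

```python
import math
import statistics

def regression_intervals(xs, ys, x0, t_crit):
    """Return (fit, confidence interval, prediction interval) at x0."""
    n = len(xs)
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))  # residual standard error
    fit = intercept + slope * x0
    # This term grows as x0 moves away from the mean of x.
    spread = 1 / n + (x0 - x_bar) ** 2 / sxx
    ci_half = t_crit * s * math.sqrt(spread)
    pi_half = t_crit * s * math.sqrt(1 + spread)
    return fit, (fit - ci_half, fit + ci_half), (fit - pi_half, fit + pi_half)

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
fit, ci, pi = regression_intervals(xs, ys, x0=3, t_crit=3.182)  # 95%, 3 df
```

The prediction interval is always wider than the confidence interval, and both widen as the X-value moves away from the mean of X.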

This program illustrates the Central Limit Theorem. A random uniform variable with a given number of cases is generated. Its mean is computed and appended to the end of a variable which is plotted in a histogram and probability plot.

Paul Pratt
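
The procedure in the Central Limit Theorem program above, generating a uniform variable, computing its mean, and appending it to a growing collection, can be sketched in Python as follows. The function name and default sizes are hypothetical.

```python
import random
import statistics

def sample_means(n_means=1000, cases=30, seed=0):
    """Collect the means of repeated uniform(0, 1) samples."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_means):
        sample = [rng.random() for _ in range(cases)]
        means.append(statistics.fmean(sample))  # append each new mean
    return means

means = sample_means()
center = statistics.fmean(means)
spread = statistics.stdev(means)
```

A histogram of the collected means looks roughly normal and is centered near 0.5, even though the individual values are uniform.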

This template draws a scatterplot and gives the user control over the fitted line. When the user changes the intercept and slope, the line moves and the summary statistics are recomputed automatically. This gives insight into how SSE and R2 respond to the choice of line. The user also controls the error structure to see its effect on the regression and the scatterplot. Finally, the user can press a button to fit the line automatically by minimizing either the sum of squared errors (least squares) or the sum of absolute errors.

Matthew C. Hutcheson
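
To make the quantities in the line-fitting template above concrete, the Python sketch below evaluates the sum of squared errors and the sum of absolute errors for any chosen line, and computes the least squares line in closed form. The function names are hypothetical, and minimizing the absolute error is left out because it has no simple closed form.

```python
def sum_squared_error(xs, ys, intercept, slope):
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

def sum_absolute_error(xs, ys, intercept, slope):
    return sum(abs(y - (intercept + slope * x)) for x, y in zip(xs, ys))

def least_squares_line(xs, ys):
    """Intercept and slope that minimize the sum of squared errors."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

xs, ys = [0, 1, 2], [1, 3, 5]
intercept, slope = least_squares_line(xs, ys)
sse_best = sum_squared_error(xs, ys, intercept, slope)
sse_guess = sum_squared_error(xs, ys, 0, 2)  # a hand-picked line fits worse
```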

This graphical display combines two functions: an updating scatterplot and a scatterplot snake. The updating scatterplot was developed here in Data Desk (a paper is currently being written). Plot y vs. x. A separate "ordering variable" determines which data are displayed at any one time. This ordering variable is rescaled to the range [0, 1]. Control which data are displayed using the 'location' and 'bandwidth' parameters. For example, if you have 100,000 observations, plotting them all at once is just a mess. You can use the density plot discussed above together with this plot to get an understanding of the large dataset. For example, you might set the location slider to 0 and the bandwidth slider to 0.05, then slide the location slider from 0 up to 1. As you move the slider, the plot continually updates and displays only the points within location +/- bandwidth, as determined by the ordering variable. Initially, the data between 0 and 0.05 are displayed. Once you get to, say, location = 0.50, the data displayed lie between 0.45 and 0.55 of the ordering variable. In other words, the middle 10% (0.55 - 0.45) of the data is displayed on the plot. It is useful to start with random numbers as the ordering variable. Then, as you move the location and bandwidth parameters, you get a basic unstructured view of the data. Next, replace the random variable with a "real" ordering variable (say, income) and update through the data. A scatterplot snake is also programmed into this plot; it displays lines dynamically as you move through the ordering variable. This implementation is much more powerful than those in other programs because you control both the location and the bandwidth instead of starting the snake and letting it run to the end. If you want to do that, just set the location to 0, then increase the bandwidth from 0 to 1. I like to set the bandwidth to an amount that doesn't put so many lines on the plot that it is distracting, then use the location parameter to move through the data.

Matthew C. Hutcheson
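
The selection rule of the updating scatterplot above, showing only the points whose rescaled ordering variable lies within location +/- bandwidth, can be sketched in Python as below. The function name is hypothetical, and linear min-max rescaling of the ordering variable is an assumption.

```python
def visible_points(xs, ys, order, location, bandwidth):
    """Points whose rescaled ordering value is within location +/- bandwidth."""
    lo, hi = min(order), max(order)
    scaled = [(o - lo) / (hi - lo) for o in order]  # rescale to [0, 1]
    return [
        (x, y)
        for x, y, s in zip(xs, ys, scaled)
        if abs(s - location) <= bandwidth
    ]

xs = list(range(11))
ys = list(range(11))
order = list(range(11))
shown = visible_points(xs, ys, order, location=0.5, bandwidth=0.05)
```

Calling the function repeatedly while stepping the location from 0 to 1 mimics moving the location slider.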

Use this template to perform Vertical Calculations for datasets of any size.

Matthew C. Hutcheson

This is another version of the previous template dealing with Vertical Calculations.

Paul Pratt

This template is useful for displaying data geographically. It contains a database of latitude and longitude associated with five digit zip codes for the continental US. If you have a variable that contains 5-digit zip codes, drop it into the "socket" and click on the button named Display Map.

Matthew C. Hutcheson