Data Desk FAQ

If you are a registered user of Data Desk, you are entitled to free technical support. We recommend that you consult your manuals before contacting us, as many of the answers to the questions we receive can be found there. You may also want to peruse the FAQs below, which cover a variety of questions from general to technical.

Contact Us to submit an issue to the Data Desk Technical Support Team.

Technical FAQs

Data Desk 8 for Windows can be used with computers running Win 7 up to Win 10. Data Desk 8 for Mac runs on OS X 10.7 up to 10.12.

If Norton AntiVirus has quarantined Data Desk 8.exe, you can restore it as follows:

1. In the Security History window, in the Quarantine view, select the item that you want to restore.

2. Click Options.

3. In the Threat Detected window, click Restore. In case of non-viral threats, you can use the Restore & exclude this file option instead. This option returns the selected Quarantine item to its original location without repairing it and excludes the item from being detected in future scans.

4. In the Quarantine Restore window, click Yes. In case of non-viral threats, you can use the option that is available in this window to exclude the security risk. Norton AntiVirus does not detect the security risks that you exclude in future scans.

5. Click Close.

Apple has introduced new security features.

You can add Data Desk 8 as an exception to this rule by holding the Control key and clicking the Data Desk 8.app file.

From there, click Open. A pop-up will appear asking whether you are sure you want to open this app. Click Open, and Data Desk 8 will be added to the exceptions list in your system's security settings.

You can read more about this from Apple support: http://support.apple.com/kb/PH14369

If you receive an error when you try using the menu option Calc > Nonlinear Models, you need to install the library folder, which can be downloaded here on our User's Forum along with a step by step fix.

Data Desk has a theoretical limit of about 2 billion cases. We don't know of anyone who has hit that limit, but we do know of users who have analyzed datasets with over a million cases. The most important factors affecting performance are the speed of your computer's processor and the amount of available RAM. Most computers should be able to analyze 25,000 data points comfortably.

You can use the Data Desk demo to quickly check how Data Desk performs with large datafiles on your system. Launch the Data Desk demo and choose Generate Random Number from the Manip menu. Type the number of desired variables in the first field (about 5 is usually sufficient) and the number of cases in the second field (go ahead and try a big number like 20,000 or 30,000). Press the OK button. Data Desk generates the random variables and opens the window holding the variables' icons. Now use the variables to create some plots or tables. Try a scatterplot, a regression or even a 3D rotating plot.

One method is to save the spreadsheet file in text or ASCII format. Most spreadsheet programs offer an option to change the format of the saved file in the Save dialog. Once the data have been saved into a text file, launch Data Desk and choose the Import command from the File menu. Use the Import dialog to find and open the text file. Data Desk offers to use the first row of the text file as the variable names. If the first row of the spreadsheet did not hold the variable names, you can choose other options for naming the variables.

The other option is to copy the data from the spreadsheet and paste it into Data Desk. Select the data in the spreadsheet file, making sure to include the column names. Launch Data Desk and choose Paste from the Edit menu. Data Desk offers to use the first row of the copied data as the variable names. If the first row of the data did not hold the variable names, you can choose other options for naming the variables.

Before the year 2000, the Macintosh interpreted all two-digit dates with a year after 11 as years in the 20th century, and two-digit dates before 11 as years in the 21st century. Now, however, the two-digit cut-off year is 91, which may throw off date calculations on data collected/entered before 1991.

The best solution to this problem is to work with dates stored with four-digit years. One easy way to do this is to create a new derived variable and type in the following, where var is your date variable and cut-off year is the two-digit cut-off:
    IF Right(var, 2) < cut-off year THEN
    Left(var, Len(var)-2) & "20" & Right(var, 2) ELSE
    Left(var, Len(var)-2) & "19" & Right(var, 2)
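
If you prepare your dates before importing them, the same rule can be sketched outside Data Desk. The Python below is an illustration only, not Data Desk syntax. The function name expand_two_digit_year and the assumption that each date is a text string ending in a two-digit year are ours.

```python
def expand_two_digit_year(date_text: str, cutoff: int) -> str:
    """Rewrite a date that ends in a two-digit year so it ends in a four-digit year."""
    head, year = date_text[:-2], date_text[-2:]
    # Years below the cut-off go to the 2000s; all others go to the 1900s.
    century = "20" if int(year) < cutoff else "19"
    return head + century + year


print(expand_two_digit_year("3/15/05", cutoff=91))  # 3/15/2005
print(expand_two_digit_year("3/15/95", cutoff=91))  # 3/15/1995
```

Choose the cut-off to match the period your data actually cover, since a two-digit year carries no century information of its own.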

(Chapter 13 in the Data Desk Handbook)

Subset analysis is performed with user-defined indicator variables called Selector variables. Selector variables may be assigned in a variety of ways. The most direct method of applying a selector variable is to drag the selector variable into the analysis you want to restrict. All Data Desk analysis tables and some plots allow selector variables to be dragged into them.

Another way to apply a selector variable to a display or table is to select the icon of the selector variable and choose {Selector} Assign Selector from the plot or table's HyperView menu. When you assign a selector, take care that only the selector variable's icon is selected.

Selector variables can be assigned using a Selector button. Select the selector variable's icon and choose {Special > Selector} Assign. Data Desk creates a selector button, which appears in the lower left corner of the Data Desk window. Initially it is turned on (highlighted). Click the Selector button to toggle it off and on. When the Selector button is highlighted (on), all Data Desk commands operate only on the cases marked as 1 in the selector variable. After a command is executed, the button turns off. Press the button again to highlight it and invoke selection for the next command.

(Chapter 15 in the Data Desk Handbook)

Layout Windows can be used to print multiple plots and tables on the same page. To create a new layout window, select {Data > New} Layout. Data Desk creates a new layout window and opens it. To place pictures of plots or tables in the layout window, drag the icon (or icon alias) of the plot or table window into the layout window. Alternatively, choose the {Edit} Copy Window command to copy a picture of the window, click on the layout window and choose {Edit} Paste. If you drag the icon of an unopened window into a layout window, Data Desk creates a button that links to that window -- when the button is pressed the window opens. To reposition a picture in the layout window, click on it and drag it where you would like it. Plots in layout windows are transparent so you can overlay several plots.

When you add a picture to a layout window, the date and time are automatically recorded as part of the title. Such documentation helps to track the history of your analysis and provides a type of audit trail. You may hide this title by clicking on the HyperView menu triangle that is shown at the left of the title and choosing 'Hide Title'. If you choose 'Remove Link' it will remove the entire link and make it a static picture. However, this command cannot be reversed!

Data Desk also lets you type or paste text into layout windows. When a layout window is frontmost, pressing any letter, number or symbol key on the keyboard or pasting text creates a text box within the layout window. This allows you to make personal comments on what you have observed during each stage of your analysis.

When the selected factor in the Results panel of the Linear Model design view is a two-way interaction, Data Desk offers an interaction plot from a HyperView menu for the title of the Expected Cell Means table.

First, select the interaction term from the Result for factor panel. Next, click to the right of the phrase: "Expected Cell Means of:" and choose Interaction Plot of 'dependent variable' by 'interaction term' from the context-sensitive HyperView menu.

An interaction plot is a dotplot of the expected cell means by the categories of one of the main effects in the interaction, with lines added by group to connect points according to the other term in the interaction.

Select the two variables that you would like crossed. Choose {Manip > Transform > Misc} Cross. A new variable will be created titled 'Cross'. Select this variable and choose Frequency Breakdowns from the Calc menu. This will give you the total count for all of the possible combinations of the two variables.
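
To make clear what the crossed counts represent, here is a minimal sketch in Python (not Data Desk syntax). The function name cross_counts and the sample data are made up for illustration. It pairs the two variables case by case and counts every possible combination, including combinations that never occur.

```python
from collections import Counter
from itertools import product


def cross_counts(a, b):
    """Count each (a, b) combination, reporting 0 for combinations that never occur."""
    observed = Counter(zip(a, b))
    combos = product(sorted(set(a)), sorted(set(b)))
    return {combo: observed[combo] for combo in combos}


# Hypothetical data: one entry per case.
sex = ["male", "female", "female", "male", "female"]
smoker = ["yes", "no", "yes", "yes", "no"]

for combo, count in cross_counts(sex, smoker).items():
    print(combo, count)
```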

To move any Data Desk window that does not have a title bar (for example, Note, Picture Button and Socket windows), hold down the Option key on Mac (Ctrl key on Windows), click anywhere on the window and drag the window where you would like to place it.

To resize any Data Desk window that does not have a title bar, hold down the Option key on Mac (Ctrl key on Windows), then click and drag the bottom right corner of the window to the size you would like.

Because Data Desk's nonlinear modeling command uses templates programmed in the internal Action Programming Language, selector and group buttons have no effect on analyses computed using this command. Several additional steps are needed to restrict nonlinear models to a subset of points.

If you are entering your own function (Custom), follow the steps below.

1.) First you should create your selector variable. For this example call it 'Selector'.

2.) Open the Nonlinear Model custom template. Press the "Change Loss Function" button. A new window titled "Loss Function" will open. Within that window you can edit the text in the "Loss Fn" window. Change the text to the following:

ssq('resids' for 'Selector'=1)

This tells the ssq computation to restrict itself to the 1's that are in the selector variable.

3.) Open the results and choose Show Plot Info from the top scatterplot's HyperView menu. Drag Selector into the selector line.

If you are using one of the pre-built Nonlinear models, there are a couple of extra steps.

1.) Create your selector variable. (For this example name it 'Selector')

2.) Press the Open Results button.

3.) Click on the word 'sumsq' in the Coefficients & Sum of Squares window. Choose 'Locate Sumsq' from the HyperView menu. Data Desk finds and selects the derived variable that computes the sum of squared residuals. Open this derived variable.

4.) Edit the text after "for" to say:

(numeric('Y') and 'Selector' = 1)

For example, the Exponential Fit template should look like the following:

ssq('ypred'-'Y' for (numeric('Y') and 'Selector' = 1))

5.) Open the results and choose Show Plot Info from the top scatterplot's HyperView menu. Drag Selector into the selector line.
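
The restricted loss function in the steps above can be pictured with a short Python sketch. This is an illustration, not Data Desk's own code. The function name restricted_ssq is ours, and we assume missing values are represented as None or NaN.

```python
import math


def restricted_ssq(ypred, y, selector):
    """Sum of squared residuals over cases where y is numeric and selector equals 1."""
    total = 0.0
    for predicted, observed, flag in zip(ypred, y, selector):
        # Treat None and NaN as missing, i.e. not numeric (an assumption of this sketch).
        is_numeric = isinstance(observed, (int, float)) and not math.isnan(observed)
        if is_numeric and flag == 1:
            total += (predicted - observed) ** 2
    return total


# Hypothetical data: the second case is missing and the fourth is deselected.
print(restricted_ssq([1.0, 2.0, 3.0, 4.0], [1.5, None, 2.0, 10.0], [1, 1, 1, 0]))  # 1.25
```

Only the first and third cases contribute, so the large residual in the deselected fourth case does not influence the fit.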

There are two options available for computing a repeated measures analysis. One uses the multivariate ANOVA; the other uses a nested design.

To compute the analysis using the multivariate ANOVA option, your repeated observations must be entered as separate variables. For example, day 1, day 2, day 3, and so on. You need to have a variable that records treatment type for each subject in the study where each subject is a row in your dataset.

Select your observations as Y variables and your treatment as an X variable. Choose Calc > Linear Models. Data Desk opens the Linear Model design view. Click on the button next to "Type of analysis:" that says MANOVA and select Repeated Measures from the pop-down menu. To compute and view the results, click the arrow next to "Results" to open the Results panel inside the Linear Models design view.

Repeated measures can also be computed using a nested form. You must have one variable that records observations, a variable that records the corresponding treatment, a subject variable, and the repeat variable, which names the repeats. (See pages 29/8 and 29/9 for a schematic representation and an example.)

Select the observations variable as Y and the other three variables as X. Choose Calc > Linear Models. Data Desk opens the Linear Models design view. In the Factors panel, nest the Subject factor inside the Treatment factor.

Next, open the Interactions panel (click on the arrow next to "Custom Interactions") and specify the Treatments*Repeats interaction term. To compute and view the results, click the arrow next to "Results" to open the Results panel inside the Linear Models design view.

Advantages of multivariate repeated measures:

* Easier to specify.

* Faster and smaller; may be able to compute under memory limits when nested form cannot.

* Offers dotplots of responses in repeat order with lines connecting subjects; a useful diagnostic display.

Disadvantages of multivariate repeated measures:

* Less flexible; can't omit interactions.

* Can't compute expected cell means, coefficients, or post-hoc tests.

* Cases that miss even one Repeat are omitted from the analysis.

* One Repeat factor only.

Advantages of the nested calculations:

* Greater flexibility; can omit interactions.

* Can compute expected cell means, coefficients, and post-hoc tests for all terms.

* Missing observations are omitted only for the repeat on which they are missing; the subject can be kept in the analysis.

* Multiple Repeat factors are possible.

Disadvantages of the nested calculations:

* More complex to specify; may require data manipulation to put variables in the correct form.

* Slower and larger. May have difficulty completing the calculation for large files without a large amount of memory.

Sometimes data come to us that have already been summarized. So, for example, instead of getting a file with a row for each individual, you might receive a file with two rows: one row holding the number of males and the other the number of females. Data Desk always uses data that have not been summarized for its analyses. The Replicate Y by X command converts summarized data to individual records so that Data Desk can use the data in other analyses.

Replicate Y by X creates a new variable which repeats categories in Y the number of times listed in X. For example, suppose that you had two variables, each with two cases -- one called sex and one called replicates. The variable sex contains the text string 'male' in the first case and the text string 'female' in the second case. The replicates variable holds the value 13 in the first case and 15 in the second case, indicating 13 males and 15 females. If you select sex as y and replicates as x and choose Replicate Y by X from the Manip menu, Data Desk creates a new variable called sex:replicates, holding 28 cases: 13 cases of 'male' followed by 15 cases of 'female'. This variable can be used to create bar charts and frequency tables, and as a factor in linear models.
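
The example above can be mirrored in a few lines of Python. This is only a sketch of the idea, not Data Desk syntax, and the function name replicate is ours.

```python
def replicate(y, x):
    """Repeat each category in y the number of times given by the matching entry in x."""
    expanded = []
    for category, count in zip(y, x):
        expanded.extend([category] * count)
    return expanded


sex = ["male", "female"]
replicates = [13, 15]

expanded = replicate(sex, replicates)
print(len(expanded))  # 28
```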

Date and Time Functions are calculated with the base date of Jan 1, 1904. For example, if you type "Days(1/1/93)", it returns a value of 32509, which equals the total number of days from January 1, 1904 to January 1, 1993.
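
As a cross-check of the figure above, the same count can be reproduced with Python's standard datetime module. This is a sketch, not Data Desk code, and the function name days_since_base is ours. Matching 32509 requires counting January 1, 1904 itself as day 1. That convention is our inference from the example, not a documented rule.

```python
from datetime import date

BASE_DATE = date(1904, 1, 1)


def days_since_base(year: int, month: int, day: int) -> int:
    """Day count from the 1904 base date, with the base date itself counted as day 1."""
    # The "+ 1" is inferred from the FAQ's example value, not from documentation.
    return (date(year, month, day) - BASE_DATE).days + 1


print(days_since_base(1993, 1, 1))  # 32509
```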

The date and time functions on Mac and Windows differ in how they interpret two-digit year notation (for example, 93 instead of 1993).

On the Macintosh, any two-digit year between 11 and 99 is translated to 19XX, and any two-digit year between 00 and 10 is translated to 20XX. Therefore, dates before 1/1/1911 and after 12/31/2010 must include the century digits (2011 as opposed to 11).

On Windows, the result depends on which oleaut32.dll file you have in your system folder. With one of the oleaut32.dll files, the results are the same as on the Mac. If you type "Days(1/1/11)" and it returns the value 2558, then you have the one that is comparable to the Mac scenario.

If your system uses the other oleaut32.dll file, any two-digit year between 30 and 99 is translated to 19XX, and any two-digit year between 00 and 29 is assumed to be 20XX. Therefore, dates before 1/1/1930 and after 12/31/2029 must include the century digits. If you type "Days(1/1/11)" and it returns the value 39083, then you have the later of the two oleaut32.dll files.

Unfortunately there is no easy way to determine which oleaut32.dll file your system is using. The best way is to experiment with several dates and see what values are returned.

To open a fixed-format file, launch Data Desk and choose "Open Datafile..." from the File menu. Use the Open dialog to find and open the fixed-format file. Press the "Use Fixed Format" button. Data Desk displays the first group of characters of the top row of the file. Type in the number of characters in the first variable, enter the variable name, and press the Next button. Continue this until you have defined all of your variables, and then press the Done button.
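
If you are unsure what a fixed-format layout means, the following Python sketch shows the idea: each variable occupies a fixed number of characters on every row. The function name parse_fixed_format, the field names, and the widths are all hypothetical, and this is not how Data Desk itself is implemented.

```python
def parse_fixed_format(line, fields):
    """Split one row into named values; fields is a list of (name, width) pairs in column order."""
    record = {}
    start = 0
    for name, width in fields:
        # Take the next `width` characters and drop the padding spaces.
        record[name] = line[start:start + width].strip()
        start += width
    return record


# Hypothetical layout: 4 characters for id, 10 for name, 5 for score.
layout = [("id", 4), ("name", 10), ("score", 5)]
row = "0001" + "Alice".ljust(10) + "87.5".rjust(5)

print(parse_fixed_format(row, layout))
```

The widths you type into the Use Fixed Format dialog play the same role as the widths in the layout list.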

To copy selected cases from a Data Desk relation, first open the variables that you would like copied for the specified cases. Be sure there is an editing sequence number in each variable you wish to include with the copy. The editing sequence is located at the top of the scroll bar on the right of each variable editing window. If there is a gray box instead of a number, click on the gray box. A number appears. This number represents the order that the variable columns will be in when you copy them. (Note: It does not matter what order the actual variable windows appear in on the screen, the order that they are copied in is determined by the editing sequence number.) After you have assigned the editing sequence number for each variable, select the cases by dragging and highlighting the cases in the editing windows or by selecting points in a plot or table. Choose Copy from the Edit menu. The cases are now ready to be pasted either into another window in Data Desk or into another program.

The Query tool displays the text or value in the frontmost variable window for the selected case. To have the Query tool display the values computed by a derived variable, open the derived variable and choose the "Show Numbers" command from its HyperView menu. Be sure the Show Numbers window is open and is the frontmost variable window, then select the Query tool from the Tool palette and click on a point in any Data Desk plot that displays individual points.

There are two ways that you can do this within Data Desk. The first is by using Derived variables. Create a derived variable (Data>New>Derived Variable) and type the following expression:

If 'var' = 1 or 'var' = 2 then "small"

else if 'var'=3 then "medium"

else "large"
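
For comparison, the same recoding rule written as a Python sketch (not Data Desk syntax) looks like this. The function name recode and the sample values are ours.

```python
def recode(value):
    """Map the numeric codes to labels: 1 and 2 are small, 3 is medium, anything else is large."""
    if value in (1, 2):
        return "small"
    if value == 3:
        return "medium"
    return "large"


var = [1, 2, 3, 4, 5]  # hypothetical codes, one per case
print([recode(v) for v in var])
```

As in the derived variable expression, the conditions are tested in order, so any value not matched earlier falls through to "large".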

The second method requires a graphical selection of the cases you want to recode. First, select 'var' and choose Bar Charts from the Plot menu. Open the editing window for 'var' (double-click on the variable 'var'). Select the pointer tool from the tools palette and highlight the cases for categories 1 and 2 by clicking on the bars corresponding to those categories in the plot (hold the Shift key down to select the second bar). Click on the title bar of the 'var' editing window, choose Replace from the Edit menu and type "small" in the Replace dialog. All the selected cases change from "1" or "2" to "small". Repeat the same process for all other categories you want to recode.

The difference between selectors and hot selectors is subtle, but important. Selectors are static 0/1 indicator variables. Once you define the 0/1 code for each case, it does not change unless you explicitly change the numbers in the variable or the derived variable expression, if your selector is based on a derived variable.

Hot Selectors are 0/1 indicator variables that are dynamic and are based on the selection state of the cases in the relation. The 0/1 coding for each case is determined by the selection state of that particular case at that particular time - highlighted cases are coded with a 1, unhighlighted are coded with a 0. You can change the selection state and, therefore the 0/1 code, by selecting new points in any plot or table.

With Hot selectors you can say "I want to see that regression, but only for the males" by simply assigning a HotSet selector to the regression and selecting the male bar in a gender bar chart or frequency table. Then you can say "Let me see that regression, but this time only use the female points", and you can accomplish this by clicking on the female bar of the bar chart. The general rule is that any plot or table that has a HotSet Selector assigned to it will recompute or redisplay any time the selection state changes. The selection state can be changed by selecting points in plots, like a histogram or scatterplot, or in a table, like a frequency table or contingency table.

To append data to an existing datafile, open the datafile that holds your existing data and choose Import from the File menu. Select the file that has your additional observations. (This could be another Data Desk file or a text file.) Follow the usual steps for importing data. Data Desk creates a new relation and places the new data in that relation.

Verify that the new data have the same number of variables as the original datafile. Select the existing data variables as Y's and the additional data variables as X's and choose Parallel Append from the Manip menu. Data Desk creates a new relation labeled Parallel Append which holds the new appended variables. The appended variables use the same names as the original variables (the ones selected as Y's).

Hint: If you don't see the Parallel Append command under the Manip menu, make sure that you have the same number of Y and X variables selected.
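
The pairing rule behind Parallel Append can be sketched in Python. This is an illustration only. The function name parallel_append and the sample variables are made up, and the assumption that variables are paired by their order of selection is ours.

```python
def parallel_append(y_vars, x_vars):
    """Append each X variable's cases to the matching Y variable, keeping the Y names."""
    if len(y_vars) != len(x_vars):
        raise ValueError("select the same number of Y and X variables")
    appended = {}
    # Pair variables by position (assumed here to follow selection order).
    for (y_name, y_values), x_values in zip(y_vars.items(), x_vars.values()):
        appended[y_name] = list(y_values) + list(x_values)
    return appended


existing = {"height": [60, 62], "weight": [110, 120]}
additional = {"ht": [65], "wt": [130]}

print(parallel_append(existing, additional))
```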

Selector buttons work with selector variables to restrict analyses to a specified subset of points. Group buttons work with variables that hold multiple categories and result in parallel analyses for each category in the categorical variable.

Most datasets are rectangular. There are variables (usually represented as columns) and cases (usually represented as rows). Each case has a value recorded for each variable. The recorded value may be a value defined as "missing" rather than a number or a category name. Because each case has a value for each variable and each variable has a value at each case, the array of data can be shown as a rectangular table of values as in a spreadsheet. In Data Desk this rectangular structure is known as a Relation.

A Folder helps to organize variables or results into groups so that you can deal with them easily. Several icons may belong together for the following reasons: they describe the same individuals or circumstances; they contain related quantities; you plan to use them together in analysis; or you want to group them together to clean up the desktop. In Data Desk any kind of icons can be grouped into folders for any of these reasons.

The Data Desk commands that compute hypothesis tests and confidence intervals for differences between sample means require that the samples being compared reside in separate variables. Many datasets use category variables to differentiate between samples, storing the measured values in one variable and the categories in another variable. To convert data stored using a category variable to separate variables for each category, select the variable holding the measurements as Y and the variable holding the groups as X, then choose Split into Variables by Group from the Manip menu. Data Desk creates a new variable and a new relation for each category in the categorical variable.
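
A minimal Python sketch of the same reshaping (not Data Desk syntax) may help. The function name split_by_group and the sample data are made up for illustration.

```python
def split_by_group(y, x):
    """Return one list of measurements per category, keeping the original case order."""
    groups = {}
    for value, category in zip(y, x):
        groups.setdefault(category, []).append(value)
    return groups


measurements = [5.1, 4.8, 6.0, 5.5]  # hypothetical Y variable
group = ["a", "b", "a", "b"]          # hypothetical X variable

print(split_by_group(measurements, group))
```

Because the groups can have different numbers of cases, the resulting variables no longer fit in one rectangular table.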

Hello Data Desk User,

You ask an interesting question.
Your question is really about data structures.

Data Desk, unlike most other statistics packages, has relational database functions, so it does understand the differences among the kinds of data you mention and has tools that can deal with them.

However, you must think very clearly about what you want.

First, a definition:
A relation is a data structure in which all variables are about the same cases. You can think of it as a rectangular data table in which rows are the cases and columns are the variables.

For example, quarterly data, such as GDP, are in a relation in which each case is an economic quarter. S&P data are in a different relation with daily data.

All statistical methods that deal with multiple variables require variables that are part of the same relation. So ultimately, you’ll have to arrange for your variables to be in a single relation.

However, there are certainly times when variables in different relations can be related to each other. In database terms, that requires a relational lookup. So, for example, although S&P500 is daily, we do know which quarter each day is in. So it is possible to look up in the Quarterly relation to get the GDP for that quarter. If you do that for each day, the resulting variable will be in the Daily relation, but will repeat the same GDP value (in effect) for each day in that quarter. If that is what you are looking to do, then Data Desk does have the functions you need.

Look at the Help files (or documentation) for Derived Variables and within that for the Relational Functions. You’ll find functions such as GetCase(y, x), which takes x as a list of case numbers (for example, quarters) and returns the value of y at each of those case numbers.

Lookup(y, x) returns the case number of a case of y for which y = x.

Using these and other functions like them, you should be able to look up the GDP value in the quarterly relation by knowing the day number in the Daily relation (provided you can write the function that specifies the quarter number from the day number.) A derived variable that uses these operations could be a member of the Daily relation, but would hold the GDP value for the quarter to which each day belongs.
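
To show the shape of such a lookup, here is a small Python sketch. It is not Data Desk syntax. The GDP figures are invented, the helper name quarter_number is ours, and it assumes all days fall in a single year so that the quarter number can serve as the case number in the quarterly relation.

```python
def quarter_number(month: int) -> int:
    """Quarter (1 to 4) that a calendar month belongs to."""
    return (month - 1) // 3 + 1


# Quarterly relation: one case per quarter (invented values).
gdp_by_quarter = [100.0, 102.5, 101.0, 104.0]

# Daily relation: the month in which each daily case falls (invented values).
daily_months = [1, 2, 4, 7, 7, 12]

# For each day, compute its quarter and fetch that case from the quarterly
# relation, the role GetCase plays. Case numbers are 1-based, list indexes 0-based.
daily_gdp = [gdp_by_quarter[quarter_number(month) - 1] for month in daily_months]

print(daily_gdp)  # [100.0, 100.0, 102.5, 101.0, 101.0, 104.0]
```

The result has one value per day, so it belongs with the daily data, yet it repeats each quarter's GDP across every day in that quarter.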

I always have to debug my use of these operations. A convenient way to do that is within a scratchpad, where you can type an expression and evaluate parts of it as you put it together.

Please let me know if this works for you and, of course, send any remaining questions along.

Best Regards,
Paul

Phone: 607-257-1000 | P.O. Box 4555, Ithaca, NY 14850
Copyright © 2017 Data Description, Inc. All rights reserved.
