* SAS MACRO FOR SCORING THE IMPLICIT ASSOCIATION TEST
  ALGORITHM BASED ON GREENWALD, NOSEK, & BANAJI, 2003, JPSP

  CREATED BY: BRIAN NOSEK (NOSEK@VIRGINIA.EDU, HTTP://BRIANNOSEK.COM/)
  LAST UPDATE: 05/27/05 - VERSION 5

  COMMENTS AND INSTRUCTIONS
  --------------------------------------
  Most recent changes:
  - 5/27/05: Macro will now output the number of trials used to calculate the mean for each block
  - 5/27/05: Lots of updating to the introductory and explanatory text
  - 5/14/05: Macro will now analyze non-critical blocks (blocks other than B3, B4, B6, B7), such as
             single dimension practice trial data, and output their means, percent error, and
             percent fast responses
  - 5/14/05: Setting SUBEXCL to '2' (missing data) only occurs if there is missing data from the
             four critical analysis blocks
  --------------------------------------

  This macro will transform a datafile with raw trial latencies (stored as one line per response)
  for a standard format IAT (7 blocks) into a one line summary per subject of the IAT effect using
  the new GNB scoring algorithm.

  The goal of this macro is to prepare IAT data for subsequent analysis. However, this does not
  relieve the researcher from making conceptual decisions about how best to analyze IAT data.
  There are decisions to make about how the macro is applied, and the macro does not remove
  participants. All subject exclusions must be made deliberately by the researcher.

  To use this algorithm for your SAS program, perform these steps:
  (1) Run this script (do not edit the macro directly). The macro will be loaded into active
      memory and can be referred to in any analysis script
  (2) Turn the datafile containing your IAT data into a SAS datafile and put it into a folder to
      use as a library
  (3) Identify that folder as a library, and identify a folder where the SAS file output from the
      macro will go (they can be the same folder)
      EXAMPLE (without semicolons):
        libname web 'H:\raceatt\'
        libname outdata 'H:\raceattclean\'
  (4) Prepare your SAS datafile to be used by the SAS Macro.
      The critical elements are:
      (a) If there are multiple IATs per participant, the macro must be run for each IAT
          individually, and the data can be merged manually afterwards
      (b) The dataset must contain one row per IAT trial
      (c) The following variables must be available in each row: subject identifier, trial
          latency, trial error (0=correct, 1=error), and name of block, where B3, B4, B6, B7
          correspond to blocks 3, 4, 6, 7 in the standard 7-block IAT format
      (d) The macro has no idea what task subjects were performing in the blocks. It just
          calculates the performance difference average(6-3, 7-4). You are responsible for
          knowing what those blocks refer to for each participant
      EXAMPLE (without semicolons): BLOCK_NAME is an existing variable defining what is in each
      block, and BLOCK is the new variable that will be passed to the macro.
        if BLOCK_NAME in ('goodbad') then BLOCK = 'B1'
        else if BLOCK_NAME in ('bushkerry') then BLOCK = 'B2'
        else if BLOCK_NAME in ('bushgoodpractice') then BLOCK = 'B3'
        else if BLOCK_NAME in ('bushgoodcritical') then BLOCK = 'B4'
        else if BLOCK_NAME in ('kerrybush') then BLOCK = 'B5'
        else if BLOCK_NAME in ('bushbadpractice') then BLOCK = 'B6'
        else if BLOCK_NAME in ('bushbadcritical') then BLOCK = 'B7'
      Note: the IAT score will be based on B3, B4, B6, B7 only. If other blocks are included, as
      in this example, the macro will calculate basic statistics for them: mean, error rate,
      number of trials, and percent fast responses.
  (5) In your SAS program, enter the following statement:
        %iatCalc(libIn, libOut, INDATA, OUTDATA, BLOCNAME, SESHID, TRIAL_LATENCY, TRIAL_ERROR, VERROR, VEXTREME, VSTD)
  (6) The variable names between the parentheses are placeholders. Change them to the file and
      variable names in your own datafile, and replace the last three (VERROR, VEXTREME, VSTD)
      with numerical values (1 or 2) according to the definitions described below
      EXAMPLE (without semicolons):
        %iatCalc(web, outdata, iatrace, CLEANiat, BLOCK, SUB, LATENCY, ERROR, 1, 2, 1)
  (7) Run the script
  (8) Examine the output datafile to find your calculated IAT scores (in the example, the file
      would be called outdata.CLEANiat)
  (9) The macro does not remove individual participants. It only identifies IAT scores that are
      clearly problematic. The SUBEXCL variable is 0 if the IAT performance passes basic
      standards, 1 if more than 10% of trials in the four main blocks were <300ms, and 2 if data
      from a main block was missing. Individual subjects will need to be removed by the
      researcher, and any additional exclusion criteria will have to be defined and implemented
      by the researcher. The macro is not designed to make conceptual decisions about data
      inclusion except for the most conservative criteria.

  Descriptions of what the macro expects for input, and what it will output, are below.

  The macro expects the following types of libraries, datafiles, and data:
  (a) variables identifying SAS library and filenames
      libIn = input SAS library name
      libOut = output SAS library name
      INDATA = filename of input SAS dataset in the input SAS library
      OUTDATA = filename for the output SAS dataset in the output SAS library
  (b) variables identifying four key pieces of information for calculating an IAT score
      BLOCNAME = variable name for block identifier in the indata file: alphanumeric indication
                 of the trial blocks ('B3', 'B4', 'B6', 'B7' are the critical blocks,
                 corresponding to B3, B4, B6, and B7 from GNB, 2003).
                 At present the macro requires that the variable passed here uses the names
                 'B3', 'B4', 'B6', 'B7' to refer to those blocks
      SESHID = variable name for unique subject identifier in the indata file
      TRIAL_LATENCY = variable name for latency of response for trial in the indata file
      TRIAL_ERROR = variable name for error coding in the indata file: 0 if initial response was
                    correct, 1 if initial response was incorrect
  (c) three options for variations in the D algorithm
      VERROR = value: if '1' the algorithm will use error trial latencies, if '2' the algorithm
               will replace error trial latencies with blockmean+600 [blockmean is mean of
               correct responses only] (1 is current standard for designs that require error
               correction, 2 if error correction is not required)
      VEXTREME = value: if '1' the algorithm provides no treatment of extreme values, if '2' the
                 algorithm will delete trials <400ms (2 is current standard)
      VSTD = value: if '1' the block standard deviation is computed including error trials
             (corrected or not), if '2' the block standard deviation is computed on correct
             responses only (1 is standard)

  Note: The D algorithm is not the definitive scoring method for the IAT. Improvements will be
  identified with continuing research by the academic community. This macro conservatively
  applies the best algorithms identified by Greenwald et al., 2003. Further enhancements to that
  algorithm will need to be validated and applied separately from this script. The script itself
  will evolve more slowly than innovations in scoring to ensure that the validity of its
  procedures is well documented prior to their standardization.
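
  EXAMPLE (illustrative numbers only, not taken from real data, and assuming the three options
  behave as described above):
  Suppose the correct responses in one block average 800 ms and the error rate in that block is
  10 percent.
    With VERROR = 2, each error trial latency is replaced by 800 + 600 = 1400 ms, so the block
    mean used for the IAT score becomes
      (0.90 x 800) + (0.10 x 1400) = 720 + 140 = 860 ms
    With VERROR = 1, the block mean is the average of all recorded latencies in the block,
    error trials included, with no 600 ms penalty added.
    With VEXTREME = 2, trials faster than 400 ms are deleted before either calculation.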

  The macro will output the following variables to a new file identified by the OUTDATA
  variable. If there is an existing file by the name used for OUTDATA, the macro will overwrite it:
      SESHID = unique subject identifier
      SUBEXCL = 0 for included data, 1 for excluded data (too many fast responses), 2 for incomplete data
      MB3 = mean of trial latencies for B3
      MB4 = mean of trial latencies for B4
      MB6 = mean of trial latencies for B6
      MB7 = mean of trial latencies for B7
      CS1 = standard deviation for B3 and B6 trials combined (correct trials only)
      CS2 = standard deviation for B4 and B7 trials combined (correct trials only)
      AS1 = standard deviation for B3 and B6 trials combined (all trials)
      AS2 = standard deviation for B4 and B7 trials combined (all trials)
      EB3 = percent errors of trials for B3
      EB4 = percent errors of trials for B4
      EB6 = percent errors of trials for B6
      EB7 = percent errors of trials for B7
      NB3 = number of trials used for mean calculation for B3
      NB4 = number of trials used for mean calculation for B4
      NB6 = number of trials used for mean calculation for B6
      NB7 = number of trials used for mean calculation for B7
      FB3 = percent fast responses of trials for B3
      FB4 = percent fast responses of trials for B4
      FB6 = percent fast responses of trials for B6
      FB7 = percent fast responses of trials for B7
      DIFF1 = MB6 - MB3
      DIFF2 = MB7 - MB4
      IAT1 = DIFF1/AS1 (DIFF1/CS1 if VSTD = 2)
      IAT2 = DIFF2/AS2 (DIFF2/CS2 if VSTD = 2)
      IAT = mean of IAT1 and IAT2

  Note: The macro will also output M (mean), E (error), N (number) and F (fast) variables for
  all other blocks included in the initial SAS datafile. Those additional calculations will not
  affect the IAT score. If you do not want values calculated for other blocks, remove them from
  the input datafile before invoking the script.
;

/* MACROS for NEW IMPLICIT ASSOCIATION TEST SCORING ALGORITHM (Greenwald, Nosek, & Banaji, 2003)
   FOR STANDARD USE, DO NOT CHANGE THE MACRO ITSELF.
   JUST RUN THE MACRO AND CALL IT BY FOLLOWING THE INSTRUCTIONS ABOVE.
*/

%macro iatCalc(libIn, libOut, indata, outdata, BLOCNAME, SESHID, TRIAL_LATENCY, TRIAL_ERROR, VERROR, VEXTREME, VSTD);
  %let divide= /;
  %let multiply= *;
  %let add = +;
  %let subtract = -;
  %iatAlgorithm(libIn=&libIn, libOut=&libOut, indata=&indata, outdata=&outdata, BLOCNAME=&BLOCNAME, SESHID=&SESHID, TRIAL_LATENCY=&TRIAL_LATENCY, TRIAL_ERROR=&TRIAL_ERROR, VERROR=&VERROR, VEXTREME=&VEXTREME, VSTD=&VSTD);
  run;
%mend iatCalc;

%macro iatAlgorithm(libIn, libOut, indata, outdata, BLOCNAME, SESHID, TRIAL_LATENCY, TRIAL_ERROR, VERROR, VEXTREME, VSTD);

data IAT;
  set &libIn..&indata;
  %*PRELIMINARY STEPS FOR HANDLING WEBDATA FORMATS;
  keep &BLOCNAME &SESHID &TRIAL_LATENCY &TRIAL_ERROR;
proc sort data=iat; by &SESHID &BLOCNAME;

%*options nonotes; %*suppress all Notes to log;

data IAT;
  set IAT;
  %*STEP 1 HAS BEEN REMOVED, NOW ALL DATA IS AT LEAST PARTIALLY ANALYZED;
  %*STEP 1: Include data from B3, B4, B6, B7;
  %*if &BLOCNAME in ('B3', 'B4', 'B6', 'B7') then ;
  %*else delete;

  %*STEP 2a: Eliminate trial latencies > 10,000ms;
  if &TRIAL_LATENCY > 10000 then delete;
  %*STEP 2b: Eliminate subjects for whom more than 10% of trials have latencies < 300ms;
  else if &TRIAL_LATENCY < 0 then delete; %*for miscoded data in datafile indicating negative response times;
  else if -1 < &TRIAL_LATENCY < 300 then FAST = 1;
  else FAST = 0;

data FASTDATA;
  set IAT;
  keep &SESHID &BLOCNAME FAST;
proc sort; by &SESHID &BLOCNAME;
proc means data=IAT noprint;
  by &SESHID &BLOCNAME;
  var FAST;
  output out=means mean=MEAN;
proc transpose data=means prefix=F name=name out=FASTMEAN;
  by &SESHID;
  id &BLOCNAME;
data FASTMEAN;
  set FASTMEAN;
  where name='MEAN';
  FASTM = mean(FB3, FB4, FB6, FB7);
  if FASTM > .10 then SUBEXCL = 1;
  else SUBEXCL = 0;
  %*SUBEXCL = 0 (include data), 1 (exclude data - too many fast responses), 2 (exclude data - missing data);
  %*The SUBEXCL variable needed to be reintroduced to the final dataset in STEP 12;

%*STEP 3: Use all trials;
%*in the conventional algorithm the first two trials of each block would be dropped here;

%*STEP 4: No extreme value treatment (VEXTREME=1), or delete trials with latencies <400ms (VEXTREME=2);
data IAT;
  set IAT;
  %*if &VEXTREME = 1 then do nothing here;
  if &VEXTREME = 2 then do;
    if &TRIAL_LATENCY < 400 then delete;
  end;
proc sort data=IAT; by &SESHID &BLOCNAME;

%*STEP 5: Compute mean of correct latencies for each block;
data CORR;
  set IAT;
  %*if &VERROR is 1 then means and SDs will be calculated for the entire set of latencies;
  if &VERROR = 2 then do;
    if &TRIAL_ERROR NE 0 then delete;
  end;
  keep &SESHID &BLOCNAME &TRIAL_LATENCY &TRIAL_ERROR;
proc means data=CORR noprint;
  by &SESHID &BLOCNAME;
  var &TRIAL_LATENCY;
  output out=means mean=MEAN;
proc transpose data=means prefix=M name=name out=CORRMEAN;
  by &SESHID;
  id &BLOCNAME;
data CORRMEAN;
  set CORRMEAN;
  where name='MEAN';

%*STEP 5x: Count number of trials used for each block mean calculation;
proc means data=CORR noprint;
  by &SESHID &BLOCNAME;
  var &TRIAL_LATENCY;
  output out=means n=N;
proc transpose data=means prefix=N name=name out=NMEAN;
  by &SESHID;
  id &BLOCNAME;
data NMEAN;
  set NMEAN;
  where name='N';
run; *outputs a file with the number of actual trials in each block;

%*STEP 6a: Compute pooled SD for B3 & B6, and separately for B4 & B7 for correct trials only;
data SD;
  set CORR;
  if &TRIAL_ERROR NE 0 then delete; *drop error trials;
  if &BLOCNAME in ('B3', 'B6') then TD = '1';
  else if &BLOCNAME in ('B4', 'B7') then TD = '2';
  else delete;
  drop &BLOCNAME;
proc sort data=SD; by &SESHID TD;
proc means data=SD noprint;
  by &SESHID TD;
  var &TRIAL_LATENCY;
  output out=means std=STD;
proc transpose data=means prefix=CS name=name out=CORRSTD2;
  by &SESHID;
  id TD;
data CORRSTD2;
  set CORRSTD2;
  where name='STD';

%*STEP 6b: Compute pooled SD for B3 & B6, and separately for B4 & B7 including error trials;
data SD;
  merge IAT CORRMEAN;
  by &SESHID;
  if &TRIAL_ERROR < 0 then delete;
  else if &TRIAL_ERROR > 1 then delete; %*get rid of coding errors;
  else if &TRIAL_ERROR = 1 and &VERROR = 2 then do;
    if &BLOCNAME in ('B3') then &TRIAL_LATENCY = MB3 + 600;
    else if &BLOCNAME in ('B4') then &TRIAL_LATENCY = MB4 + 600;
    else if &BLOCNAME in ('B6') then &TRIAL_LATENCY = MB6 + 600;
    else if &BLOCNAME in ('B7') then &TRIAL_LATENCY = MB7 + 600;
  end;
  if &BLOCNAME in ('B3', 'B6') then TD = '1';
  else if &BLOCNAME in ('B4', 'B7') then TD = '2';
  else delete;
  drop &BLOCNAME;
proc sort data=SD; by &SESHID TD;
proc means data=SD noprint;
  by &SESHID TD;
  var &TRIAL_LATENCY;
  output out=means std=STD;
proc transpose data=means prefix=AS name=name out=CORRSTD1;
  by &SESHID;
  id TD;
data CORRSTD1;
  set CORRSTD1;
  where name='STD';

%*STEP 7: Replace error latencies with block mean + 600ms (VERROR=2), or use latency from stimulus onset to correct response when a correct response is required (VERROR=1);
data ERR;
  set IAT;
  keep &SESHID &BLOCNAME &TRIAL_ERROR;
  if &TRIAL_ERROR < 0 then delete;
  else if &TRIAL_ERROR > 1 then delete; %*get rid of coding errors;
proc means data=ERR noprint;
  by &SESHID &BLOCNAME;
  var &TRIAL_ERROR;
  output out=means mean=MEAN;
proc transpose data=means prefix=E name=name out=ERRMEAN;
  by &SESHID;
  id &BLOCNAME;
data ERRMEAN;
  set ERRMEAN;
  where name='MEAN';

%*STEP 7 continued: combining data;
data COMBINE;
  merge CORRMEAN NMEAN CORRSTD1 CORRSTD2 ERRMEAN FASTMEAN;
  by &SESHID; %*combining datasets for calculating final means;
  if &VERROR=2 then do;
    array BLOCKMeans(*) MB: ;
    array BLOCKCstds(*) CS: ;
    array BLOCKAstds(*) AS: ;
    array BLOCKErrs(*) EB: ;
    do i=1 to dim(BLOCKMeans); %*for each block replace error trials with mean + 600ms;
      BLOCKMeans{i} = (1-BLOCKErrs{i})*BLOCKMeans{i} + (BLOCKErrs{i})*(BLOCKMeans{i}+600);
    end;
  end;

  %*STEP 8: No transformation of latencies;
  %*in the conventional algorithm, raw latencies would be log transformed prior to the transposing in the current format;

  %*STEP 9: Average latencies for each of the four blocks;
  %*this step was already accomplished in the do loop above;

  %*STEP 10: Compute two differences B6-B3 and B7-B4 (does not account for pairing order);
  DIFF1 = MB6 - MB3;
  DIFF2 = MB7 - MB4;

  %*STEP 11: Divide each difference by associated pooled SD from STEP 6a or 6b;
  if &VSTD = 2 then do;
    IAT1 = DIFF1/CS1;
    IAT2 = DIFF2/CS2;
  end;
  else do; %*IF VSTD = 1 also set as default;
    IAT1 = DIFF1/AS1;
    IAT2 = DIFF2/AS2;
  end;

  %*STEP 12: Average quotients from STEP 11;
  IAT = mean(IAT1, IAT2);

  %*if there is missing data in critical blocks, mark data as excluded (SUBEXCL=2);
  %* old way of doing it do i=1 to dim(BLOCKMeans);
  %* if BLOCKMeans{i} = . then SUBEXCL = 2;
  %*end;
  if MB3=. | MB4=. | MB6=. | MB7=. then SUBEXCL=2; *if missing data from any of critical blocks then marked;

data &libout..&outdata (drop=i name);
  set COMBINE;
run;

%mend iatAlgorithm;
*END OF ALGORITHM;
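
/* USAGE SKETCH (not part of the macro, and commented out so that running this script only
   loads the macros).

   This sketch strings together steps (3) through (8) of the instructions above, using the
   example names given there (web, outdata, iatrace, CLEANiat, BLOCK_NAME, BLOCK, SUB, LATENCY,
   ERROR). The folder paths, dataset names, and variable names are placeholders to replace with
   your own. The intermediate dataset name iatrace_b and the closing PROC FREQ check are
   assumptions added here for illustration only. Copy the statements into your own analysis
   script to run them.

   * Step 3: identify the input and output libraries ;
   libname web 'H:\raceatt\';
   libname outdata 'H:\raceattclean\';

   * Step 4: recode the existing block labels into the names the macro expects ;
   data web.iatrace_b;
     set web.iatrace;
     length BLOCK $ 2;
     if BLOCK_NAME in ('goodbad') then BLOCK = 'B1';
     else if BLOCK_NAME in ('bushkerry') then BLOCK = 'B2';
     else if BLOCK_NAME in ('bushgoodpractice') then BLOCK = 'B3';
     else if BLOCK_NAME in ('bushgoodcritical') then BLOCK = 'B4';
     else if BLOCK_NAME in ('kerrybush') then BLOCK = 'B5';
     else if BLOCK_NAME in ('bushbadpractice') then BLOCK = 'B6';
     else if BLOCK_NAME in ('bushbadcritical') then BLOCK = 'B7';
   run;

   * Steps 5 to 7: call the macro with VERROR=1, VEXTREME=2, VSTD=1 ;
   %iatCalc(web, outdata, iatrace_b, CLEANiat, BLOCK, SUB, LATENCY, ERROR, 1, 2, 1);

   * Step 8: inspect the exclusion flags before deciding which subjects to drop ;
   proc freq data=outdata.CLEANiat;
     tables SUBEXCL;
   run;
*/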