/* / Program : Documented Cleaning Script (with code for importing RDE .txt datafiles / Version : 0.2 / Editor(s) : Nicole M Lindner / Date : April 15, 2009 / Contact : nml5d AT email DOT virginia DOT edu /====================================================================================================== / Purpose : Read in, clean, and transpose all standard datafiles resulting from / data collected at Project Implicit and downloaded through the / RDE (https://rde.projectimpilcit.net) / SubMacros : %IatCalc and %IatAlgorithm, available in sasMacro script / (see http://projectimplicit.net/nosek/papers/scoringalgorithm.sas.txt), / Either save each macro (file name same as the macro) in this directory: / C:\Program Files\SAS\SAS 9.1\core\sasmacro / Or use this line, changing the file location to wherever you have a file with the / IAT macros saved: %include 'C:\primary\dataweb\common\SPFD.sas'; / / Notes : This script: / * Reads in the typical datafiles for web studies, either for those from the Demo- and / International tasks (e.g., Demo.Age, Australia.Weight) or using the Research-site / participant pool or website infrastructure (e.g., NosekLab.othtran.dmp.budd.0001) / * Cleans the datafiles, transposes them, optionally obtains some useful / information (time spent on a task, sequence of implicit/explicit/other / tasks within the study, etc.) and merges them. / * For the IAT: cleans, reverse-codes, checks for problem participants/trials, / etc. in preparation for substantive analysis / * Merges all cleaned and transposed datafiles into a final cleaned dataset. / /====================================================================================================== / PARAMETERS: /-------name----------- -------------------------description------------------------------------------- /PREPARATION SECTIONS: These clean repeat observations and transposes the data to create one line per participant, in preparation for merging with other files): / ST-SessionTask / S-Session For Demo-site (& international) tasks w/o separate demographics file / S-ALTERNATE Session for Research-site data, also imports Demographics / D-Demographics For merging with S-ALTERNATE / E-Explicit / I-IAT /INFO ON TASK SECTIONS: / ST-Step3-TimeSpent / ST-Step A-Consenters / ST-Step B-Task Order / ST-Step C-Time Spent / ST-Step D-experimental condition assignment / /MD-Merging Data Cleaning Raw, Merged & Transposed Section /====================================================================================================== / AMENDMENT HISTORY: / init --date--- mod-id ----------------------description---------------------------------------------- / 0.1.1 Dec10,2008 Import RDE data: / Adds Carlee's script to import .txt datafiles downloaded from the / Project Implicit Virtual Lab's DATA Download feature (i.e., tab-separated / ".txt" files) from https://rde.projectimplicit.net / 0.2 Apr15,2009 Corrects import of datetime variables so SAS recognizes that they're datetime / /====================================================================================================== / This is public domain software. No guarantee as to suitability or accuracy is / given or implied. User uses this code entirely at their own risk. /====================================================================================================*/ FILENAME Raw 'C:\primary\dataweb\oth\raw\christians'; *Format expected by SAS for importing .txt files; libname Raw1 'C:\primary\dataweb\oth\raw\christians'; *identify library containing datafile; /***NOTE: I defined two libraries with the same location. This was done to optimize analysis of web datasets that are too large to keep the unzipped raw.datafiles after cleaning. In other words, raw1 would be given a different location (on an external harddrive). It stores the transposed/semi-cleaned datafiles, so that you could have them available without keeping the bulky, non-cleaned/transposed data in the raw library ***/ LIBNAME Clean "C:\primary\dataweb\oth\"; *for your final, "cleaned" data; *This is where all of the HTML pages of the output is kept; FILENAME Webout "C:\primary\dataweb\oth\raw\OUTPUT"; /*********** ************ ************ ************ ************* ST-SessionTask Preparation Section The SessionTask file is the map to all study files that participants saw: For every participant that was assigned to the study, there will be 1 line for each Task/page, with columns recording the following: a) each Task[page] that the participant viewed, b) the Order in which they saw the page, and c) the time spent on that page. Because data is recorded when the page is loaded, *regardless* of whether they make any responses or continue in the study, this should be used to determine their experimental condition, whether they provided consent, etc. This is especially important in research site studies, or others which contain experimental manipulations Note that to reduce the number of transformations, I'm dropping some variables (e.g., user_id and Session_Status, some datetime variables) and instead getting them from the Sessions datafile (one line per participant). ************ ************ ************ ************ ************/ /*** Step 0 - Bringing in the file ***/ DATA Temp(DROP=User_Agent User_ID Session_Created_BY Session_Last_Update_Date Session_Creation_Date Task_Creation_Date Session_Status); INFORMAT Task_Creation_Date Session_Date Session_Creation_Date Session_Last_Update_Date DATETIME20.; FORMAT Task_Creation_Date Session_Date Session_Creation_Date Session_Last_Update_Date DATETIME20.; INFILE Raw(sessionTasks.txt) DELIMITER='09'x LRECL=2000 FIRSTOBS = 2 ; INPUT Session_ID Task_Number Task_ID :$20. Task_URL :$128. User_Agent :$16. Study_URL :$48. Task_Status $ Task_Sequence $ Task_Creation_Date :ANYDTDTM21. User_ID Study_Name :$64. Session_Date :ANYDTDTM21. Session_Status:$4. Session_Creation_Date :ANYDTDTM21. Session_Created_By :$24. Session_Last_Update_Date :ANYDTDTM21.; RUN; ODS HTML CLOSE; ODS HTML PATH=WebOut FILE="Prep.SessTask.01.Raw.TaskURL.htm"; proc contents data=temp;run; proc freq data=temp;tables session_date ;format session_date datetime20.;run; /*** Step 1 - Cleaning Repeat data; get rid of duplicate submissions to the dataset of the same data; ***/ PROC SORT DATA = Temp; BY Session_ID Task_URL; *sorting data; PROC FREQ DATA = Temp; TABLES Task_URL;RUN; DATA Temp; SET Temp; Repeat=0; IF Session_ID = LAG(Session_ID) AND Task_URL = LAG(Task_URL) THEN Repeat=1; ***IF prior row is same as current row, mark as a Repeat; ELSE ; RUN; TITLE2 'Repeat SessionTask Observations'; PROC FREQ; TABLES Repeat; RUN; /*** Drop duplicate lines of data ***/ DATA Temp; SET Temp; IF Repeat = 1 THEN DELETE; DROP Repeat; RUN; TITLE2 'Task URL, help for Cleaning'; /*** If Task_ID is not blank can substitute / add it in the TABLES statement ***/ PROC FREQ DATA = Temp; TABLES Task_URL ; RUN; ODS HTML CLOSE; ODS HTML PATH=WebOut FILE="Prep.SessTask.02.Study.Order.Map.htm"; /***NOTE: All following steps for this dataset are entirely optional, and will vary based on your study design and the data you need ***/ /*** Step 2 - Cleaning Task_URL - Step A, etc. = pulling different information from the SessionTask file - Note that for most new studies, you can use the actual Task_ID variable, which will not have the extraneous HTML code for the location ***/ /* Task_URL Freq Percent Cum Freq /common/en-us/HTML/IATTask.jsp?i=/implicit/user/education/age/agea.xml& 10649 2.48 10649 /common/en-us/HTML/IATTask.jsp?i=/implicit/user/education/age/ageb.xml& 10598 2.47 21247 /common/en-us/HTML/IATTask.jsp?i=/implicit/user/education/age/agec.xml& 10488 2.44 31735 /common/en-us/HTML/IATTask.jsp?i=/implicit/user/education/age/aged.xml& 10639 2.48 42374 /common/en-us/HTML/instructions.jsp 51983 12.09 94357 /common/en-us/xml/demographics.xml 39996 9.31 134353 /flash/IATTask.jsp?i=/implicit/research/education/age/agea.xml 10469 2.44 144822 /flash/IATTask.jsp?i=/implicit/research/education/age/ageb.xml 10577 2.46 155399 /research/education/age/debriefing.jsp 16620 3.87 172019 /research/education/age/instructions.HTML 21717 5.05 193736 /research/education/age/intro.HTML 24757 5.76 218493 /research/education/age/questionnaire.HTML 21091 4.91 239584 /user/education/age/debriefing1.jsp 34097 7.93 273681 /user/education/age/debriefing2.jsp 24700 5.75 298381 /user/education/age/instructions.HTML 43395 10.10 341776 /user/education/age/intro.HTML 48121 11.20 389897 /user/education/age/questionnaire.xml 39932 9.29 429829 NOTE: TRANWRD(varname,"searchstring","transstring") translates (for varname) a given searchstring to the given translationstring (or blank) ***/ DATA Temp1; SET Temp; Task = TRANWRD(Task_URL,"/common/en-us/", ""); Task = TRANWRD(Task,"/education/age/",""); Task = TRANWRD(Task,TRIM("IATTask.jsp?i=/implicit"),""); Task = TRANWRD(Task,TRIM("/research"),""); Task = TRANWRD(Task,TRIM("/user"),""); Task = TRANWRD(Task,TRIM("/flash/"),""); Task = TRANWRD(Task,TRIM("html/"),""); Task = TRANWRD(Task,"arabb.xml","IAT"); Task = TRANWRD(LEFT(Task),".",""); Task = TRANWRD(LEFT(Task),"/",""); Task = TRANWRD(LEFT(Task),"xml",""); Task = TRANWRD(LEFT(Task),"jsp",""); Task = TRANWRD(LEFT(Task),"&",""); Task = TRANWRD(Task, "html",""); Task = TRIM(LEFT(Task)); DROP Task_URL; RUN; PROC FREQ DATA=Temp1;TABLES Task;RUN; /*** Sorting data in proper Order; ***/ PROC SORT DATA = Temp1; BY Session_ID; RUN; PROC FREQ DATA=Temp1;TABLES Task Task_ID;RUN; PROC MEANS N MAXDEC=0 DATA=Temp1 NONOBS;VAR Session_ID;CLASS Task_Number Task ;FORMAT Task $16.;RUN; /*** Step O (Optional). Identify the task number for the IAT AND Explicit, i.e., was IAT before (1) or after (2) the questionnaire. This step is unnecessary for data that have the sessionTasks datafile (but included here because Legacy data does not have session_tasks) and for correspondence to the PCIAS analysis ***/ DATA Raw1.ICond(KEEP=ITaskNum Session_ID TaskTemp); SET Temp1(WHERE=(Task IN ('agea', 'ageb', 'agec', 'aged') )); FORMAT TaskTemp $15.;FORMAT ITaskNum 8.; ITaskNum=Task_Number;TaskTemp = Task; RUN; DATA Raw1.ECond(KEEP=ETaskNum Session_ID);SET Temp1(WHERE=(Task = "questionnaire")); FORMAT ETaskNum 8.;ETaskNum=Task_Number; RUN; PROC SORT DATA=Raw1.ECond;BY Session_ID; PROC SORT DATA=Raw1.ICond;BY Session_ID;run; /*********** ************ ************ ************ ************* ST-Step A - Dropping data for non-consenters; NOTE: Primarily for Research site data (where you will have demographics data for all who were assigned to the study, even IF they didn't provide consent (I think?). This makes an important assumption--that the consent form is Task #0, and that if they go past that, that they have provided consent In newer data, consent will be task_number #1, in which case the WHERE statement and RENAME below should be WHERE=(Task_Number=2) & the DATA statement below reference the _1 variable ************ ************ ************ ************ ************/ PROC TRANSPOSE DATA=Temp1(WHERE=(Task_Number=1)) NAME=NAME OUT=GaveConsent(DROP=NAME _LABEL_ RENAME=(_1=ConsentGiven)); BY Session_ID; VAR Task; ID Task_Number;RUN; PROC FREQ DATA=GaveConsent;TABLES ConsentGiven;RUN; PROC CONTENTS DATA=GaveConsent;RUN; /*** Depending on number of conditions, this might have many different Tasks, but I just want to mark whether they were assigned to a page after consent ***/ DATA Clean.GaveConsent(DROP=consentGiven); SET GaveConsent; FORMAT Consenter 8.;IF consentGiven NE '' THEN Consenter=1;ELSE Consenter = .; RUN; PROC FREQ DATA=Clean.GaveConsent;TABLES Consenter / MISSING;RUN; /*** You would then merge this with the cleaned Demographics datafile, and only keep observations where consenter = 1, so that you don't have blank sessions for all participants who were assigned to the study, but didn't go beyond the Consent page; ***/ /*********** ************ ************ ************ ************* ST-Step B - Determining Task ("pages" in the study_ID) Order ; You can do something similar IF Task_ID is not empty To correspond to PCIAS analyses, this also saves the IAT & Explicit task_numbers to ITaskNum & ETaskNum, and saves the IAT variation (a,b,c,d) to TaskTemp ************ ************ ************ ************ ************/ DATA tempSessTask;SET Temp1; FORMAT ECond TaskSimple $8. TaskTemp $15.; FORMAT ITaskNum ETaskNum 8.; IF Task_Number = 0 THEN DO; IF Task = "Instructions" THEN TaskSimple = "consent"; ELSE IF Task = "intro" THEN TaskSimple = "consent"; END; ELSE IF TaskSimple = "" AND Task = "intro" THEN TaskSimple = "Intro"; ELSE IF Task = "instructions" THEN TaskSimple = "StudyIns"; ELSE IF Task IN ('agea', 'ageb', 'agec', 'aged') THEN DO; TaskSimple = "IAT"; END; ELSE IF Task IN ('debriefing', 'debriefing1') THEN TaskSimple = "debrief"; ELSE IF Task ='debriefing2' THEN TaskSimple = "debrief2"; ELSE IF Task = "questionnaire" THEN DO;TaskSimple = "Explicit";ITaskNum=Task_Number;END; ELSE IF Task = "demographics" THEN TaskSimple = "demogrphs"; RUN; DATA tempSessTask;SET Temp1; FORMAT TaskSimple $8.; IF Task_Number = 0 THEN DO; IF Task = "Instructions" THEN TaskSimple = "consent"; ELSE IF Task = "intro" THEN TaskSimple = "consent"; END; ELSE IF TaskSimple = "" AND Task = "intro" THEN TaskSimple = "Intro"; ELSE IF Task = "instructions" THEN TaskSimple = "StudyIns"; ELSE IF Task IN ('agea', 'ageb', 'agec', 'aged') THEN TaskSimple = "IAT"; ELSE IF Task IN ('debriefing', 'debriefing1') THEN TaskSimple = "debrief"; ELSE IF Task ='debriefing2' THEN TaskSimple = "debrief2"; ELSE IF Task = "questionnaire" THEN TaskSimple = "Explicit"; ELSE IF Task = "demographics" THEN TaskSimple = "demogrphs"; RUN; PROC FREQ DATA=TempSessTask;TABLES Task;WHERE TaskSimple = "";RUN; /*** Only run this line out once you are sure you've fully recoded Task into TaskSimple ***/ DATA TempSessTask(RENAME=(TaskSimple=Task));SET TempSessTask(DROP=Task);RUN; DATA Order; SET TempSessTask(KEEP=Session_ID Task Task_Number);RUN; PROC SORT DATA=Order;BY Session_ID Task;RUN; /*** NOTE: You would probably need to do some renaming (as is done here when I create the TempSessTask data). Otherwise, there will be a separate Order variable for each of the counterbalanced IATs, and all other file variations left in the Task variable, which would lead to some extraneous variables ***/ PROC TRANSPOSE DATA=Order PREFIX=ORD NAME=NAME OUT=Clean.Order(DROP=NAME _LABEL_); BY Session_ID; VAR Task_Number; ID Task; RUN; PROC CONTENTS DATA=Clean.Order; RUN; /*** Enter all variables (except Session_ID!) here This gives a basic rendering of the order of important tasks. Useful if you need to calculate new variable (with IF-THEN clauses) indicating whether explicit came before/after IAT ***/ PROC FREQ DATA = Clean.Order; TABLES ORDconsent ORDIntro ORDStudyIns ORDdemogrph ORDExplicit ORDIAT ORDdebrief ORDdebrief2; RUN; /*********** ************ ************ ************ ************* ST-Step C - Calculating time spent on the entire study (or modify to assess time on specific Tasks); - Identifies amount of time for each Task. This requires more recoding, but works better for long-running Demo-site tasks, where the design often changes (so last Task_number changes) - IF you have a simple design and just want time spent on the entire study, just TRANSPOSE Task_number instead of Task IF you do something, you can add the following 2 lines to TimeOnTasks datastep: * FORMAT TimeIAT TimeIatinstr TimeSurvey mmss.; * TimeSurvey = (Start2-Start1); TimeIatinstr = (Start3-Start2); TimeIAT = (Start4-Start3); ************ ************ ************ ************ ************/ TITLE2 'Time for Each Task'; /*** Only run this line out once you are sure you've fully recoded Task into TaskSimple ***/ DATA TempTime(RENAME=(TaskSimple=Task)); SET TempSessTask(KEEP=Session_ID TaskSimple Task_Number Task_Creation_Date); RUN; /* STARTDVCand Num 8 DATETIME20. 6 STARTDVFact Num 8 DATETIME20. 5 STARTInsManip Num 8 DATETIME20. 3 STARTbegin Num 8 DATETIME20. 4 STARTend Num 8 DATETIME20. 2 STARTmanip Num 8 DATETIME20. */ /*** Identify Task start time ***/ PROC TRANSPOSE DATA=TempTime PREFIX=Start NAME=NAME OUT=TimeOnTasks(DROP=NAME _LABEL_); BY Session_ID; VAR Task_Creation_date; ID Task; RUN; PROC TRANSPOSE DATA=TempTime PREFIX=Num NAME=NAME OUT=TimeTNum(DROP=NAME _LABEL_); BY Session_ID; VAR Task_Number; ID Task; RUN; PROC CONTENTS DATA=TimeOnTasks; RUN; PROC CONTENTS DATA=TimeTNum; RUN; PROC SORT DATA=TimeTNum;BY Session_Id; PROC SORT DATA=TimeOnTasks;BY Session_Id; /*** NOTE: SAS's default Times/date formats are crazy: They're the number of seconds since January 1st, 1960 So if you do a difference score, you'll get the number of seconds, but you'll want to apply a format to make it easier to read. Check SAS help for mmss. format or time. format for more details about them. Here are a few, with the raw value first, then the displayed value that each time format will result in: _Format_Raw #_ _Displays_ HHMM. 53132 14:46 HOUR. 53132 15 MMSS. 53132 885 TIME. 53132 14:45:32 TOD. 53132 14:45:32 ***/ DATA Raw1.TimeOnTasks; Merge TimeOnTasks TimeTNum(KEEP = Session_ID NumManCntrl NumManip); BY Session_ID; /*** By specifying the format for the new Time variables before I set them equal to something SAS keeps them in the format I'd like ***/ FORMAT timeTotal TimeDVCand TimeManip TimeDVFact TIME.; /* Or MMSS.*/ TimeTotal = STARTend - STARTbegin; TimeDVCand = StartDVFact - StartDVCand; IF NumManCntrl NE . THEN DO; TimeManip = StartEndExp - StartManCntrl; TimeDVFact = StartManCntrl - StartDVFact; END; IF Nummanip NE . THEN DO; TimeManip = StartInsDV - StartManip; TimeDVFact = StartEndExp - StartDVFact; END; RUN; PROC MEANS DATA=Raw1.TimeOnTasks;VAR TimeTotal TimeDVCand TimeManip;RUN; DATA Clean.TimeOnTasks(KEEP= Session_ID TimeDVCand TimeDVFact TimeManip TimeTotal); SET Raw1.TimeOnTasks;RUN; PROC MEANS DATA=Clean.TimeOnTasks;VAR TimeTotal TimeDVCand TimeManip TimeDVFact; FORMAT TimeTotal TimeDVCand TimeManip TimeDVFact TIME.;RUN; PROC MEANS DATA=Clean.TimeOnTasks;VAR TimeTotal TimeDVCand TimeManip TimeDVFact; WHERE StartBegin NE . & StartEnd NE .;RUN; PROC UNIVARIATE DATA=Clean.TimeOnTasks NOPRINT;VAR TimeTotal TimeDVCand TimeManip TimeDVFact ; HISTOGRAM TimeTotal TimeDVCand TimeManip TimeDVFact ; FORMAT TimeTotal TimeDVCand TimeManip TimeDVFact TIME10.;RUN; PROC UNIVARIATE DATA=Clean.TimeOnTasks NOPRINT;VAR TimeDVCand ; HISTOGRAM TimeDVCand ;FORMAT TimeDVCand TIME10.; WHERE TimeDVCand < 360;RUN; PROC UNIVARIATE DATA=Clean.TimeOnTasks;VAR TimeDVCand; HISTOGRAM TimeDVCand; FORMAT TimeDVCand TIME10.;RUN; PROC UNIVARIATE DATA=Clean.TimeOnTasks NOPRINT;VAR TimeDVCand TimeManip TimeDVFact ; HISTOGRAM TimeDVCand TimeManip TimeDVFact; FORMAT TimeDVCand TimeManip TimeDVFact MMSS.;RUN; PROC UNIVARIATE DATA=Clean.TimeOnTasks NOPRINT;VAR TimeDVCand; HISTOGRAM TimeDVCand / VSCALE=COUNT; FORMAT TimeDVCand 8. ;RUN; /*********** ************ ************ ************ ************* ST-Step D - Identifying experimental assignment - This will require you to change the coding to fit your design, but this should give you an idea of what you need to do. - New-model studies, WHERE Task_ID contains values, may be easier to decode - Use this, rather recoding based on the Explicit datafile, because session_Task records experimental assignment for all observations, not just those in which participants reported at least one response! NOTE: For the research-site study from which this is taken, Task_number 1 always contained my experimental manipulation. %&&&&&&&&&&&&& CHANGE THIS &&&&&&&&&&&&% NOTE: You MUST change this to fit your study design!!! This does not work with Demo (e.g., age data because experimental condition info is not given in Task_number=1, but does work with the research-site study from which it was adapted ************ ************ ************ ************ ************/ PROC SORT DATA=Temp;BY Session_ID;RUN; proc freq data=temp;tables task_URL;where task_number=1;run; PROC TRANSPOSE DATA=Temp(WHERE=(Task_Number=1)) NAME=NAME OUT=expassign(DROP=NAME _LABEL_ RENAME=(_1=SurveyName)); BY Session_ID; VAR Task_URL; ID Task_Number;RUN; DATA Clean.ExpAssign; SET ExpAssign; /*** NOTE: This was written for a study WHERE the Surveyname followed a SET FORMAT, WHERE all survey file names were of equal length, looking something like this: surveyblack2amerb Here are some helpful functions to recode this: * SUBSTR(varname,NPosition,NLength) pulls a substring from VARNAME, beginning at NPOSITION, and continues for NLENGTH characters (assumed to be 1, IF not otherwise specified) * PUT(varname, FORMAT.) takes the values from VARNAME and outputs them using the FORMAT ($N. = character; N. = numeric) * KINDEX(varname, "searchstring"): searches the CONTENTS of _varname_ for the provided "searchstring". Returns 1 IF it is there, 0 IF not. ***/ FORMAT SName QOrder 8. ; FORMAT SEthnic SIdeol 7.1; *** codes for the first name of target counterbalancing; SName = PUT(SUBSTR(SurveyName, 12,1),8.); /*** Or substitute a long IF-THEN-ELSE statement, beginning: IF SUBSTR(Surveyname, 12,1)= "1" THEN Sname = 1; ***/ *** determine the target's apparent ethnicity; IF KINDEX(SurveyName, "black") THEN Sethnic =.5; ELSE IF kindex(SurveyName, "white") THEN Sethnic = -.5; ELSE Sethnic =.; *** codes for target's speech ideology; IF KINDEX(SurveyName, "amer") THEN SIdeol = -.5; ELSE IF KINDEX(SurveyName, "arab") THEN SIdeol = .5; ELSE SIdeol = .; *** codes for order of agree/disagree statements; IF SUBSTR(SurveyName,17) = "a" THEN QOrder = 0; ELSE IF SUBSTR(SurveyName,17) = "b" THEN QOrder = 1; ELSE QOrder = .; RUN; PROC FREQ DATA=Clean.ExpAssign;TABLES Sname QOrder SEthnic SIdeol;RUN; /*********** ************ ************ ************ ************* S-Session Preparation Section (No valid demographics data) For Demo- or International-site tasks (with no signup-demographics). It contains one line for each participant. IF YOU HAVE VALID DEMOGRAPHICS DATA (i.e., your study used the Research-site participant pool), use the alternate Session Preparation section below. Also relevant for determining whether they saw all pages of the study through to the debriefing or other last page, in which case Session_Status would be = "C" for Complete (note: this doesn't assess whether they chose to *respond* to all items) ************ ************ ************ ************ ************/ /*** Step 0 - Bringing in the file; ***/ DATA Clean.Sessions(DROP=Referrer Previous_Session_Schema Previous_Session_ID Referrer Study_Url User_Agent); INFORMAT Session_Date Creation_Date Last_Update_Date DATETIME20.; FORMAT Session_Date Creation_Date Last_Update_Date DATETIME20.; INFILE Raw(sessions.txt) DELIMITER='09'x LRECL=2000 FIRSTOBS = 2 ; INPUT Session_ID User_ID Study_Name:$48. Session_Date:ANYDTDTM21. Session_Status:$4. Creation_Date:ANYDTDTM21. Last_Update_Date:ANYDTDTM21. Previous_Session_ID Previous_Session_Schema $ Referrer $ Study_URL :$48. User_Agent :$8.; if Session_Status = "null" then Session_Status = "M"; RUN; PROC SORT DATA=Clean.sessions; BY Session_ID;RUN; /*** Session data is now cleaned and sorted for merging with other data; Note: Transposing is not necessary because there is only 1 line per participant ***/ /*********** ************ ************ ************ ************* S-ALTERNATE Session Preparation Section: ***for Research-site data with Demographics only*** Primarily used with research-site data, to match up user_id (in demographics data) to the session_id (specific to this study, and is used in all other files). This section doesn't match perfectly with the rest of the analysis script because this is from a different dataset (a research-site study). They're similar enough that you shouldn't have problems merging them ************ ************ ************ ************ ************/ /*** Step 0 - Bringing in the Sessions file ***/ /*** NOTE: The alteration is that this doesn't drop User_ID, so that it can be match-merged with the transposed demographics datafile below ***/ DATA Raw1.Sessions(DROP=Referrer Previous_Session_Schema Previous_Session_ID Referrer Study_Url User_Agent); INFORMAT Session_Date Creation_Date Last_Update_Date DATETIME20.; FORMAT Session_Date Creation_Date Last_Update_Date DATETIME20.; INFILE Raw(sessions.txt) DELIMITER='09'x LRECL=2000 FIRSTOBS = 2 ; INPUT Session_ID User_ID Study_Name:$48. Session_Date:ANYDTDTM21. Session_Status:$4. Creation_Date:ANYDTDTM21. Last_Update_Date:ANYDTDTM21. Previous_Session_ID Previous_Session_Schema $ Referrer $ Study_URL :$48. User_Agent :$8.; if Session_Status = "null" then Session_Status = "M"; RUN; /*** Session data is now cleaned and sorted for merging with the demographics; NOTE: Transposing is not necessary because there is only 1 line per participant ***/ /*********** ************ ************ ************ ************* D-Demographics Preparation Section: This file will only be valid (unique values for User_ID, not 0s or -1s) for studies with samples from the research-site participant pool. THAT MEANS: * Not relevant for Demo/International-site studies * Unlikely to be relevant for contract studies ************ ************ ************ ************ ************/ /*** Step 0 - Bringing in the demographics file ***/ ODS HTML CLOSE; ODS HTML PATH=WebOut FILE="Prep.SessDemo.01.Cleaning.Demographics.htm"; DATA Temp; SET Raw.DemoUC Raw.DemoucOld;run; DATA Temp; INFILE Raw(demographics.txt) DELIMITER='09'x LRECL=2000 FIRSTOBS = 2 ; INPUT Characteristic:$16. Value:$16. User_ID study_name:$48.; if Value = "null" then Value = "."; RUN; PROC SORT DATA = Temp; BY User_ID Characteristic_S; /*** Step 1 - Cleaning Repeat data; Get rid of duplicate submissions to the dataset of the same data; ***/ DATA Temp; SET Temp; Repeat=0; IF User_ID = LAG(User_ID) AND Characteristic = LAG(Characteristic) THEN Repeat=1; ***If prior row is same as current row, mark as a Repeat; ELSE ; RUN; TITLE2 'Repeat Demographics Observations'; PROC FREQ; TABLES Repeat; RUN; /*** Drop duplicate lines of data ***/ DATA Temp; SET Temp; IF Repeat = 1 THEN DELETE; DROP Repeat; RUN; /*** Step 2 - Transposing Demographics data ***/ PROC TRANSPOSE DATA=Temp NAME=NAME PREFIX=Old OUT=Temp1(DROP= _LABEL_ NAME); BY User_ID; VAR Value;ID Characteristic; RUN; /*** Step 3 - Merging Demographics & Sessions data; Merging, based on User_ID (Demographics has only User_ID, other "real" files use only Session_ID, so we need Sessions data to identify participants session_ID. Drop the User_ID after this. Also necessary to bring in Session_Status and especially Session_Date, so that participants' age on the date of the session can be computed ***/ PROC SORT DATA=Raw1.Sessions; BY User_ID;RUN; PROC SORT DATA=Temp1; BY User_ID;RUN; DATA Raw1.TempDemos;MERGE Raw1.Sessions Temp1;BY User_ID;RUN; /*** Step 4 - Dropping all non-consenters Retaining demographics data only for consenting participants, so that the number of valid sessions isn't artificially inflated by the number of participants assigned to the study who chose not to participate. NOTE: This requires you to follow the step in the SessionTask preparation section to identify the consenters (i.e., Clean.GaveConsent) ***/ PROC SORT DATA=Raw1.TempDemos; BY Session_ID;RUN; PROC SORT DATA=Clean.GaveConsent; BY Session_ID;RUN; DATA Temp(WHERE=(Consenter = 1));MERGE Raw1.TempDemos Clean.GaveConsent;BY Session_ID; RUN; /*** Step 5 - Reducing the size of these variables, to save room. In SAS, this requires that a new variable be created, with a shorter length (or numeric instead of character). Note that in the PROC TRANSPOSE above, I added the prefix "Old" to all transposed variables so that I don't have to rename the shortened variables here, or choose new variable names for the shortened ones. ***/ /***NOTE: Cleaning demographics data only for consenting participants ***/ DATA Raw1.Demos(DROP=User_ID _LABEL_ NAME OldBirthMonth OldBirthYear OldCitizenship OldClass OldDayofBirth OldEducation OldEngFluency OldEthnicity OldIncome OldMajor OldOccupation OldPoliticalID OldRaceOMB OldEthnicityOMB OldGenoccupation OldReligionID OldResidence OldSex OldZipcode RENAME = (OldRelDenom=RelDenom OldRelFamily=RelFamily OldReligion=Religion )); SET Temp; FORMAT Birth MMDDYYN.; /***NOTE on date format: I am converting the datetime-format variables into date variables, so that age on the session_date can be computed. Older data will not include the DayofBirth variable. For those people their ages are calculated, with the assumption that it falls on the last of their birthmonth ***/ IF OldBirthYear NOT IN ('.','') THEN DO; IF OldDayOfBirth NOT IN ('','.') THEN DO; **** Participant error/spoofing. Leap years occur on 4 yr intervals (like the old Olympics); IF OldBirthMonth='2' & OldDayOfBirth='29' & MOD(PUT(OldBirthYear,8.),4) NE 0 THEN OldDayOfBirth='28'; IF OldBirthMonth NOT IN ('','.') THEN Birth = MDY( PUT(OldBirthMonth,2.), PUT(OldDayOfBirth,2.), PUT(OldBirthYear,4.) ); ELSE Birth = MDY( 12,PUT(OldDayOfBirth,2.),PUT(OldBirthYear,4.) ); END; ELSE IF OldBirthMonth = '2' THEN Birth = MDY( PUT(OldBirthMonth,2.),28,PUT(OldBirthYear,4.) ); ELSE IF OldBirthMonth IN ('1', '3', '5', '7', '8', '10', '12') THEN Birth = MDY( PUT(OldBirthMonth,2.),31,PUT(OldBirthYear,4.) ); ELSE IF OldBirthMonth IN ('4', '6', '9', '11') THEN Birth = MDY( PUT(OldBirthMonth,2.),30,PUT(OldBirthYear,4.) ); ELSE IF OldBirthMonth IN ('','.') & OldBirthYear NOT IN ('','.') THEN Birth = MDY( 12,31,PUT(OldBirthYear,4.) ); END; ELSE; FORMAT Edu Class EngFluency Ethnic EthnicityOMB RaceOMB Income PoliticalID RelID Major 8.; /***NOTE: Numeric responses: As long as you define the new variable's format type before setting NewVar= OldVar, it will use the FORMAT type you specified. Otherwise, you need to use PUT statements to force SAS to recognize the numeric formatting ***/ Edu = OldEducation; Class = OldClass; EngFluency = OldEngFluency; Ethnic = OldEthnicity; EthnicityOMB = OldEthnicityOMB; RaceOMB = OldRaceOMB; /***NOTE: The way we recorded race changed abruptly around September 27, 2006, because NIH indicated that we should request it using the categories used in the census. See the wiki section: https://www.projectimplicit.net/imptrac/wiki/EthnicityDetails for details. If you have some participants who reported ethnicity and others who reported EthnicityOMB/RaceOMB then you'll need to recode it (see create_common_dir._format-label_PI_vars.sas on the wiki***/ Income = OldIncome; PoliticalID = OldPoliticalID; RelID = OldReligionID; Major = OldMajor; FORMAT Sex $1.; Sex = OldSex; FORMAT CountryCit CountryRes $2.; CountryCit = OldCitizenship; CountryRes = OldResidence; *** Drops out non-numeric values from the zipcode. Don't use if you want international "zip"codes ; FORMAT Zipcode z8.; Zipcode = INPUT(COMPRESS(OldZipcode,'0123456789','k'), 8.); FORMAT Occupation $9.; Occupation = TRIM(OldOccupation); FORMAT GenOccupation $9.; Genoccupation = TRIM(OldGenOccupation); RUN; DATA Clean.Demos;set Raw1.Demos; /***NOTE: INTCK needs ('INTERVAL',BeginningDate,EndingDate). It counts the number of full SAS-provided intervals (that is, months) between the Beginning and Ending dates of the Session_Date and the calculated birth (counting in years doesn't work, according to the invisible people on the Internet). Because we want the number of years old, I divide the age in months by 12. FLOOR truncates the age, rather than rounding (if I'm 29 and 11 months old, I wouldn't want it to round UP, I'd want it to recognize that I'm 29 until 11:59 on my birthday eve.) ***/ FORMAT Age 8.0; Age=FLOOR((INTCK('MONTH',Birth,DATEPART(Session_Date))) / 12);RUN; ODS HTML CLOSE; /*** Demographics & Session data are now cleaned and sorted for merging with other data ***/ /*********** ************ ************ ************ ************* E-Explicit Preparation Section ************ ************ ************ ************ ************/ /*** Step 0 - Bringing in the explicit file ***/ DATA Explicit; INFILE Raw(explicit.txt) DELIMITER='09'x LRECL=2000 FIRSTOBS = 2 MISSOVER; INPUT Task_Number Question_Number Questionnaire_Name:$9. Question_Name:$14. Question_Response:$18. Attempt Study_Name:$48. Session_ID; RUN; /*** Step 1 - Cleaning Repeat data; Get rid of duplicate submissions to the dataset of the same data; ***/ ODS HTML CLOSE; ODS HTML PATH=WebOut FILE="Prep.Exp.01.Repeat.htm"; PROC CONTENTS; RUN; PROC MEANS N MAXDEC=0;RUN; PROC SORT DATA=Temp; BY Session_ID Questionnaire_Name Question_Name;RUN; *** Sorting data; DATA Temp; SET Temp; Repeat=0; IF Session_ID = LAG(Session_ID) AND Questionnaire_Name = LAG(Questionnaire_Name) AND Question_Name = LAG(Question_Name) THEN Repeat=1; ***If prior row is same as current row, mark as a repeat, to prevent problems in transpose; RUN; TITLE2 "Repeat Explicit observations"; PROC FREQ; TABLES Repeat;RUN; *** Drop all Repeat data; DATA Temp; SET Temp;IF Repeat = 1 THEN DELETE; DROP Repeat; RUN; ODS HTML CLOSE; ODS HTML PATH=WebOut FILE="Prep.Exp.02.Transposing.Explicit.htm"; /*** Step 2 - Transpose Explicit data ***/ /*** Step 2a - Transpose Explicit: convert numeric variables to numeric format - This separates out the numeric variables (you should run a PROC FREQ on Question_Name variable, to identify , so that I can forcibly convert the question_response to numeric *before* transposition, necessary in SAS (otherwise, I would have to individually convert every numeric variable, as is done for the character variables). ***/ DATA ExplicitNum;SET Temp(WHERE=(Question_Name IN ( 'age', 'aap1', 'aap7', 'atoma','att', 'choosetobe', 'ctoya', 'd','edu','ems1', 'ems4', 'ethnic', 'ethnicityomb', 'feel', 'hopetolive','ims3', 'ims4', 'matoo','othersthink', 'politicalid','raceomb', 'religionid', 'Told', 'Tyoung', 'yatoa'))); IF Question_Name="age" & Question_Response_S = "90+" THEN Question_Response_S = "91"; ELSE; RUN; DATA ExplicitNum1(DROP=Question_Response_S);SET ExplicitNum; FORMAT Question_Response 8.; Question_Response = PUT(Question_Response_S,8.); RUN; PROC CONTENTS DATA=ExplicitNum1;RUN; /*** Transpose data to get questions and answers ***/ PROC TRANSPOSE DATA=ExplicitNum1 NAME=NAME OUT=ExplicitNum(DROP=_LABEL_ NAME); *variable name = Question_name, data = Question_Response; VAR Question_Response; BY Session_ID; ID Question_Name;RUN; *one line per session; /*** Step 2b - Transpose Explicit character variables & shorten ***/ DATA ExplicitChar(DROP=Question_Response_S); SET Temp; WHERE Question_Name IN ('countrycit', 'countryres', 'feedback', 'major', 'num', 'occupation', 'postcodenow', 'postcodelong', 'religion', 'sex'); FORMAT Question_Response $8.; Question_Response = PUT(Question_Response_S,8.); RUN; PROC TRANSPOSE DATA=ExplicitChar PREFIX=Old NAME=NAME OUT=ExplicitChar1(DROP=_LABEL_ NAME); *** variable name = Question_name, data = Question_Response; VAR Question_Response; BY Session_ID; ID Question_Name;RUN; ***one line per session; PROC CONTENTS DATA=ExplicitChar1;RUN; DATA ExplicitChar(RENAME=(OldPostCodeNow=PostCodeNow OldPostCodeLong=PostCodeLong Oldoccupation=Occupation OldReligion=Religion) DROP = Oldsex OldCountrycit OldCountryRes oldnum OldFeedback); SET ExplicitChar1; * reduce the space required to store character vars; FORMAT Gender $2.; Gender=Oldsex; FORMAT CountryCit CountryRes $2.; CountryCit=OldCountryCit; CountryRes=OldCountryRes; FORMAT IatExpr $3.; IF OldNum NE "" THEN IatExpr = OldNum; ELSE IatExpr = ''; RUN; PROC CONTENTS DATA=ExplicitChar;RUN; PROC CONTENTS DATA=ExplicitNum;RUN; /*PROC CONTENTS data=Explicit;RUN; */ DATA Clean.Explicit; MERGE ExplicitNum ExplicitChar; BY Session_ID; RUN; PROC CONTENTS DATA=Clean.Explicit;RUN; /*** Explicit data is now cleaned and sorted for merging with other data; ***/ /*********** ************ ************ ************ ************* I-IAT Preparation Section ************ ************ ************ ************ ************/ /*** Step 0 - Bringing in the file ***/ DATA Temp; INFILE Raw(iat.txt) DELIMITER='09'x LRECL=2000 FIRSTOBS = 2 ; INPUT Block_Number /* Block in the IAT, usually from 0 - 6; for the SPF, 0-3 */ Block_Name $ /* Usually has format "BLOCK9 */ Block_Trial_count /* Total N in this block, usually 40 or 20 */ Block_Pairing_Definition $ /* Short/nonexistent for SPF, Category labels for IAT (if yours has longer labels, you may need to adjust length ) */ Study_Name:$48. /* Same as in the other files. May need to adjust length */ Task_Number /* I think this should match with # in SessionTasks?, as its sequence in entire study */ Task_Name:$16. /* usually IAT version (e.g., agea, ageb agec, aged) */ Trial_Number Trial_Name:$20. /* Stimulus name (e.g., picture name or word */ Trial_Response:$26. Trial_Latency /* In milliseconds */ Trial_Error /* See macro--logical value 0 or 1, comparing trial name & response */ Session_ID; RUN; /*** Sometimes a datapoint is inadvertently sent to the database twice for the same Task; This section will sort the data, identify duplicates, and remove them ***/ /*** Step 1 - Cleaning Repeat data ***/ ODS HTML CLOSE; ODS HTML PATH=WebOut FILE="Prep.IAT.01.Repeat.htm"; TITLE1 "Demo.Age: Post-PVIB, pre-Age2.0"; PROC SORT DATA=Temp; BY Session_ID Task_NAME Block_NAME Trial_Number; DATA Temp; SET Temp; *** this assumes that the same Task_name is not administered twice in a session; Repeat=0; IF Session_ID = LAG(Session_ID) AND Task_NAME = LAG(Task_NAME) AND Block_NAME = LAG(Block_NAME) AND Trial_Number = LAG(Trial_Number) THEN Repeat=1; TITLE2 "Repeat IAT observations"; PROC FREQ; TABLES Repeat;RUN; DATA Temp(DROP=Repeat); SET Temp; IF Repeat = 1 THEN DELETE; ***dropping Repeat data; ***Formats Trial_Error into (numeric, not character) format expected/required by IAT scoring macro; Trial_Error = MEAN(TrlError); RUN; TITLE2 "IAT CONTENTS"; PROC CONTENTS DATA=Temp; RUN; /*** Step 2 - initial data cleaning checks - what versions of the IAT were used in this dataset? - Use the information here to take care of any stray sessions (from other studies) and later to reverse-code the IAT ***/ ODS HTML CLOSE; ODS HTML PATH=WebOut FILE="Prep.IAT.02.Design.Details.htm"; TITLE2 "Number of Blocks"; PROC MEANS N MEAN MIN MAX; CLASS Block_NAME; VAR Session_ID; WHERE Trial_Number = 1;RUN; TITLE2 "Trials per Block"; PROC MEANS N MEAN MIN MAX; CLASS Block_NAME Block_Trial_Count; VAR Session_ID; WHERE Trial_Number = 1;RUN; /* Standard IAT will look like this: Block_TRIAL_ Block_name COUNT BLOCK0 20 BLOCK1 20 BLOCK2 20 BLOCK3 40 BLOCK4 40 BLOCK5 20 BLOCK6 40 */ TITLE2 "Block Pairings"; PROC MEANS N MIN MAX MAXDEC=0; CLASS Block_Pairing_Definition; VAR Session_ID; WHERE Trial_Number = 1;RUN; /*** DATA Temp; SET Temp; %&&&&&&&&&&&&& CHANGE THIS and uncomment if you need to use it &&&&&&&&&&&&% * NOTE: CHANGE FOR YOUR DATA: A stray African American/Harmless Objects,European American/Weapons session; IF Session_ID IN (4192595) THEN DELETE; ELSE ;RUN; ***/ TITLE2 "Stimulus Items"; PROC MEANS N MEAN MIN MAX MAXDEC=0; CLASS Trial_NAME; VAR Session_ID;RUN; /***NOTE: It is likely that there will be some errant blocks and stimulus items from other Tasks/studies because of miscoding; these will be eliminated in subsequent phases; ***/ /*** Step 3 - Analysis with the SAS IAT macro for scoring ***/ /*** Step 3a - Preparing the datafile for the macro ***/ DATA Raw; SET Temp; ***recoding standard Block names into variables expected by the macro; IF Block_NAME IN ('BLOCK0') THEN Block = 'B1'; ELSE IF Block_NAME IN ('BLOCK1') THEN Block = 'B2'; ELSE IF Block_NAME IN ('BLOCK2') THEN Block = 'B3'; ELSE IF Block_NAME IN ('BLOCK3') THEN Block = 'B4'; ELSE IF Block_NAME IN ('BLOCK4') THEN Block = 'B5'; ELSE IF Block_NAME IN ('BLOCK5') THEN Block = 'B6'; ELSE IF Block_NAME IN ('BLOCK6') THEN Block = 'B7'; ***extra three blocks for Tasks that used 10 Block IAT - in demo only briefly in 2000or2001; ELSE IF Block_NAME IN ('BLOCK7') THEN Block = 'B8'; ELSE IF Block_NAME IN ('BLOCK8') THEN Block = 'B9'; ELSE IF Block_NAME IN ('BLOCK9') THEN Block = 'B10'; *** only keep the variables relevant for subsequent analysis; KEEP Trial_Number Block Session_ID Study_Name Block_Pairing_Definition Trial_Latency Trial_Error Task_NAME;RUN; /*** Step 3b - Calling the SAS macro - Rules defined and summarized in the Macro explanation script. If your data are formatted/named differently from Project Implicit sites' default (e.g., if you are analyzing data from a lab study), then see the macro documentation, especially (a) and (b) - If SAS is not configured to automatically search for macros in the folder where you saved the sasMacro script (available for download from: http://projectimplicit.net/nosek/papers/scoringalgorithm.sas.txt ), then open the file, highlight and run the %IatCalc + %IatAlgorithm macro, so that it is compiled and ready for use in Steps 3b and 4. - If you're new to analyzing IAT data, double-check the SAS macro's documentation, so that your variables are in the correct (numeric, not character) format, and that the variable coding matches the expected format (e.g., Trial_Error is coded as 1 for error, 0 for correct trials). REMEMBER: This simply cleans the IAT data and calculates a D score as the performance difference average(blocks 6-3, 7-4), YOU ARE RESPONSIBLE for knowing what those blocks refer to for each participant (if you follow standard IAT practice, then you will need to reverse-code the IAT score for those who completed the stereotype-incompatible pairing in Blocks 3 & 4. See Step 7 of this IatPreparation section for details ************ ************ ************ ************ ************/ /*** Calling the macro-Variables are indicated in the following format: %iatAlgorithm(libin, libOut, indata, outdata, BLOCname, SESHID, Trial_Latency, Trial_Error, VERROR, VEXTREME, VSTD); ***/ %IatCalc(WORK, WORK, Raw, Cattitudes, Block, Session_ID, Trial_Latency, Trial_Error, 1, 2, 1); ODS HTML CLOSE; ODS HTML PATH=WebOut FILE="Prep.IAT.03.MacroAnalysis.htm"; *** Summary of variables generated BY IAT macro; PROC MEANS;RUN; *** See summary variables BY exclusion type 0=okay, 1=too many fast, 2=incomplete data; PROC MEANS;CLASS SubExcl;RUN; /***NOTE: Do not trust the SubExcl variable for session_ids less THEN -1800000 (or so) because of error latency problem the dates and specific sessions affected BY that problem need to be identified; Any data that is missing the session_Tasks datafile needs to Identify the Task number for the IAT, i.e., was it before or after the questionnaire. ***/ /*** Step 3c - Getting other info from IAT datafile not calculated BY the SAS macro - These are specific to web datasets ***/ DATA Input; SET Raw; *pull up the Raw dataset for use and keep only the necessary variables; KEEP Trial_Number Block Block_Pairing_Definition Session_ID Study_Name Task_NAME Trial_Error Trial_Latency; PROC SORT; BY Session_ID Block; *Sort by session and Block so TRANSPOSE statement will work below; /*** Identify study name. If you have data for only 1 study this may be redundant; if so, then block pairing Definition (or for new data, the Task_ID from the SessionTask data) will be more useful ***/ DATA Cond; SET Input; WHERE Trial_Number = 1 AND Block = 'B3'; KEEP Trial_Number Session_ID Block Study_Name; PROC TRANSPOSE DATA=Cond NAME=NAME OUT=SpdMeans; BY Session_ID; VAR Study_Name; DATA Cond; SET SpdMeans(RENAME=(COL1=Study_Name)); KEEP Study_Name Session_ID; PROC SORT DATA=Cond;BY Session_ID;RUN; *outputs a file with Session_ID and Study_name; /*** Identify Block Pairing definitions so that the data doesn't get screwed up ***/ DATA Pairing; SET Input; WHERE Trial_Number = 1; KEEP Session_ID Block Block_Pairing_Definition; PROC TRANSPOSE DATA=Pairing PREFIX=P NAME=NAME OUT=Pairing(DROP=Name); BY Session_ID; VAR Block_Pairing_Definition;ID Block; /*** Outputs a file with Session_ID and the Block_Pairing_defs for all blocks ***/ PROC SORT DATA=Pairing; BY Session_ID; RUN; /*** N per Block - including extreme outliers; vars=nBLOCK0, nBLOCK, ... ; SET=ns ***/ DATA Ns; SET Input; KEEP Session_ID Block Trial_Number; PROC MEANS DATA=Ns NOPRINT; BY Session_ID Block; VAR Trial_Number; OUTPUT OUT=MEANS N=N; PROC TRANSPOSE DATA=MEANS PREFIX=N NAME=NAME OUT=SpdMeans; BY Session_ID; ID Block; PROC SORT DATA=SpdMeans; BY Session_ID; /*** Outputs a file with the number of actual trials in each Block ***/ DATA Ns; SET SpdMeans; WHERE NAME='N';RUN; /*** Step 4 - IAT Parcels; - Create four IAT "parcels" for possible use in SEM. Note that you control the Block-Pairing Order as you construct these, so you don't need to adjust them later to account for counterbalancing conditions. Just remember to make the ordering here consistent with that used for the overall IAT score. ***/ ODS HTML CLOSE; ODS HTML PATH=WebOut FILE="Prep.IAT.04.SEM_Parcels.htm"; *** 1st TWO (OF FOUR) IAT PARCELS; DATA IatTemp1; SET Input; *** Retain only critical blocks; IF Block IN ('B3','B4','B6','B7'); *** Re-number trials 1-60 across blocks B3 and B4, and B6 and B7; IF Block IN ('B3','B6') THEN Trial_Number = Trial_Number+1; ELSE Trial_Number = Trial_Number+21; /***NOTE: The following Order of Task_name/Block_Pairing_Definitions results in scores scaled so that a positive score represents stronger stereotypical bias ***/ BlockN = 'B1'; IF Block IN ('B3','B4') AND 0 < Trial_Number < 16 THEN BlockN = 'B3'; ELSE IF Block IN ('B3','B4') AND (Trial_Number > 15 AND Trial_Number < 31) THEN BlockN = 'B4'; ELSE IF Block IN ('B6', 'B7') AND 0 < Trial_Number < 16 THEN BlockN = 'B6'; ELSE IF Block IN ('B6', 'B7') AND (Trial_Number > 15 AND Trial_Number < 31) THEN BlockN = 'B7'; IF BlockN IN ('B1') THEN DELETE; /*** %macro iatCalc(libin, libOut, indata, outdata, BLOCname, SESHID, TRLLAT, TRLERROR, VERROR, VEXTREME, VSTD); ***/ %iatCalc(WORK, WORK, IatTemp1, IatTemp2, BlockN, Session_ID, Trial_Latency, Trial_Error, 1,2, 1); DATA IatTemp2; SET IatTemp2; IatA = IAT1; IatB = IAT2; KEEP Session_ID IatA IatB; TITLE2 'Parcels A-B'; PROC MEANS; VAR IatA IatB; RUN; *** 2nd TWO (OF FOUR) IAT PARCELS; DATA IatTemp3; SET Input; *** RE-number trials 1-60 across blocks B3 and B4, and B6 and B7; IF Block IN ('B3','B6') THEN Trial_Number = Trial_Number+1; ELSE Trial_Number = Trial_Number+21; /***NOTE: The following Order of Task_name/Block_Pairing_Definitions results in scores scaled so that a positive score represents stereotypical bias ***/ BlockN = 'B1'; IF Block IN ('B3','B4') AND 30 < Trial_Number < 46 THEN BlockN = 'B3'; ELSE IF Block IN ('B3','B4') AND (Trial_Number > 45) THEN BlockN = 'B4'; ELSE IF Block IN ('B6', 'B7') AND 30 < Trial_Number < 46 THEN BlockN = 'B6'; ELSE IF Block IN ('B6', 'B7') AND (Trial_Number > 45) THEN BlockN = 'B7'; IF BlockN IN ('B1') THEN DELETE; /*** %macro iatCalc(libin, libOut, indata, outdata, BLOCname, SESHID, TRLLATNC, TRLERROR, VERROR, VEXTREME, VSTD); ***/ %iatCalc(WORK, WORK, IatTemp3, IatTemp4, BlockN, Session_ID, Trial_Latency, Trial_Error, 1, 2, 1); DATA IatTemp4; SET IatTemp4; IatC= IAT1; IatD= IAT2; KEEP Session_ID IatC IatD; TITLE2 'Parcels C & D'; PROC MEANS; VAR IatC IatD; RUN; /*** Step 5 - Merge IAT data, this is analyzed, but still contains some errant coding ***/ ODS HTML CLOSE; ODS HTML PATH=WebOut FILE="Prep.IAT.05.Merging.Scores.Unreversed.htm"; PROC CONTENTS DATA=Cattitudes; PROC CONTENTS DATA=Cond; PROC CONTENTS DATA=Pairing; PROC CONTENTS DATA=Ns; PROC CONTENTS DATA=IatTemp2; PROC CONTENTS DATA=IatTemp4;RUN; PROC SORT DATA=Cattitudes;BY Session_ID; PROC SORT DATA=Cond;BY Session_ID; PROC SORT DATA=Pairing;BY Session_ID; PROC SORT DATA=Ns;BY Session_ID; PROC SORT DATA=IatTemp2;BY Session_ID; PROC SORT DATA=IatTemp4;BY Session_ID;RUN; /*** MERGE IAT DATA FROM ALL PARTS ABOVE ***/ DATA Raw1.IatMerged; *** Uses drop to get rid of irrelevant variables; MERGE Cattitudes(DROP=_LABEL_) Cond Pairing Ns(DROP=_LABEL_ NAME) IatTemp2 IatTemp4; *** all SAS files used above; BY Session_ID; *** MERGE on Session_ID; RUN; TITLE2 'numeric variables for Raw1.IatMerged'; *** Contents of pairing definition for first critical block; PROC MEANS MAXDEC=0;VAR Session_ID;CLASS Pb3;RUN; PROC MEANS;RUN; *** Summary of all data so far; /***NOTE: IAT data not completely cleaned (bad subjects, weird data, etc), but it is transposed, and IAT scores have been calculated ***/ /*** Step 6 - Identify and remove errant trials/participants in transposed, but non-reverse-coded IAT data ***/ ODS HTML CLOSE; ODS HTML PATH=WebOut FILE="Prep.IAT.06.Errant.Cleanup.htm"; /*** Step6a - Slow trials and participants: - Identify how many critical trials were deleted for being too slow (>10000ms) NOTE: Normally, this should be only a small percentage of all trials; If not, check this out further, as it could be problematic. Similarly, in the following FREQ, you shouldn't see any one participant have more than a few slow trials. If so, they would need to be dropped ***/ DATA CriticalTrials; SET Raw; IF Block IN ('B3', 'B4', 'B6', 'B7'); TooSlow = 0; IF Trial_Latency > 10000 THEN TooSlow = 1; TITLE2 'CRITICAL TRIALS > 10000ms' ; PROC FREQ; TABLES TooSlow; ;RUN; /*** Identify how many participants had such slow trials. ***/ TITLE2 'SeshIDs with TooSlow'; PROC FREQ; TABLES Session_ID; WHERE TooSlow=1; RUN; /*** Calculating QuitAfter variable, to correspond to PCIAS analysis, coding when participants dropped out of IAT ***/ DATA Raw1.IatMerged; SET Raw1.IatMerged; FORMAT QuitAfter 8.; IF SubExcl NE 0 THEN DO; IF Nb1 NE . & NMiss(of Nb2-Nb7) = 6 THEN QuitAfter = 1; ELSE IF NMiss(of Nb1-Nb2) = 0 & NMiss(of nb3-nb7) = 5 THEN QuitAfter = 2; ELSE IF NMISS(of Nb1-Nb3) = 0 & NMISS(of nb4-nb7) = 4 THEN QuitAfter = 3; ELSE IF NMISS(of Nb1-Nb4) = 0 & NMISS(of nb5-nb7) = 3 THEN QuitAfter = 4; ELSE IF NMISS(of Nb1-Nb5) = 0 & NMISS(nb6, nb7) = 2 THEN QuitAfter = 5; ELSE IF NMISS(of Nb1-Nb6) = 0 & nb7 = . THEN QuitAfter = 6; ELSE; END; run; /*** Step 6b - Descriptives of different variables to identify errant coding in IAT or data that should be excluded ***/ TITLE2 'DATA conforming to standard rules'; PROC MEANS MEAN MIN MAX DATA=Raw1.IatMerged;CLASS SubExcl; *WHERE statement subsets only data that conforms to standard rules; WHERE Nb3=20 AND Nb4 = 40 AND Nb6=20 AND Nb7 = 40;RUN; /*** Identifying Block_Pairing_definitions for cleaning. NOTE: If you have very long pairing definitions you'll have to increase the $48. length in the FORMAT statement, but this length lets them print on 1 line ***/ /***OPTIONAL __________Outputting multiple PROCs to a page______________________ *** Setting it to a space causes SAS to fill each page before going to the next one ; Options formdlim=' ' ; *** Setting it to a null string resets the value of formdlim to the default, so that each new Proc will start on a new page ; Options formdlim='' ; ***/ TITLE2 'Pb3';PROC MEANS MEAN MIN MAX;CLASS Pb3;VAR IAT;FORMAT Pb3 $48.;RUN; TITLE2 'Pb4';PROC MEANS MEAN MIN MAX;CLASS Pb4;VAR IAT;FORMAT Pb4 $48.;RUN; TITLE2 'Pb6';PROC MEANS MEAN MIN MAX;CLASS Pb6;VAR IAT;FORMAT Pb6 $48.;RUN; TITLE2 'Pb7';PROC MEANS MEAN MIN MAX;CLASS Pb7;VAR IAT;FORMAT Pb7 $48.;RUN; /*** Checking for errant data recording ***/ TITLE2 'Nb1';PROC MEANS N MEAN MIN MAX;CLASS Nb1;VAR IAT;RUN; TITLE2 'Nb2';PROC MEANS MEAN MIN MAX;CLASS Nb2;VAR IAT;RUN; TITLE2 'Nb3';PROC MEANS MEAN MIN MAX;CLASS Nb3;VAR IAT;RUN; TITLE2 'Nb4';PROC MEANS MEAN MIN MAX;CLASS Nb4;VAR IAT;RUN; TITLE2 'Nb5';PROC MEANS MEAN MIN MAX;CLASS Nb5;VAR IAT;RUN; TITLE2 'Nb6';PROC MEANS MEAN MIN MAX;CLASS Nb6;VAR IAT;RUN; TITLE2 'Nb7';PROC MEANS MEAN MIN MAX;CLASS Nb7;VAR IAT;RUN; Options formdlim='' ; /*** Step 6c - Errant data clean-up: Removing problem data identified in 6a&b - Dropping tasks that should not be counted as sessions because it is test data or has clear data transfer/storage problems (i.e., other IAT pairings, as in block pairing definitions in Pb3-Pb7, or missing almost all trials, as when Nb1-Nb7 are < 8 ***/ DATA Raw1.IatMerged;SET Raw1.IatMerged; /***NOTE: If your primary interest is the explicit data, you might add "" [blank] to these in statements to retain explicit data that's missing valid IAT data (e.g., if they dropped out of the study before they finished the IAT). As this is written, it drops all participants who didn't have a "valid" IAT block pairing definition. It's necessary if you're cleaning old Demonstration-site data, as it's hard to identify phantom sessions in a long-running study. If you have a research-site study in which you're primarily interested in the explicit, then you don't care if they have valid ***/ *** testing data or data error; IF Nb3 < 8 | Nb4 < 8 | Nb6 < 8 | Nb7 < 8 | Nb1 < 8 | Nb2 < 8 | Nb5 < 8 THEN DELETE; IF Pb3 IN ("Old/Bad,Young/Good", "Young/Bad,Old/Good", "Old/Good,Young/Bad", "Young/Good,Old/Bad") THEN ; ELSE DELETE; IF Pb4 IN ("Old/Bad,Young/Good", "Young/Bad,Old/Good", "Old/Good,Young/Bad", "Young/Good,Old/Bad") THEN ; ELSE DELETE; IF Pb6 IN ("Old/Bad,Young/Good", "Young/Bad,Old/Good", "Old/Good,Young/Bad", "Young/Good,Old/Bad") THEN ; ELSE DELETE; IF Pb7 IN ("Old/Bad,Young/Good", "Young/Bad,Old/Good", "Old/Good,Young/Bad", "Young/Good,Old/Bad") THEN ; ELSE DELETE; RUN; PROC MEANS MEAN MIN MAX;CLASS SubExcl;RUN; *** summary of all data so far; /*** Step 7 - Reverse-code and run initial checks of cleaned, reverse-coded IAT scores NOTE: - Half of these trials must be reverse-scored. Because order of stereotype-congruent /stereotype-incongruent pairings is counterbalanced, B3/B4 and B6/B7 are opposite pairings. Until this is done, the IAT score will be "washed out" because of the counterbalancing. Use the pairings listed in PB3 to reverse-code the IATS below ***/ ODS HTML CLOSE; ODS HTML PATH=WebOut FILE="Prep.IAT.07.Reversed.Descriptives.htm"; DATA Raw1.IatReversed; SET Raw1.IatMerged; *** Reverse code IAT scores for counterbalancing conditions. Include pairing definition for stereotype-incongruent counterbalanced order; IF Pb3 IN ("Young/Bad,Old/Good", "Old/Good,Young/Bad") THEN DO; IAT=0-IAT; IAT1=0-IAT1; IAT2=0-IAT2; DIFF1=0-DIFF1; DIFF2=0-DIFF2; IatA=0-IatA; IatB=0-IatB; IatC=0-IatC; IatD=0-IatD; Mb3=0-Mb3; Mb4=0-Mb4; Mb6=0-Mb6; Mb7=0-Mb7; Cs1=0-Cs1; Cs2=0-Cs2; As1=0-As1; As2=0-As2;END; /*** SubExcl indicates various exclusion categories. Follow the logic of the IF-Then-Else statements, and see PCIAS (Nosek, Smyth, et al., 2007) for details ***/ *** Calculate mean errors for combined blocks; ErrMean = (Eb3*Nb3 + Eb4*Nb4 +Eb6*Nb6 + Eb7*Nb7) / (Nb3+Nb4+Nb6+Nb7); *** Calculate mean errors for practice blocks; PracErrMean = (Eb1*Nb1 + Eb2*Nb2 + Eb5*Nb5) / (Nb1+Nb2+Nb5); *** Identify possible exclusion criteria, but don't delete yet; IF ErrMean > .30 THEN SubExcl = 3; ELSE IF Eb3 > .39 | Eb4 > .39 | Eb6 > .39 | Eb7 > .39 THEN SubExcl = 4; ELSE IF PracErrMean > .39 THEN SubExcl = 5; ELSE IF Eb1 > .49 | Eb2 > .49 | Eb5 > .49 THEN SubExcl = 6; ELSE IF Fb3 > .24 | Fb4 > .24 | Fb6 > .24 | Fb7 > .24 THEN SubExcl = 7; ELSE IF Fb1 > .34 | Fb2 > .34 | Fb5 > .34 THEN SubExcl = 8; *** Not clear if this last one should be used - reliability among parcels is high; ELSE; RUN; DATA Clean.IatReversed;SET Raw1.IatReversed; *** 15 valid responses per block should be sufficient for generating IAT ; IF SubExcl = 2 & Nb3 >= 15 & Nb4 >= 15 & Nb6 >= 15 & Nb7 >= 15 THEN SubExcl = 0; *** but delete if less than 15 in any one critical block; ELSE IF (017))); TABLES Gender Edu EthnicityOMB RaceOMB Politics CountryRes / MISSING; FORMAT Edu SimpleEduFmt. Race RaceFmt.; RUN; PROC MEANS DATA=Clean.Merged(WHERE =(Consenter = 1 & (Age=. | Age>17))); VAR Age; RUN; /***NOTE: See create_common_dir._format-label_PI_vars.sas (on the wiki) for variable formatting ***/ PROC FORMAT; VALUE SubExclFmt (NOTSORTED) 0= "Good Data" 1="10%+ fast Ts, avging all critical blocks" 7=">24% fast Ts, 1 critical block" 8=">34% fast Ts, 1 practice block" 3=">30% avg errors across all crit blocks" 4=">39% errors in 1 critical block" 5=">39% error rate, all practice blocks" 6=">49% error in 1 practice block" 2="missing data or >39% error rate, crit block" 9=">missing trials, critical block"; RUN; /*** Subsetting: OrdIat = Began Iat (from Clean.Order ) & FullScale = have valid responses on the explicit DV ***/ PROC MEANS N NONOBS MAXDEC=0 DATA=Clean.Merged(WHERE=(OrdIAT NE . & NMiss(att,TYoung,TOld) < 3))); VAR Iat ;CLASS SubExcl;FORMAT SubExcl SubExclFmt.; RUN; /*** Step 2 - Drop IAT scores for those meeting exclusion criteria, reverse code or do other basic-level cleaning of the data - This drops all IAT data that fails to conform to reasonable standards of errors, speed latencies, etc. See subexcl coding in Step 6 of IAT Prep Section, or Nosek, Smyth, et al., 2007 ***/ ODS HTML CLOSE; ODS HTML PATH=WebOut FILE="Merged.02.Other.Cleaning.htm"; /***NOTE: Anything odd with sessiondate? ***/ PROC FREQ;TABLES Session_Date; FORMAT Session_Date DTMONYY7.; RUN; /***NOTE: Finish any frequency outputs to determine N of participants assigned to study / with invalid IAT data, etc. before running this next step. ***/ DATA Clean.Age; SET Clean.Merged; IF SubExcl NE 0 THEN DO; IAT = .;IAT1=.; IAT2=.; IatA=.; IatB=.; IatC=.;IatD=.; Diff1=.; Diff2=.; END; RUN; PROC CONTENTS DATA=Clean.Age;RUN; PROC UNIVARIATE DATA=Clean.Age;VAR Age;RUN; /* Quantile Estimate 100% MAX 87 99% 64 95% 54 90% 48 75% Q3 32 50% Median 22 25% Q1 18 10% 16 5% 15 1% 13 0% Min 7 */ /***NOTE: Doing other cleaning of important explicit/demographic variables. For this dataset, The range of att changes from 1-5 to 1-7 sometime in September, 2006. Let's check when the change occurs, varying the format that groups session_date. dtmonyy7. = By month/year datetime7. = By individual day/month/year ***/ /*** PROC FREQ;TABLES Session_Date*att / nocol norow nocum nopercent; FORMAT Session_Date dateampm10. ; WHERE "15Sep2006:00:00:00"dt < Session_Date < "17Sep2006:00:00:00"dt;RUN; *It looks like the changeover began at the beginning of the day on September 16th ; ***/ PROC CORR DATA=Clean.Age;VAR Iat att; WHERE Session_Date < "16Sep2006:00:00:00"dt ;RUN; PROC CORR DATA=Clean.Age;VAR Iat att; WHERE Session_Date > "16Sep2006:00:00:00"dt ;RUN; proc contents data=clean.age;run; PROC means n min max maxdec=0 DATA=Clean.Age;VAR session_id creation_date; WHERE NMISS(ctoya, atoma, matoo, yatoa) < 4;RUN; /*** Variables that were only in the newer study with Att7, not Att5 ***/ PROC means n min max maxdec=0 DATA=Clean.Age;VAR session_id creation_date; WHERE NMISS(choosetobe, feel, hopetolive, othersthink) < 4;RUN; /***EXAMPLE: Here's an example analysis using contrasts. This has already created a dataset containing only US citizens, and uses the values of the variable for race/ethnicity (see wiki for details of change ) ***/ PROC GLM DATA=USCitizens; CLASS Race; MODEL Iat=Race / SS3; CONTRAST 'Native Americans compared to Whites' Race 1 0 0 0 -1 0 0 0; /* CONTRAST 'Asians compared to Whites' Race 0 1 0 0 -1 0 0 0; */ CONTRAST 'Blacks compared to Whites' Race 0 0 1 0 -1 0 0 0; /* CONTRAST 'Hispanics compared to Whites' Race 0 0 0 1 -1 0 0 0; */ CONTRAST 'Other compared to Whites' Race 0 0 0 0 -1 1 0 0; CONTRAST 'Multi, Black & White, compared to Whites' Race 0 0 0 0 -1 0 1 0; CONTRAST 'Multi, Other, compared to Whites' Race 0 0 0 0 -1 0 0 1;RUN;