* SPSS syntax for calculating the Newcombe-Wilson hybrid score confidence interval 
* for the difference between independent proportions.

* Dr. M. van Leerdam 
* Bare Statistics 2019 (www.barestatistics.nl)

* Method described in:

* Newcombe, R. G. (1998). Two-sided confidence intervals for the single proportion: Comparison of seven methods. Statistics in Medicine, 17, 857-872.
* Newcombe, R. G. (2013). Confidence Intervals for Proportions and Related Measures of Effect Size. Boca Raton: CRC Press.

* First, a copy of the active dataset is made. The syntax works on this copy, and the original dataset is closed.

DATASET NAME Original.
DATASET COPY Copy.
DATASET ACTIVATE Copy.
DATASET CLOSE Original.

* Now, replace MYINDEPENDENTVARIABLE and MYDEPENDENTVARIABLE in the two commands below with the names of your own independent and dependent variables, which are then renamed to IV and DV.

RENAME VARIABLES MYINDEPENDENTVARIABLE = IV.
RENAME VARIABLES MYDEPENDENTVARIABLE = DV.

FREQUENCIES IV DV.

* Once the two variable names above have been replaced, you can choose Run > All from the menu.

* DV must be binary coded (1, 0).
* If this is not the case (for example, if the variable is coded 2 and 1), the syntax below recodes the larger value to 1 and the smaller value to 0.
* For reversed coding, swap the values 1 and 0 in the two COMPUTE commands.

AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK= /DV_min=MIN(DV) /DV_max=MAX(DV).
* Both values are recoded in a single pass per case, so a case recoded to 0 cannot match DV_max afterwards (for example, when DV is coded -1 and 0).
DO IF (DV = DV_max).
COMPUTE DV = 1.
ELSE IF (DV = DV_min).
COMPUTE DV = 0.
END IF.
VARIABLE LEVEL DV (NOMINAL).
VARIABLE LABELS DV 'Outcome'.
FORMATS DV (F8.0).
EXECUTE.
DELETE VARIABLES DV_min DV_max.

* IV must have values (1, 2).
* If this is not the case (for example, if the variable is coded 1 and 0), the syntax below recodes the larger value to 2 and the smaller value to 1.

RENAME VARIABLES IV=IV_original.
AUTORECODE VARIABLES=IV_original /INTO IV.
FORMATS IV (F15.0).

* For reversed coding, use: AUTORECODE VARIABLES=IV_original /INTO IV /DESCENDING.

* Calculate proportions.

COMPUTE filter_$=(IV = 1).
FILTER BY filter_$.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=
  /S_1=SUM(DV) 
  /N_1=NU(DV).
COMPUTE filter_$=(IV = 2).
FILTER BY filter_$.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=
  /S_2=SUM(DV) 
  /N_2=NU(DV).
FILTER OFF.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES OVERWRITEVARS=YES
  /BREAK=
  /S_1=MEAN(S_1)
  /N_1=MEAN(N_1)
  /S_2=MEAN(S_2)
  /N_2=MEAN(N_2).
DELETE VARIABLES filter_$.
COMPUTE P_1 = S_1 / N_1 . 
COMPUTE P_2 = S_2 / N_2 . 
FORMATS S_1 S_2 N_1 N_2 (F8.0). 
FORMATS P_1 P_2 (F8.3). 
VARIABLE LABELS P_1 'Proportion Group 1'.
VARIABLE LABELS P_2 'Proportion Group 2'.
VARIABLE LABELS N_1 'Sample size Group 1'.
VARIABLE LABELS N_2 'Sample size Group 2'.
EXECUTE.

* Choose the confidence level (C) by setting the lower-tail cumulative probability p = (1 - C) / 2 in IDF.NORMAL below. The default, p = .025, gives 95% confidence.
* C = 99%: p = .005; C = 95%: p = .025; C = 90%: p = .050; C = 80%: p = .100; C = 50%: p = .250.

COMPUTE Z = ABS(IDF.NORMAL(.025,0,1)). 
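
* Illustration (not part of the original syntax): for 90% confidence, p = (1 - .90) / 2 = .050, and the command above would read as follows.
* COMPUTE Z = ABS(IDF.NORMAL(.050,0,1)).
* This gives Z = 1.645 instead of the default Z = 1.960.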

* Calculate lower (LL) and upper (UL) confidence limits.

COMPUTE C = 100 * ((2 * CDF.NORMAL(Z,0,1)) -1).
FORMATS C (F8.0).
ALTER TYPE C (A2).
STRING Cstring (A3).
COMPUTE Cstring = CONCAT(C,"%").
VARIABLE LABELS Cstring 'Confidence Level'.
COMPUTE LL_1 = (((2 * S_1 + Z**2) - (Z * SQRT((Z**2 + 4 * S_1 * (1 - P_1)))))) / (2 * (N_1 + Z**2)) . 
COMPUTE UL_1 = (((2 * S_1 + Z**2) + (Z * SQRT((Z**2 + 4 * S_1 * (1 - P_1)))))) / (2 * (N_1 + Z**2)). 
COMPUTE LL_2 = (((2 * S_2 + Z**2) - (Z * SQRT((Z**2 + 4 * S_2 * (1 - P_2)))))) / (2 * (N_2 + Z**2)) . 
COMPUTE UL_2 = (((2 * S_2 + Z**2) + (Z * SQRT((Z**2 + 4 * S_2 * (1 - P_2)))))) / (2 * (N_2 + Z**2)). 
COMPUTE D = P_1 - P_2.
COMPUTE LL = P_1 - P_2 - SQRT((P_1 - LL_1)**2 + (UL_2 - P_2)**2).
COMPUTE UL = P_1 - P_2 + SQRT((UL_1 - P_1)**2 + (P_2 - LL_2)**2).
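
* Worked check with illustrative counts (not part of the original syntax, and not taken from your data).
* Assume S_1 = 56, N_1 = 70, S_2 = 48, N_2 = 80, and the default Z = 1.960, so that P_1 = .8000 and P_2 = .6000.
* Wilson limits from the four formulas above: LL_1 = .6918, UL_1 = .8770, LL_2 = .4905, UL_2 = .7004.
* D = .8000 - .6000 = .2000.
* LL = .2000 - SQRT((.8000 - .6918)**2 + (.7004 - .6000)**2) = .2000 - .1476 = .0524.
* UL = .2000 + SQRT((.8770 - .8000)**2 + (.6000 - .4905)**2) = .2000 + .1339 = .3339.
* With these counts, the output table should show, to three decimals, D = .200, LL = .052, and UL = .334.
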
FORMATS Z P_1 P_2 LL_1 UL_1 LL_2 UL_2 D LL UL (F8.3). 
VARIABLE LABELS D 'Difference between proportions'.
VARIABLE LABELS LL_1 'Lower Limit Group 1'.
VARIABLE LABELS LL_2 'Lower Limit Group 2'.
VARIABLE LABELS LL 'Lower Limit Difference'.
VARIABLE LABELS UL_1 'Upper Limit Group 1'.
VARIABLE LABELS UL_2 'Upper Limit Group 2'.
VARIABLE LABELS UL 'Upper Limit Difference'.
EXECUTE.
OUTPUT CLOSE *.

* Table with counts, proportions, and confidence limits.

OMS
  /SELECT TABLES
  /IF COMMANDS=['Summarize'] SUBTYPES=['Case Processing Summary']
  /DESTINATION VIEWER=NO.
USE 1 thru 1.
SUMMARIZE
  /TABLES=Cstring N_1 P_1 LL_1 UL_1 N_2 P_2 LL_2 UL_2
  /FORMAT=VALIDLIST NOCASENUM Total
  /TITLE='Wilson score confidence intervals for single proportions'
  /MISSING=VARIABLE
  /CELLS=NONE.
SUMMARIZE
  /TABLES=D Cstring LL UL
  /FORMAT=VALIDLIST NOCASENUM Total
  /TITLE='Newcombe-Wilson hybrid score confidence interval for difference between independent proportions'
  /MISSING=VARIABLE
  /CELLS=NONE.
OMSEND.
USE ALL.
CROSSTABS
  /TABLES=DV BY IV
  /FORMAT=AVALUE TABLES
  /STATISTICS=CHISQ 
  /CELLS=COUNT COLUMN 
  /COUNT ROUND CELL.
DELETE VARIABLES IV TO UL.
RENAME VARIABLES IV_original=IV.