* SPSS syntax for calculating the Wilson score confidence interval for a single proportion.

* Dr. M. van Leerdam 
* Bare Statistics 2019 (www.barestatistics.nl)

* Method described in:

* Newcombe, R. G. (1998). Two-sided confidence intervals for the single proportion (comparison of seven methods). Statistics in Medicine, 17, 857-872.
* Newcombe, R. G. (2013). Confidence Intervals for Proportions and Related Measures of Effect Size. Boca Raton: CRC Press.
* Wilson, E. B. (1927). Probable Inference, the Law of Succession, and Statistical Inference. Journal of the American Statistical Association, 22, 209-212.

* First, a copy is made of your data file and the original is closed, so the steps below modify only the copy.

DATASET NAME Original.
DATASET COPY Copy.
DATASET ACTIVATE Copy.
DATASET CLOSE Original.

* Now, replace "MYVARIABLE" below with the name of your own variable. It will be renamed to "Outcome".

RENAME VARIABLES MYVARIABLE = Outcome.

FREQUENCIES Outcome.

* Once MYVARIABLE has been replaced above, you can choose Run > All from the menu.

* The outcome variable must be binary and coded 1 (event) and 0 (no event).
* If it is coded differently (for example 2 and 1), the syntax below recodes the higher value to 1 and the lower value to 0.
* For reversed coding, switch (ELSE=0) and (ELSE=1).

AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK= /Outcome_min=MIN(Outcome) /Outcome_max=MAX(Outcome).
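* A single DO IF / ELSE IF block is used so that each case is recoded only once, also when the higher value is 0.
DO IF (Outcome = Outcome_max).
RECODE Outcome (ELSE=1).
ELSE IF (Outcome = Outcome_min).
RECODE Outcome (ELSE=0).
END IF.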
VARIABLE LEVEL Outcome (NOMINAL).
VARIABLE LABELS Outcome 'Outcome'.
FORMATS Outcome (F8.0).
EXECUTE.
DELETE VARIABLES Outcome_min Outcome_max.

* Calculate proportion.

AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK= /S_1=SUM(Outcome) /N_1=NU(Outcome). 
VARIABLE LABELS N_1 'Total sample size'.
COMPUTE F_1 = N_1 - S_1.
COMPUTE P_1 = S_1 / N_1 . 
FORMATS S_1 F_1 N_1 (F8.0). 
FORMATS P_1 (F8.3). 
VARIABLE LABELS S_1 'Number of times that the event has occurred'.
VARIABLE LABELS F_1 'Number of times that the event has not occurred'.
VARIABLE LABELS P_1 'Proportion'.
EXECUTE.

* Choose the confidence level (C) by setting the lower-tail probability p = (1 - C) / 2 in IDF.NORMAL below. The default is p = .025 for 95% confidence.
* C = 99%: p = .005; C = 95%: p = .025; C = 90%: p = .050; C = 80%: p = .100; C = 50%: p = .250.

COMPUTE Z = ABS(IDF.NORMAL(.025,0,1)). 
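
* Sketch of an alternative (an assumption, not part of the original syntax): derive p from the confidence level itself.
* For a 90% interval, p = (1 - .90) / 2 = .050, so the COMPUTE line above would become the line below, giving Z of about 1.645.
* COMPUTE Z = ABS(IDF.NORMAL((1 - .90) / 2, 0, 1)).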

* Build the confidence level label (Cstring) and calculate the lower (LL) and upper (UL) confidence limits.

COMPUTE C = 100 * ((2 * CDF.NORMAL(Z,0,1)) -1).
FORMATS C (F8.0).
ALTER TYPE C (A2).
STRING Cstring (A3).
COMPUTE Cstring = CONCAT(C,"%").
VARIABLE LABELS Cstring 'Confidence Level'.
COMPUTE LL = (2 * S_1 + Z**2 - Z * SQRT(Z**2 + 4 * S_1 * (1 - P_1))) / (2 * (N_1 + Z**2)).
COMPUTE UL = (2 * S_1 + Z**2 + Z * SQRT(Z**2 + 4 * S_1 * (1 - P_1))) / (2 * (N_1 + Z**2)).
VARIABLE LABELS LL 'Lower Limit'.
VARIABLE LABELS UL 'Upper Limit'.
FORMATS LL UL (F8.3). 
EXECUTE.
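
* Worked check of the limit formulas (illustrative numbers, not taken from your data; intermediate values are rounded).
* With S_1 = 81, N_1 = 263 and Z = 1.960: P_1 = .308, Z**2 = 3.8415 and 4 * S_1 * (1 - P_1) = 224.213.
* LL = (165.8415 - 1.960 * SQRT(228.054)) / 533.683 = .255.
* UL = (165.8415 + 1.960 * SQRT(228.054)) / 533.683 = .366.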

* Table with counts, proportion, and confidence limits.

OMS /SELECT TABLES /IF COMMANDS=['Summarize'] SUBTYPES=['Case Processing Summary'] /DESTINATION VIEWER=NO.
USE 1 thru 1.
SUMMARIZE
  /TABLES=S_1 F_1 N_1 P_1 Cstring LL UL /FORMAT=VALIDLIST NOCASENUM Total
  /TITLE='Wilson score confidence interval for single proportion'
  /MISSING=VARIABLE /CELLS=NONE.
DELETE VARIABLES S_1 TO UL.
OMSEND.
USE ALL.