Standard Operating Procedure: Psychometric Validation of Clinical Outcome Measures

| Document ID | SOP-VAL-001 |
| --- | --- |
| Title | Psychometric Validation of Clinical Outcome Measures |
| Revision | 1.0 |
| Effective Date | [DATE] |
| Author | [AUTHOR] |
| Approved By | [APPROVER] |
| Department | Outcomes Research |

1. Purpose

This procedure establishes requirements for conducting psychometric validation studies of clinical outcome measures to ensure they demonstrate appropriate measurement properties for their intended use.

2. Scope

This procedure applies to:

  • New outcome measure development
  • Validation of existing measures in new populations
  • Adaptation of measures for new contexts or modes of administration
  • All outcome measure types (PRO, ClinRO, ObsRO, PerfO)

3. Responsibilities

3.1 Principal Investigator/Measure Developer

  • Design validation study protocol
  • Ensure appropriate statistical expertise
  • Review and interpret validation results
  • Document validation evidence

3.2 Biostatistician

  • Develop statistical analysis plan
  • Conduct psychometric analyses
  • Generate validation reports
  • Advise on sample size and methodology

3.3 Quality Manager

  • Review validation protocols for regulatory compliance
  • Maintain validation documentation
  • Track validation status of all measures

4. Definitions

| Term | Definition |
| --- | --- |
| Reliability | The degree to which a measure is free from measurement error |
| Internal Consistency | The extent to which items within a scale measure the same construct (Cronbach's alpha) |
| Test-Retest Reliability | Consistency of scores when measure is administered to the same individuals at different times |
| Inter-Rater Reliability | Agreement between different raters/observers (for ClinRO, ObsRO) |
| Validity | The degree to which a measure assesses what it purports to measure |
| Content Validity | Evidence that measure items represent all aspects of the construct |
| Construct Validity | Evidence that measure relates to other measures as theoretically expected |
| Criterion Validity | Agreement between measure and a gold standard |
| Responsiveness | Ability to detect meaningful change over time |
| MCID | Minimal Clinically Important Difference - smallest change considered important |
| Floor/Ceiling Effects | Clustering of scores at bottom or top of scale, limiting ability to detect change |

5. Procedure

5.1 Validation Study Planning

5.1.1. Define validation objectives:

  • Target population
  • Intended use and context
  • Mode of administration (paper, electronic, interview)
  • Key measurement properties to evaluate

5.1.2. Develop validation protocol including:

  • Background and rationale
  • Study design and timeline
  • Participant eligibility criteria
  • Sample size justification
  • Data collection procedures
  • Statistical analysis plan
  • Success criteria for validation

5.1.3. Select comparison measures:

  • Established measures of same construct (convergent validity)
  • Measures of different constructs (discriminant validity)
  • Clinical indicators or gold standards (criterion validity)

5.1.4. Obtain necessary ethical and regulatory approvals (e.g., IRB) and establish the informed consent process

5.1.5. Document validation plan in Form FRM-VAL-001

5.2 Reliability Assessment

5.2.1 Internal Consistency Reliability

5.2.1.1. Analyze baseline data from main study sample

5.2.1.2. Calculate Cronbach's alpha for each scale/subscale

5.2.1.3. Acceptance criteria:

  • Alpha ≥ 0.70 for group comparisons
  • Alpha ≥ 0.90 for individual decision-making
  • Alpha < 0.95 (if higher, may indicate item redundancy)

5.2.1.4. Examine item-total correlations (typically ≥ 0.30)

5.2.1.5. Assess scale dimensionality using factor analysis
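
The calculations in 5.2.1.2 through 5.2.1.4 can be sketched in Python as follows. This is a minimal illustration rather than a validated analysis routine: it assumes complete data (no missing item responses) and items scored in the same direction. The function names (`cronbach_alpha`, `corrected_item_total`, `alpha_flag`) are hypothetical, and using the corrected item-total correlation (each item against the sum of the remaining items) is an assumption about how 5.2.1.4 is applied.

```python
from statistics import mean, variance


def cronbach_alpha(items):
    """items: one row per respondent, each row a list of item scores."""
    k = len(items[0])
    # Sample variance of each item across respondents
    item_vars = [variance([row[j] for row in items]) for j in range(k)]
    # Sample variance of the total (summed) score
    total_var = variance([sum(row) for row in items])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(items):
    """Correlation of each item with the sum of the remaining items."""
    k = len(items[0])
    result = []
    for j in range(k):
        item = [row[j] for row in items]
        rest = [sum(row) - row[j] for row in items]
        result.append(pearson_r(item, rest))
    return result


def alpha_flag(alpha, individual_use=False):
    """Apply the acceptance criteria listed in 5.2.1.3."""
    threshold = 0.90 if individual_use else 0.70
    if alpha < threshold:
        return "below threshold"
    if alpha >= 0.95:
        return "possible item redundancy"
    return "acceptable"


# Made-up responses (4 respondents x 2 items), for illustration only
toy = [[1, 2], [2, 1], [3, 4], [4, 3]]
alpha = cronbach_alpha(toy)
print(round(alpha, 2), alpha_flag(alpha), corrected_item_total(toy))
```

Items with a corrected item-total correlation below 0.30 would then be candidates for review, per 5.2.1.4.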

5.2.2 Test-Retest Reliability

5.2.2.1. Administer measure twice to stable subsample

5.2.2.2. Time interval: typically 2-14 days

  • Short enough that true change is unlikely
  • Long enough to prevent memory effects

5.2.2.3. Calculate intraclass correlation coefficient (ICC)

5.2.2.4. Acceptance criteria:

  • ICC ≥ 0.70 for group comparisons
  • ICC ≥ 0.90 for individual decision-making

5.2.2.5. Calculate standard error of measurement (SEM)

5.2.2.6. Generate Bland-Altman plots to assess agreement
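
Steps 5.2.2.3 through 5.2.2.6 can be illustrated with the sketch below. The procedure does not specify which ICC form to use; this example assumes ICC(2,1) (two-way random effects, absolute agreement, single measurement), which is one common choice for test-retest designs. It also assumes SEM is computed as SD × sqrt(1 − ICC) using an SD supplied by the analyst, and it returns the Bland-Altman bias and 95% limits of agreement rather than drawing the plot. All function names are hypothetical.

```python
from statistics import mean, stdev


def icc_2_1(scores):
    """scores: one row per participant, e.g. [time1, time2]."""
    n, k = len(scores), len(scores[0])
    grand = mean([v for row in scores for v in row])
    row_means = [mean(row) for row in scores]
    col_means = [mean([row[j] for row in scores]) for j in range(k)]
    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)              # between participants
    msc = ss_cols / (k - 1)              # between occasions
    mse = ss_err / ((n - 1) * (k - 1))   # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * (1 - reliability) ** 0.5


def bland_altman_limits(time1, time2):
    """Return (bias, lower limit, upper limit) for 95% limits of agreement."""
    diffs = [b - a for a, b in zip(time1, time2)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Made-up scores with a constant +1 shift at retest, for illustration only
toy = [[1, 2], [2, 3], [3, 4], [4, 5]]
print(icc_2_1(toy))
print(bland_altman_limits([r[0] for r in toy], [r[1] for r in toy]))
```

Because ICC(2,1) measures absolute agreement, the constant shift in the toy data lowers the ICC even though participants keep the same rank order; a consistency-type ICC would not penalize that shift.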

5.2.3 Inter-Rater Reliability (for ClinRO, ObsRO)

5.2.3.1. Have multiple raters assess same participants

5.2.3.2. Calculate ICC or weighted kappa as appropriate

5.2.3.3. Acceptance criteria:

  • ICC or kappa ≥ 0.70

5.2.3.4. Identify sources of disagreement for training improvement
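
For ordinal ratings from two raters, the weighted kappa named in 5.2.3.2 could be computed as in the sketch below. It assumes quadratic disagreement weights and categories coded as integers 0 to c−1; the procedure itself does not specify the weighting scheme, and the function name `weighted_kappa` is hypothetical.

```python
def weighted_kappa(rater1, rater2, n_categories):
    """Quadratic-weighted Cohen's kappa for two raters."""
    n = len(rater1)
    c = n_categories
    # Observed proportion of ratings in each (rater1, rater2) cell
    observed = [[0.0] * c for _ in range(c)]
    for a, b in zip(rater1, rater2):
        observed[a][b] += 1 / n
    # Marginal proportions for each rater
    p1 = [sum(observed[i]) for i in range(c)]
    p2 = [sum(observed[i][j] for i in range(c)) for j in range(c)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(c):
        for j in range(c):
            weight = (i - j) ** 2 / (c - 1) ** 2  # disagreement weight
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * p1[i] * p2[j]
    return 1 - weighted_observed / weighted_expected


# Made-up ratings on a 3-point scale, for illustration only
print(weighted_kappa([0, 1, 2, 1, 0], [0, 1, 2, 2, 0], 3))
```

The result would be compared against the 0.70 criterion in 5.2.3.3. Cells with large disagreement weights point to the rating categories where rater training is most needed.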

5.3 Validity Assessment

5.3.1 Content Validity

5.3.1.1. Conduct qualitative research with target population:

  • Concept elicitation interviews
  • Cognitive debriefing of items
  • Assessment of comprehensibility and relevance

5.3.1.2. Obtain expert panel review:

  • Clinical experts
  • Psychometricians
  • Patient representatives

5.3.1.3. Document evidence in content validity report

5.3.1.4. For FDA submissions, follow the requirements of the FDA PRO Guidance (FDA, 2009)

5.3.2 Construct Validity

5.3.2.1. Convergent validity:

  • Correlate with established measures of same construct
  • Expected correlation: typically r ≥ 0.50, with thresholds up to 0.70 used for closely related measures

5.3.2.2. Discriminant validity:

  • Correlate with measures of different constructs
  • Expected correlation: typically r < 0.30
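
The convergent and discriminant checks in 5.3.2.1 and 5.3.2.2 reduce to computing a correlation and comparing it with a pre-specified threshold. The sketch below uses Pearson's r and compares its absolute value, on the assumption that comparison measures may be scored in opposite directions. The default thresholds mirror the values in this section; the function names are hypothetical.

```python
from statistics import mean


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def meets_expectation(r, kind, convergent_min=0.50, discriminant_max=0.30):
    """Check a correlation against the expectation for its validity type."""
    if kind == "convergent":
        return abs(r) >= convergent_min
    if kind == "discriminant":
        return abs(r) < discriminant_max
    raise ValueError("kind must be 'convergent' or 'discriminant'")


# Made-up scores, for illustration only
new_measure = [1, 2, 3, 4]
same_construct = [2, 4, 6, 8]
different_construct = [1, -1, -1, 1]
print(meets_expectation(pearson_r(new_measure, same_construct), "convergent"))
print(meets_expectation(pearson_r(new_measure, different_construct), "discriminant"))
```

For ordinal or strongly skewed scores, a rank-based coefficient such as Spearman's rho may be a better fit than Pearson's r; that choice belongs in the statistical analysis plan.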

5.3.2.3. Known-groups validity:

  • Compare scores across groups expected to differ
  • Use appropriate statistical tests (t-test, ANOVA)
  • Calculate effect sizes (Cohen's d, eta-squared)
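
The effect sizes named for known-groups validity can be sketched as follows. Cohen's d here uses the pooled standard deviation of two independent groups, and eta-squared is the between-group sum of squares divided by the total sum of squares from a one-way ANOVA. The function names are hypothetical, and the significance tests themselves (t-test, ANOVA) are left to a statistics package.

```python
from statistics import mean, variance


def cohens_d(group1, group2):
    """Standardized mean difference using the pooled SD."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / pooled_var ** 0.5


def eta_squared(groups):
    """Proportion of total variance explained by group membership."""
    all_scores = [v for g in groups for v in g]
    grand = mean(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_total = sum((v - grand) ** 2 for v in all_scores)
    return ss_between / ss_total


# Made-up scores for two severity groups, for illustration only
severe = [2, 4, 6]
mild = [1, 3, 5]
print(cohens_d(severe, mild), eta_squared([severe, mild]))
```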

5.3.2.4. Factorial validity:

  • Conduct confirmatory factor analysis (CFA)
  • Assess model fit (CFI > 0.90, RMSEA < 0.08, SRMR < 0.08)
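
CFA itself requires a structural equation modeling package, but two of the fit indices above can be derived from the chi-square statistics that such packages report. The sketch below assumes the common formulas RMSEA = sqrt(max(χ² − df, 0) / (df × (N − 1))) and CFI = 1 − max(χ²_model − df_model, 0) / max(χ²_model − df_model, χ²_baseline − df_baseline, 0). Some software uses N rather than N − 1 in the RMSEA denominator, so small discrepancies are possible. SRMR must be taken from the software output. All function names are hypothetical.

```python
def rmsea(chi2, df, n):
    """Root mean square error of approximation from model chi-square."""
    return (max(chi2 - df, 0) / (df * (n - 1))) ** 0.5


def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative fit index relative to the baseline (null) model."""
    model_misfit = max(chi2_model - df_model, 0)
    denominator = max(model_misfit, chi2_baseline - df_baseline, 0)
    if denominator == 0:
        return 1.0
    return 1 - model_misfit / denominator


def fit_acceptable(cfi_value, rmsea_value, srmr_value):
    """Apply the thresholds stated in 5.3.2.4."""
    return cfi_value > 0.90 and rmsea_value < 0.08 and srmr_value < 0.08


# Made-up fit statistics, for illustration only
print(rmsea(100, 50, 201), cfi(100, 50, 1060, 60))
```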

5.3.3 Criterion Validity

5.3.3.1. If gold standard exists, calculate:

  • Sensitivity and specificity
  • Positive and negative predictive values
  • ROC curves and AUC
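
The criterion validity statistics above can be sketched as follows. The example assumes a binary gold standard coded 1 (condition present) and 0 (absent), a binary classification from the measure for the accuracy statistics, and the continuous score for AUC. AUC is computed through its rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, counting ties as one half. Function names are hypothetical, and the sketch assumes at least one case in each cell denominator.

```python
def diagnostic_accuracy(gold, predicted):
    """Sensitivity, specificity, PPV and NPV from paired binary codes."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)
    tn = sum(1 for g, p in pairs if g == 0 and p == 0)
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def auc(gold, scores):
    """Area under the ROC curve via pairwise comparison of cases."""
    positives = [s for g, s in zip(gold, scores) if g == 1]
    negatives = [s for g, s in zip(gold, scores) if g == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up data, for illustration only
gold = [1, 1, 1, 0, 0, 0]
print(diagnostic_accuracy(gold, [1, 1, 0, 0, 0, 1]))
print(auc(gold, [0.9, 0.8, 0.3, 0.2, 0.4, 0.1]))
```

Predictive values depend on the prevalence of the condition in the validation sample, so they should be reported together with that prevalence.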

5.4 Responsiveness Assessment

5.4.1. Collect data at baseline and follow-up from participants expected to change

5.4.2. Calculate change scores

5.4.3. Assess responsiveness using:

  • Effect sizes (Cohen's d, standardized response mean)
  • Correlation with external indicators of change
  • Receiver operating characteristic (ROC) analysis
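
Two of the responsiveness indices in 5.4.3 differ only in their denominator, which the sketch below makes explicit. It assumes the effect size for change is mean change divided by the baseline SD, and the standardized response mean (SRM) is mean change divided by the SD of the change scores. Function names are hypothetical.

```python
from statistics import mean, stdev


def change_scores(baseline, followup):
    return [f - b for b, f in zip(baseline, followup)]


def effect_size(baseline, followup):
    """Mean change divided by the SD of baseline scores."""
    return mean(change_scores(baseline, followup)) / stdev(baseline)


def standardized_response_mean(baseline, followup):
    """Mean change divided by the SD of the change scores."""
    changes = change_scores(baseline, followup)
    return mean(changes) / stdev(changes)


# Made-up paired scores, for illustration only
baseline = [10, 12, 14, 16]
followup = [13, 14, 18, 19]
print(effect_size(baseline, followup), standardized_response_mean(baseline, followup))
```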

5.4.4. Determine Minimal Clinically Important Difference (MCID):

  • Anchor-based methods (e.g., mean change among participants who report minimal change on a patient global rating)
  • Distribution-based methods (0.5 SD, 1 SEM)
  • Multiple methods recommended
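
The MCID approaches in 5.4.4 can be compared side by side, as in the sketch below. The distribution-based estimates follow the 0.5 SD and 1 SEM rules listed above, using the baseline SD and a reliability coefficient supplied by the analyst. The anchor-based estimate is taken as the mean change among participants whose global rating equals a chosen "minimal change" category, which is one common anchor-based approach and an assumption here. The function names and category labels are hypothetical.

```python
from statistics import mean, stdev


def mcid_half_sd(baseline):
    """Distribution-based estimate: 0.5 x baseline SD."""
    return 0.5 * stdev(baseline)


def mcid_one_sem(baseline, reliability):
    """Distribution-based estimate: 1 SEM = SD * sqrt(1 - reliability)."""
    return stdev(baseline) * (1 - reliability) ** 0.5


def mcid_anchor(changes, global_ratings, minimal_label):
    """Anchor-based estimate: mean change in the minimal-change group."""
    group = [c for c, r in zip(changes, global_ratings) if r == minimal_label]
    if not group:
        raise ValueError("no participants in the minimal-change category")
    return mean(group)


# Made-up data, for illustration only
baseline = [10, 12, 14, 16]
changes = [3, 2, 4, 3, 0]
ratings = ["a little better", "a little better", "much better",
           "a little better", "no change"]
print(mcid_half_sd(baseline))
print(mcid_one_sem(baseline, 0.75))
print(mcid_anchor(changes, ratings, "a little better"))
```

Reporting the estimates together makes it easier to see whether the methods converge on a similar value.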

5.5 Interpretability Assessment

5.5.1. Assess score distribution:

  • Floor effects: >15% scoring at minimum
  • Ceiling effects: >15% scoring at maximum
  • Skewness and kurtosis
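
The floor and ceiling checks in 5.5.1 amount to counting the share of participants at each end of the scale. The sketch below applies the 15% threshold stated above; the function name `floor_ceiling` and its parameters are hypothetical, and the scale minimum and maximum are the lowest and highest possible scores, not the observed range.

```python
def floor_ceiling(scores, scale_min, scale_max, threshold=0.15):
    """Proportion of scores at each scale extreme, with threshold flags."""
    n = len(scores)
    floor_pct = sum(1 for s in scores if s == scale_min) / n
    ceiling_pct = sum(1 for s in scores if s == scale_max) / n
    return {
        "floor_pct": floor_pct,
        "floor_effect": floor_pct > threshold,
        "ceiling_pct": ceiling_pct,
        "ceiling_effect": ceiling_pct > threshold,
    }


# Made-up scores on a 0-10 scale, for illustration only
print(floor_ceiling([0, 0, 5, 10, 3, 7, 8, 2, 4, 6], 0, 10))
```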

5.5.2. Develop score interpretation guidelines:

  • Clinical cutoff scores
  • Severity categories
  • Normative data (if appropriate)

5.5.3. Document MCID and other interpretability anchors

5.6 Validation Report

5.6.1. Prepare comprehensive validation report including:

  • Study objectives and methods
  • Participant characteristics
  • Results of all psychometric analyses
  • Tables and figures
  • Discussion of strengths and limitations
  • Conclusions and recommendations for use

5.6.2. File validation report as Form FRM-VAL-002

5.6.3. Update measure status in the Validation Tracking Database (FRM-VAL-003)

5.6.4. For regulatory submissions, prepare the validation report according to the applicable FDA guidance (e.g., FDA, 2009)

5.7 Ongoing Validation Activities

5.7.1. Plan for continued evidence generation:

  • Validation in additional populations
  • Assessment in different contexts or settings
  • Cross-cultural validation
  • Longitudinal measurement invariance

5.7.2. Monitor published validation evidence for measures in use

5.7.3. Review and update validation status annually

6. Related Documents

  • FRM-VAL-001: Validation Study Protocol Template
  • FRM-VAL-002: Psychometric Validation Report Template
  • FRM-VAL-003: Validation Tracking Database
  • SOP-DM-001: Data Management for Validation Studies
  • SOP-LIC-001: License Management

7. References

  • FDA (2009). Guidance for Industry: Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims
  • Mokkink LB, et al. (2010). The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments. Quality of Life Research, 19(4), 539-549
  • Reeve BB, et al. (2013). ISOQOL recommends minimum standards for patient-reported outcome measures used in patient-centered outcomes and comparative effectiveness research. Quality of Life Research, 22(8), 1889-1905
  • Streiner DL, Norman GR, Cairney J (2015). Health Measurement Scales: A Practical Guide to Their Development and Use (5th ed.). Oxford University Press
  • DeVellis RF (2017). Scale Development: Theory and Applications (4th ed.). SAGE Publications

Revision History

| Rev | Date | Description | Author |
| --- | --- | --- | --- |
| 1.0 | [DATE] | Initial release | [AUTHOR] |