# Standard Operating Procedure: Psychometric Validation of Clinical Outcome Measures

| Document ID | SOP-VAL-001 |
|-------------|---------|
| Title | Psychometric Validation of Clinical Outcome Measures |
| Revision | 1.0 |
| Effective Date | [DATE] |
| Author | [AUTHOR] |
| Approved By | [APPROVER] |
| Department | Outcomes Research |

---

## 1. Purpose

This procedure establishes requirements for conducting psychometric validation studies of clinical outcome measures to ensure they demonstrate appropriate measurement properties for their intended use.

## 2. Scope

This procedure applies to:

- New outcome measure development
- Validation of existing measures in new populations
- Adaptation of measures for new contexts or modes of administration
- All outcome measure types (PRO, ClinRO, ObsRO, PerfO)

## 3. Responsibilities

### 3.1 Principal Investigator/Measure Developer

- Design validation study protocol
- Ensure appropriate statistical expertise
- Review and interpret validation results
- Document validation evidence

### 3.2 Biostatistician

- Develop statistical analysis plan
- Conduct psychometric analyses
- Generate validation reports
- Advise on sample size and methodology

### 3.3 Quality Manager

- Review validation protocols for regulatory compliance
- Maintain validation documentation
- Track validation status of all measures

## 4. Definitions

| Term | Definition |
|------|------------|
| Reliability | The degree to which a measure is free from measurement error |
| Internal Consistency | The extent to which items within a scale measure the same construct (Cronbach's alpha) |
| Test-Retest Reliability | Consistency of scores when measure is administered to the same individuals at different times |
| Inter-Rater Reliability | Agreement between different raters/observers (for ClinRO, ObsRO) |
| Validity | The degree to which a measure assesses what it purports to measure |
| Content Validity | Evidence that measure items represent all aspects of the construct |
| Construct Validity | Evidence that measure relates to other measures as theoretically expected |
| Criterion Validity | Agreement between measure and a gold standard |
| Responsiveness | Ability to detect meaningful change over time |
| MCID | Minimal Clinically Important Difference - smallest change considered important |
| Floor/Ceiling Effects | Clustering of scores at bottom or top of scale, limiting ability to detect change |

## 5. Procedure

### 5.1 Validation Study Planning

5.1.1. Define validation objectives:

- Target population
- Intended use and context
- Mode of administration (paper, electronic, interview)
- Key measurement properties to evaluate

5.1.2. Develop validation protocol including:

- Background and rationale
- Study design and timeline
- Participant eligibility criteria
- Sample size justification
- Data collection procedures
- Statistical analysis plan
- Success criteria for validation

5.1.3. Select comparison measures:

- Established measures of same construct (convergent validity)
- Measures of different constructs (discriminant validity)
- Clinical indicators or gold standards (criterion validity)

5.1.4. Obtain necessary regulatory approvals (IRB, informed consent)

5.1.5. Document validation plan in Form FRM-VAL-001

### 5.2 Reliability Assessment

#### 5.2.1 Internal Consistency Reliability

5.2.1.1. Analyze baseline data from main study sample

5.2.1.2. Calculate Cronbach's alpha for each scale/subscale

5.2.1.3. Acceptance criteria:

- Alpha ≥ 0.70 for group comparisons
- Alpha ≥ 0.90 for individual decision-making
- Alpha < 0.95 (if higher, may indicate item redundancy)

5.2.1.4. Examine item-total correlations (typically ≥ 0.30)

5.2.1.5. Assess scale dimensionality using factor analysis

#### 5.2.2 Test-Retest Reliability

5.2.2.1. Administer measure twice to stable subsample

5.2.2.2. Time interval: typically 2-14 days

- Short enough that true change is unlikely
- Long enough to prevent memory effects

5.2.2.3. Calculate intraclass correlation coefficient (ICC)

5.2.2.4. Acceptance criteria:

- ICC ≥ 0.70 for group comparisons
- ICC ≥ 0.90 for individual decision-making

5.2.2.5. Calculate standard error of measurement (SEM)

5.2.2.6. Generate Bland-Altman plots to assess agreement

#### 5.2.3 Inter-Rater Reliability (for ClinRO, ObsRO)

5.2.3.1. Have multiple raters assess same participants

5.2.3.2. Calculate ICC or weighted kappa as appropriate

5.2.3.3. Acceptance criteria:

- ICC or kappa ≥ 0.70

5.2.3.4. Identify sources of disagreement for training improvement

### 5.3 Validity Assessment

#### 5.3.1 Content Validity

5.3.1.1. Conduct qualitative research with target population:

- Concept elicitation interviews
- Cognitive debriefing of items
- Assessment of comprehensibility and relevance

5.3.1.2. Obtain expert panel review:

- Clinical experts
- Psychometricians
- Patient representatives

5.3.1.3. Document evidence in content validity report

5.3.1.4. For FDA submissions, follow FDA PRO Guidance requirements

#### 5.3.2 Construct Validity

5.3.2.1. Convergent validity:

- Correlate with established measures of same construct
- Expected correlation: typically r ≥ 0.50-0.70

5.3.2.2. Discriminant validity:

- Correlate with measures of different constructs
- Expected correlation: typically r < 0.30

5.3.2.3. Known-groups validity:

- Compare scores across groups expected to differ
- Use appropriate statistical tests (t-test, ANOVA)
- Calculate effect sizes (Cohen's d, eta-squared)

5.3.2.4. Factorial validity:

- Conduct confirmatory factor analysis (CFA)
- Assess model fit (CFI > 0.90, RMSEA < 0.08, SRMR < 0.08)

#### 5.3.3 Criterion Validity

5.3.3.1. If gold standard exists, calculate:

- Sensitivity and specificity
- Positive and negative predictive values
- ROC curves and AUC

### 5.4 Responsiveness Assessment

5.4.1. Collect data at baseline and follow-up from participants expected to change

5.4.2. Calculate change scores

5.4.3. Assess responsiveness using:

- Effect sizes (Cohen's d, standardized response mean)
- Correlation with external indicators of change
- Receiver operating characteristic (ROC) analysis

5.4.4. Determine Minimal Clinically Important Difference (MCID):

- Anchor-based methods (correlation with patient global ratings)
- Distribution-based methods (0.5 SD, 1 SEM)
- Multiple methods recommended

### 5.5 Interpretability Assessment

5.5.1. Assess score distribution:

- Floor effects: >15% scoring at minimum
- Ceiling effects: >15% scoring at maximum
- Skewness and kurtosis

5.5.2. Develop score interpretation guidelines:

- Clinical cutoff scores
- Severity categories
- Normative data (if appropriate)

5.5.3. Document MCID and other interpretability anchors

### 5.6 Validation Report

5.6.1. Prepare comprehensive validation report including:

- Study objectives and methods
- Participant characteristics
- All psychometric analyses results
- Tables and figures
- Discussion of strengths and limitations
- Conclusions and recommendations for use

5.6.2. File validation report as Form FRM-VAL-002

5.6.3. Update measure status in Validation Tracking Database

5.6.4. For regulatory submissions, prepare according to FDA guidance

### 5.7 Ongoing Validation Activities

5.7.1. Plan for continued evidence generation:

- Validation in additional populations
- Assessment in different contexts or settings
- Cross-cultural validation
- Longitudinal measurement invariance

5.7.2. Monitor published validation evidence for measures in use

5.7.3. Review and update validation status annually

## 6. Related Documents

- FRM-VAL-001: Validation Study Protocol Template
- FRM-VAL-002: Psychometric Validation Report Template
- FRM-VAL-003: Validation Tracking Database
- SOP-DM-001: Data Management for Validation Studies
- SOP-LIC-001: License Management

## 7. References

- FDA (2009). Guidance for Industry: Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims
- Mokkink LB, et al. (2010). The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments. Quality of Life Research, 19(4), 539-549
- Reeve BB, et al. (2013). ISOQOL recommends minimum standards for patient-reported outcome measures used in patient-centered outcomes and comparative effectiveness research. Quality of Life Research, 22(8), 1889-1905
- Streiner DL, Norman GR, Cairney J (2015). Health Measurement Scales: A Practical Guide to Their Development and Use (5th ed.). Oxford University Press
- DeVellis RF (2017). Scale Development: Theory and Applications (4th ed.). SAGE Publications

---

## Revision History

| Rev | Date | Description | Author |
|-----|------|-------------|--------|
| 1.0 | [DATE] | Initial release | [AUTHOR] |
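
## Appendix A: Illustrative Calculations (Informative)

Three of the calculations named in Section 5 (Cronbach's alpha in 5.2.1.2, standard error of measurement in 5.2.2.5, and the floor/ceiling check in 5.5.1) can be sketched in a few lines. The Python below is a minimal illustration, not a validated analysis tool: the function names and the sample data are hypothetical, the SEM is computed with the common formula SD × √(1 − reliability) (this SOP does not prescribe a formula), and the 15% threshold is taken from 5.5.1. Production analyses would normally follow the statistical analysis plan and use a qualified statistical package.

```python
from math import sqrt
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete data.

    rows: one list of k item scores per respondent (hypothetical layout).
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item across respondents
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the summed scale score
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability); reliability may be alpha or an ICC."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * sqrt(1 - reliability)


def floor_ceiling(scores, minimum, maximum, threshold_pct=15.0):
    """Percent of respondents at the scale minimum/maximum, flagged against 5.5.1."""
    n = len(scores)
    floor_pct = 100.0 * sum(s == minimum for s in scores) / n
    ceiling_pct = 100.0 * sum(s == maximum for s in scores) / n
    return {
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        "floor_effect": floor_pct > threshold_pct,
        "ceiling_effect": ceiling_pct > threshold_pct,
    }


if __name__ == "__main__":
    # Made-up responses: 5 respondents x 3 items, for illustration only
    demo = [[3, 4, 3], [2, 2, 3], [4, 5, 4], [1, 2, 1], [5, 4, 5]]
    print("alpha:", round(cronbach_alpha(demo), 3))
    print("SEM:", round(standard_error_of_measurement(10.0, 0.91), 3))
    print(floor_ceiling([0, 0, 5, 10, 3, 7, 0, 10, 4, 6], 0, 10))
```

The results would then be compared against the acceptance criteria in 5.2.1.3 and 5.2.2.4. Because alpha depends on the number of items as well as their intercorrelation, it is best read alongside the item-total correlations and dimensionality checks in 5.2.1.4 and 5.2.1.5.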