Evaluating linguistic equivalence of patient-reported outcomes in a cancer clinical trial

Center on Outcomes, Research and Education (CORE), Evanston Northwestern Healthcare, 1001 University Place, Suite 100, Evanston, Illinois 60201, USA; e-hahn{at}northwestern.edu; Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
Center for Rehabilitation Outcomes Research, Rehabilitation Institute of Chicago; Department of Physical Medicine and Rehabilitation, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
Center on Outcomes, Research and Education, Evanston Northwestern Healthcare, Evanston, Illinois, USA
Center on Outcomes, Research and Education, Evanston Northwestern Healthcare, Evanston, Illinois, USA; Department of Psychiatry and Behavioral Sciences, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA

Background To make meaningful cross-cultural or cross-linguistic comparisons of health-related quality of life (HRQL), or to pool international research data, it is essential to create unbiased measures that can detect clinically important differences. When HRQL scores differ between cultural or linguistic groups, it is important to determine whether the difference reflects real group differences or systematic measurement variability.

Purpose To investigate the linguistic measurement equivalence of a cancer-specific HRQL questionnaire, and to conduct a sensitivity analysis of treatment differences in HRQL in a clinical trial.

Methods Patients with newly diagnosed chronic myelogenous leukemia (n = 1049) completed serial HRQL assessments in an international Phase III trial. Two types of differential item functioning (uniform and non-uniform) were evaluated using item response theory and classical test theory approaches. A sensitivity analysis compared HRQL between treatment arms using items without evidence of differential functioning.

Results Of the 27 items, nine (33%) showed no evidence of differential functioning in either linguistic comparison (English versus French, English versus German). Although 18 items functioned differently, there was no evidence of systematic bias. In the sensitivity analysis, adjustment for differential functioning affected the magnitude, but not the direction or interpretation, of the treatment arm differences in the clinical trial.

Limitations Sufficient sample sizes were available for only three of the eight language groups. The identification of differential functioning in two-thirds of the items suggests that current psychometric methods may be too sensitive.
Conclusions Enhanced methodologies are needed to differentiate trivial from substantive differential item functioning. Systematic variability in HRQL across different groups can be evaluated for its effect on clinical trial results, a practice recommended when data are pooled across cultural or linguistic groups to draw conclusions about treatment effects.
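The abstract distinguishes uniform from non-uniform differential item functioning (DIF) without defining them. As a rough illustration only, the sketch below assumes a two-parameter logistic (2PL) item response model for a single yes/no item, with hypothetical discrimination (`a`) and difficulty (`b`) values for a reference group (e.g. English) and a focal group (e.g. French). Under that assumption, uniform DIF is a shift in difficulty with equal discrimination, so the same group is favored at every trait level; non-uniform DIF is unequal discrimination, so the two item response curves cross and the direction of the group difference depends on the trait level. The abstract does not state which IRT model, software, or flagging criteria the authors used, and HRQL items often use ordinal rating scales, so the function names (`p_endorse`, `group_gaps`, `classify_dif`, `crossing_point`) and all parameter values are hypothetical, not the study's method.

```python
import math


# Hypothetical illustration: 2PL item response function for one dichotomous item.
def p_endorse(theta: float, a: float, b: float) -> float:
    """Probability that a respondent at latent trait level theta endorses the item,
    given discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def group_gaps(ref, focal, thetas):
    """Reference-minus-focal endorsement probability at each trait level.

    ref and focal are (a, b) pairs for the same item calibrated in two language groups.
    """
    return [p_endorse(t, *ref) - p_endorse(t, *focal) for t in thetas]


def classify_dif(ref, focal):
    """Label the DIF pattern implied by two parameter pairs (no significance testing)."""
    (a_r, b_r), (a_f, b_f) = ref, focal
    if not math.isclose(a_r, a_f):
        return "non-uniform"  # unequal slopes: curves cross once, so the gap changes sign
    if not math.isclose(b_r, b_f):
        return "uniform"  # equal slopes, shifted location: same group favored everywhere
    return "none"


def crossing_point(ref, focal):
    """Trait level where the two curves intersect, or None when the slopes are equal."""
    (a_r, b_r), (a_f, b_f) = ref, focal
    if math.isclose(a_r, a_f):
        return None
    # Solve a_r * (theta - b_r) == a_f * (theta - b_f) for theta.
    return (a_r * b_r - a_f * b_f) / (a_r - a_f)


if __name__ == "__main__":
    english = (1.5, 0.0)  # hypothetical reference-group parameters (a, b)
    french_uniform = (1.5, 0.5)  # same slope, item is "harder" to endorse
    french_nonuniform = (0.8, 0.0)  # flatter slope, same location
    grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
    for label, focal in [("uniform case", french_uniform),
                         ("non-uniform case", french_nonuniform)]:
        gaps = group_gaps(english, focal, grid)
        print(label, classify_dif(english, focal), crossing_point(english, focal),
              [round(g, 3) for g in gaps])
```

In practice the item parameters are estimated from data and the group differences are tested for statistical and practical importance; as the Limitations note, separating trivial from substantive DIF is the hard part, and this toy classifier does not attempt it.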
Clinical Trials, Vol. 3, No. 3, 280-290 (2006)