14. Analysis methods
What to write
Methods for estimating or comparing measures of diagnostic accuracy.
Explanation
Multiple measures of diagnostic accuracy exist to describe the performance of a medical test, and their calculation from the collected data is not always straightforward.1 Authors should report the methods used for calculating the measures that they considered appropriate for their study objectives.
Statistical techniques can be used to test specific hypotheses that follow from the study's objectives. In single-test evaluations, authors may want to evaluate whether the diagnostic accuracy of the test exceeds a prespecified level (eg, sensitivity of at least 95%, see Item 4).
Diagnostic accuracy studies can also compare two or more index tests. In such comparisons, statistical hypothesis testing usually involves assessing the superiority or non-inferiority of one test relative to another.2 Authors should indicate which measure they specified for the comparison; this measure should match their study objectives and the purpose and role of the index test in the clinical pathway. Examples are the relative sensitivity, the absolute gain in sensitivity and the relative diagnostic odds ratio (OR).3
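The three comparative measures named above can be computed as point estimates from the 2x2 tables of two index tests. The sketch below uses hypothetical counts and a hypothetical `TwoByTwo` helper class; it gives point estimates only, and CIs would need a method that matches the study design (paired or unpaired).

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Index test results cross-classified against the reference standard."""
    tp: int  # true positives
    fn: int  # false negatives
    fp: int  # false positives
    tn: int  # true negatives

    @property
    def sensitivity(self):
        return self.tp / (self.tp + self.fn)

    @property
    def diagnostic_odds_ratio(self):
        return (self.tp * self.tn) / (self.fp * self.fn)


def compare(test_a, test_b):
    """Return comparative accuracy measures of test A relative to test B."""
    return {
        "relative_sensitivity": test_a.sensitivity / test_b.sensitivity,
        "absolute_gain_in_sensitivity": test_a.sensitivity - test_b.sensitivity,
        "relative_diagnostic_or": test_a.diagnostic_odds_ratio / test_b.diagnostic_odds_ratio,
    }


# Hypothetical counts, for illustration only.
test_a = TwoByTwo(tp=85, fn=15, fp=20, tn=180)
test_b = TwoByTwo(tp=75, fn=25, fp=30, tn=170)
for name, value in compare(test_a, test_b).items():
    print(f"{name}: {value:.3f}")
```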
In the example, the authors used McNemar's test statistic to evaluate whether the sensitivity and specificity of stereoscopic digital mammography differed from those of digital mammography in patients with elevated risk for breast cancer. In itself, the resulting p value is not a quantitative expression of the relative accuracy of the two investigated tests. Like any p value, it depends on both the magnitude of the difference and the sample size. In the example, the authors could have calculated the relative or absolute difference in sensitivity and specificity, including a 95% CI that takes into account the paired nature of the data.
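To make the contrast between a p value and an effect estimate concrete, the sketch below computes both from paired results in participants with the target condition: an exact two-sided McNemar test, and the absolute difference in sensitivity with a Wald-type 95% CI for paired proportions. The counts are hypothetical and are not taken from the cited study, the function names are invented for this illustration, and the Wald interval is only one of several available intervals for paired data.

```python
from math import comb, sqrt


def mcnemar_exact(b, c):
    """Exact two-sided McNemar p value from the two discordant counts.

    Under H0 the smaller discordant count follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2**n
    return min(1.0, 2 * tail)


def paired_sensitivity_difference(a, b, c, d, z=1.96):
    """Difference in sensitivity (test A minus test B) with a Wald CI.

    a: both tests positive, b: only A positive,
    c: only B positive,    d: both tests negative,
    all counted among participants with the target condition.
    """
    n = a + b + c + d
    diff = (b - c) / n
    se = sqrt((b + c) - (b - c) ** 2 / n) / n  # accounts for the pairing
    return diff, diff - z * se, diff + z * se


# Hypothetical paired counts, for illustration only.
a, b, c, d = 70, 15, 5, 10
p_value = mcnemar_exact(b, c)
diff, lower, upper = paired_sensitivity_difference(a, b, c, d)
print(f"McNemar exact p = {p_value:.4f}")
print(f"difference in sensitivity = {diff:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

The same calculation applies to specificity when the four counts are taken from participants without the target condition.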
Example
‘Statistical tests of sensitivity and specificity were conducted by using the McNemar test for correlated proportions. All tests were two sided, testing the hypothesis that stereoscopic digital mammography performance differed from that of digital mammography. A p-value of 0.05 was considered as the threshold for significance’.4