The STARD reporting guideline for writing diagnostic accuracy study articles

The STARD reporting guideline helps authors write up diagnostic accuracy studies that can be understood and used by a wide audience. This page summarises STARD and how it can be used. Each guideline item links to more information, examples, and relevant training.

STARD: Standards for Reporting Diagnostic Accuracy

Version: STARD 2015 v1.1. This is the latest version ✅

How to use this reporting guideline

You can use reporting guidelines throughout your research process.

  • When writing: Consider using a writing guide to draft your manuscript or protocol.
  • After writing: Complete a checklist and include it with your journal submission.
  • To learn: Consult the guidance whenever you need it.

However you use STARD, please cite it.

Applicability criteria

You can use STARD if you are writing up a study which evaluates a diagnostic test against a clinical reference standard, or a gold standard.

You can also use it to:

  • write a proposal or protocol for a diagnostic accuracy study (use the items within the Introduction and Method sections).
  • review the reporting of a diagnostic accuracy study article, but not to appraise the quality of its design or conduct.

Do not use STARD for appraising the quality of a diagnostic accuracy study. Use an appraisal tool like the CASP Diagnostic Study Checklist or QUADAS-2 instead.

Summary of guidance

Although you should describe all items below, you can decide how to order and prioritise the items most relevant to your study, findings, context, and readership, whilst keeping your writing concise. You can read how STARD was developed in the FAQs.

Item name What to write
 Title or abstract
1. Identification as a study of diagnostic accuracy Identification as a study of diagnostic accuracy using at least one measure of accuracy (such as sensitivity, specificity, predictive values, or AUC).
 Abstract
2. Abstract Structured summary of study design, methods, results and conclusions (for specific guidance, see STARD for Abstracts).
 Introduction
3. Background Scientific and clinical background, including the intended use and clinical role of the index test.
4. Objectives Study objectives and hypotheses.
 Methods
5. Study design Whether data collection was planned before the index test and reference standard were performed (prospective study) or after (retrospective study).
 Participants
6. Eligibility criteria Eligibility criteria.
7. Identifying eligible participants On what basis potentially eligible participants were identified (such as symptoms, results from previous tests, inclusion in registry).
8. Setting, location, and dates Where and when potentially eligible participants were identified (setting, location and dates).
9. Consecutive, random or convenience series Whether participants formed a consecutive, random or convenience series.
 Test Methods
10a. Index test Index test, in sufficient detail to allow replication.
10b. Reference standard Reference standard, in sufficient detail to allow replication.
11. Reference standard rationale Rationale for choosing the reference standard (if alternatives exist).
12a. Index test cut-offs or categories Definition of and rationale for test positivity cut-offs or result categories of the index test, distinguishing prespecified from exploratory.
12b. Reference standard cut-offs or categories Definition of and rationale for test positivity cut-offs or result categories of the reference standard, distinguishing prespecified from exploratory.
13a. Information available to performers or readers of the index test Whether clinical information and reference standard results were available to the performers or readers of the index test.
13b. Information available to reference standard assessors Whether clinical information and index test results were available to the assessors of the reference standard.
 Analysis
14. Analysis methods Methods for estimating or comparing measures of diagnostic accuracy.
15. Indeterminate results How indeterminate index test or reference standard results were handled.
16. Missing data How missing data on the index test and reference standard were handled.
17. Variability Any analyses of variability in diagnostic accuracy, distinguishing prespecified from exploratory.
18. Intended sample size Intended sample size and how it was determined.
 Results
 Participants
19. Participant flow diagram Flow of participants, using a diagram.
20. Baseline characteristics Baseline demographic and clinical characteristics of participants.
21a. Participants with the target condition Distribution of severity of disease in those with the target condition.
21b. Participants without the target condition Distribution of alternative diagnoses in those without the target condition.
22. Time interval Time interval and any clinical interventions between index test and reference standard.
 Test Results
23. Index test and reference standard results Cross tabulation of the index test results (or their distribution) by the results of the reference standard.
24. Estimates of accuracy Estimates of diagnostic accuracy and their precision (such as 95% CIs).
25. Adverse events Any adverse events from performing the index test or the reference standard.
 Discussion
26. Limitations Study limitations, including sources of potential bias, statistical uncertainty and generalisability.
27. Implications for Practice Implications for practice, including the intended use and clinical role of the index test.
 Other information
28. Registration Registration number and name of registry.
29. Protocol Where the full study protocol can be accessed.
30. Funding Sources of funding and other support; role of funders.
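
Items 23 and 24 above ask for a cross tabulation of index test results against the reference standard, and for accuracy estimates with their precision. As a minimal sketch of what that involves, the Python below counts the four cells of a 2×2 table from paired results and attaches a Wilson score interval to sensitivity and specificity. The function names and the choice of the Wilson method are assumptions made for illustration; STARD does not prescribe a particular interval method.

```python
from math import sqrt


def cross_tabulate(index_results, reference_results):
    """Count the cells of a 2x2 table from paired boolean results.

    Each pair is (index test positive?, reference standard positive?).
    """
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for index_positive, reference_positive in zip(index_results, reference_results):
        if index_positive and reference_positive:
            counts["tp"] += 1
        elif index_positive:
            counts["fp"] += 1
        elif reference_positive:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


def accuracy_estimates(counts):
    """Sensitivity and specificity, each paired with its Wilson interval."""
    tp, fp, fn, tn = counts["tp"], counts["fp"], counts["fn"], counts["tn"]
    return {
        "sensitivity": (tp / (tp + fn), wilson_interval(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_interval(tn, tn + fp)),
    }
```

Reporting the four cell counts alongside the estimates lets readers recompute any accuracy measure themselves.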

We like publishing transparent research because we think it’s more likely to be used and cited. That’s why we ask authors to use reporting guidelines.

Robin Lavery

Editor, International Journal of World Medicine

Medical test

Any method for collecting additional information about the current or future health status of a patient.

Index test

The test under evaluation.

Target condition

The disease or condition that the index test is expected to detect.

Clinical reference standard

The best available method for establishing the presence or absence of the target condition; a gold standard would be an error-free reference standard.

Sensitivity

Proportion of those with the target condition who test positive with the index test.

Specificity

Proportion of those without the target condition who test negative with the index test.

Intended use of the test

Whether the index test is used for diagnosis, screening, staging, monitoring, surveillance, prediction, prognosis, or other reasons.

Role of the test

The position of the index test relative to other tests for the same condition (for example, triage, replacement, add-on, new test).

Indeterminate results

Results that are neither positive nor negative.
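
The definitions of sensitivity and specificity given earlier in this glossary can also be written as formulas. The symbols TP, FN, TN and FP (true positives, false negatives, true negatives and false positives, each judged against the reference standard) are notation assumed here for illustration; they do not appear in the definitions themselves.

```latex
\[
\text{Sensitivity} = \frac{TP}{TP + FN},
\qquad
\text{Specificity} = \frac{TN}{TN + FP}
\]
```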

Citation

For attribution, please cite this work as:
Bossuyt PM, Reitsma JB, Bruns DE, et al. The STARD reporting guideline for writing diagnostic accuracy study articles. The EQUATOR Network guideline dissemination platform. doi:10.1234/equator/1010101

Reporting Guidelines are recommendations to help describe your work clearly

Your research will be used by people from different disciplines and backgrounds for decades to come. Reporting guidelines list the information you should describe so that everyone can understand, replicate, and synthesise your work.

Reporting guidelines do not prescribe how research should be designed or conducted. Rather, they help authors transparently describe what they did, why they did it, and what they found.

Reporting guidelines make writing research easier, and transparent research leads to better patient outcomes.

Easier writing

Following guidance makes writing easier and quicker.

Smoother publishing

Many journals require completed reporting checklists at submission.

Maximum impact

From Nobel Prizes to null results, articles have more impact when everyone can use them.

Who reads research?

Your work will be read by different people, for different reasons, around the world, and for decades to come. Reporting guidelines help you consider all of your potential audiences. For example, your research may be read by researchers from different fields, by clinicians, patients, evidence synthesisers, peer reviewers, or editors. Your readers will need information to understand, replicate, apply, appraise, synthesise, and use your work.

Cohort studies

A cohort study is an observational study in which a group of people with a particular exposure (e.g. a putative risk factor or protective factor) and a group of people without this exposure are followed over time. The outcomes of the people in the exposed group are compared to the outcomes of the people in the unexposed group to see if the exposure is associated with particular outcomes (e.g. getting cancer or length of life).
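
One common way to express the comparison described above is a risk ratio: the proportion of exposed people who develop the outcome divided by the proportion of unexposed people who do. The sketch below is illustrative only; the function name and the counts are hypothetical, and a real analysis would also report precision and consider confounding.

```python
def risk_ratio(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Risk of the outcome in the exposed group divided by risk in the unexposed group."""
    if exposed_total <= 0 or unexposed_total <= 0:
        raise ValueError("group sizes must be positive")
    if unexposed_events == 0:
        raise ValueError("risk ratio is undefined when the unexposed group has no events")
    risk_exposed = exposed_events / exposed_total
    risk_unexposed = unexposed_events / unexposed_total
    return risk_exposed / risk_unexposed


# Hypothetical counts: 30 of 1000 exposed people and 10 of 1000 unexposed
# people develop the outcome during follow-up, so the ratio is about 3.
rr = risk_ratio(30, 1000, 10, 1000)
```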

Case-control studies

A case-control study is a research method used in healthcare to investigate potential risk factors for a specific disease. It involves comparing individuals who have been diagnosed with the disease (cases) to those who have not (controls). By analysing the differences between the two groups, researchers can identify factors that may contribute to the development of the disease.

An example would be when researchers conducted a case-control study examining whether exposure to diesel exhaust particles increases the risk of respiratory disease in underground miners. Cases included miners diagnosed with respiratory disease, while controls were miners without respiratory disease. Participants' past occupational exposures to diesel exhaust particles were evaluated to compare exposure rates between cases and controls.
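
Comparing exposure between cases and controls, as in the example above, is usually summarised as an odds ratio. The sketch below assumes a simple unmatched design; the function name and the counts are hypothetical and are not taken from the miners study.

```python
def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    if cases_unexposed == 0 or controls_exposed == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (cases_exposed * controls_unexposed) / (cases_unexposed * controls_exposed)


# Hypothetical counts: 40 of 100 cases and 20 of 100 controls were exposed,
# giving (40 * 80) / (60 * 20), which is about 2.67.
example = odds_ratio(40, 60, 20, 80)
```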

Cross-sectional studies

A cross-sectional study (sometimes called a "cross-sectional survey") is an observational study in which researchers collect data from a group of participants at a single point in time. This provides a 'snapshot' of the characteristics or outcomes present in a defined population at that time. The aim is not to track changes over an extended period but to assess and quantify the current situation for specific variables or conditions. Cross-sectional studies can identify patterns or correlations among factors within the population, providing a basis for further, more detailed investigation.

Systematic reviews

A systematic review is a comprehensive approach designed to identify, evaluate, and synthesise all available evidence relevant to a specific research question. In essence, it collects all possible studies related to a given topic and design, and reviews and analyses their results.

The process involves a highly sensitive search strategy to ensure that as much pertinent information as possible is gathered. Once collected, this evidence is often critically appraised to assess its quality and relevance, ensuring that conclusions drawn are based on robust data. Systematic reviews often involve defining inclusion and exclusion criteria, which help to focus the analysis on the most relevant studies, ultimately synthesising the findings into a coherent narrative or statistical synthesis. Some systematic reviews will include a meta-analysis.

Randomised Trials

A randomised controlled trial (RCT) is a trial in which participants are randomly assigned to one of two or more groups: the experimental group or groups receive the intervention or interventions being tested; the comparison group (control group) receives usual care, no treatment, or a placebo. The groups are then followed up to see if there are any differences between their results. This helps in assessing the effectiveness of the intervention.
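
The random assignment described above can be sketched as simple randomisation, where each participant is independently given an arm with equal probability. The function name, arm labels, and seed parameter are assumptions for illustration; real trials often use block or stratified randomisation with concealed allocation, which this sketch does not cover.

```python
import random


def allocate(participant_ids, arms=("intervention", "control"), seed=None):
    """Simple randomisation: assign each participant to one arm with equal probability."""
    rng = random.Random(seed)  # a fixed seed makes the allocation reproducible
    return {pid: rng.choice(arms) for pid in participant_ids}


# Hypothetical usage: allocate ten participants identified by the numbers 0 to 9.
allocation = allocate(range(10), seed=2015)
```

Because each assignment is independent, group sizes can be unequal, which is one reason trials often prefer block randomisation.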

Qualitative research

Research that aims to gather and analyse non-numerical (descriptive) data in order to gain an understanding of individuals' social reality, including understanding their attitudes, beliefs, and motivation. This type of research typically involves in-depth interviews, focus groups, or field observations in order to collect data that is rich in detail and context. Qualitative research is often used to explore complex phenomena or to gain insight into people's experiences and perspectives on a particular topic. It is particularly useful when researchers want to understand the meaning that people attach to their experiences or when they want to uncover the underlying reasons for people's behavior. Qualitative methods include ethnography, grounded theory, discourse analysis, and interpretative phenomenological analysis.

Diagnostic Test Accuracy Studies

Diagnostic accuracy studies estimate the ability of one or more tests to correctly identify people with a predefined target condition (sensitivity) and to correctly identify those without it (specificity).

Prediction Models

Prediction model research is used to test the accuracy of a model or test in estimating an outcome value or risk. Most models estimate the probability of the presence of a particular health condition (diagnostic) or whether a particular outcome will occur in the future (prognostic). Prediction models are used to support clinical decision making, such as whether to refer patients for further testing, monitor disease deterioration or treatment effects, or initiate treatment or lifestyle changes. Examples of well known prediction models include EuroSCORE II for cardiac surgery, the Gail model for breast cancer, the Framingham risk score for cardiovascular disease, IMPACT for traumatic brain injury, and FRAX for osteoporotic and hip fractures.

Quality Improvement in Healthcare

Quality improvement research is about finding out how to improve and make changes in the most effective way. It is about systematically and rigorously exploring "what works" to improve quality in healthcare and the best ways to measure and disseminate this to ensure positive change. Most quality improvement effectiveness research is conducted in hospital settings, is focused on multiple quality improvement interventions, and uses process measures as outcomes. There is a great deal of variation in the research designs used to examine quality improvement effectiveness.

Meta Analyses

A meta-analysis is a statistical technique that amalgamates data from multiple studies to yield a single estimate of the effect size. This approach enhances precision and offers a more comprehensive understanding by integrating quantitative findings. Central to a meta-analysis is the evaluation of heterogeneity, which examines variations in study outcomes to ensure that differences in populations, interventions, or methodologies do not skew results. Techniques such as meta-regression or subgroup analysis are frequently employed to explore how various factors might influence the outcomes. This method is particularly effective when aiming to quantify the effect size, odds ratio, or risk ratio, providing a clearer numerical estimate that can significantly inform clinical or policy decisions.
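
To make the idea of a single pooled estimate concrete, here is a minimal sketch of inverse-variance pooling under a fixed-effect model. The choice of a fixed-effect model is an assumption made to keep the example short; random-effects models are also widely used. The function name and inputs are hypothetical, and the estimates are assumed to be on a scale suitable for averaging (for example, log odds ratios).

```python
from math import sqrt


def pool_fixed_effect(estimates, standard_errors):
    """Inverse-variance weighted average of study estimates and its standard error."""
    if not estimates or len(estimates) != len(standard_errors):
        raise ValueError("provide one standard error per estimate")
    # More precise studies (smaller standard errors) receive larger weights.
    weights = [1 / se**2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    pooled_se = sqrt(1 / sum(weights))
    return pooled, pooled_se


# Hypothetical usage: two equally precise studies pool to roughly their midpoint.
pooled, pooled_se = pool_fixed_effect([0.2, 0.4], [0.1, 0.1])
```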

How Meta-analyses and Systematic Reviews Work Together

Systematic reviews and meta-analyses function together, each complementing the other to provide a more robust understanding of research evidence. A systematic review meticulously gathers and evaluates all pertinent studies, establishing a solid foundation of qualitative and quantitative data. Within this framework, if the collected data exhibit sufficient homogeneity, a meta-analysis can be performed. This statistical synthesis allows for the integration of quantitative results from individual studies, producing a unified estimate of effect size. Techniques such as meta-regression or subgroup analysis may further refine these findings, elucidating how different variables impact the overall outcome. By combining these methodologies, researchers can achieve both a comprehensive narrative synthesis and a precise quantitative measure, enhancing the reliability and applicability of their conclusions. This integrated approach ensures that the findings are not only well-rounded but also statistically robust, providing greater confidence in the evidence base.

Why Don't All Systematic Reviews Use a Meta-Analysis?

Not every systematic review includes a meta-analysis, because the data from the included studies may vary too much to combine. For a meta-analysis to be viable, the data from different studies must be sufficiently similar, or homogeneous, in terms of design, population, and interventions. When the data show significant heterogeneity, meaning there are considerable differences among the studies, combining them could lead to skewed or misleading conclusions. Furthermore, the quality of the included studies is critical; if the studies are of low methodological quality, merging their results could obscure true effects rather than reveal them.
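
Heterogeneity of the kind described above is often quantified with Cochran's Q and the I² statistic. The sketch below is one common formulation, shown under the assumption that each study supplies an estimate and a standard error; the function name is hypothetical, and no particular threshold for "too heterogeneous" is implied.

```python
def heterogeneity(estimates, standard_errors):
    """Return Cochran's Q and I-squared (as a percentage) for a set of study estimates."""
    if not estimates or len(estimates) != len(standard_errors):
        raise ValueError("provide one standard error per estimate")
    weights = [1 / se**2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    # Q sums the weighted squared deviations of each study from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    # I-squared is the share of Q beyond what chance alone would produce, floored at 0.
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i_squared
```

Identical study estimates give an I² of 0%, while widely separated estimates with small standard errors push it towards 100%.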

Protocol

A plan or set of steps that defines how something will be done. Before carrying out a research study, for example, the research protocol sets out what question is to be answered and how information will be collected and analysed.
