8. Selection Process

Specify the methods used to decide whether a study met the inclusion criteria of the review, including how many reviewers screened each record and each report retrieved, whether they worked independently, and, if applicable, details of automation tools used in the process.

Essential elements for systematic reviews regardless of the selection processes used

  • Report how many reviewers screened each record (title/abstract) and each report retrieved, whether multiple reviewers worked independently (that is, were unaware of each other’s decisions) at each stage of screening or not (for example, records screened by one reviewer and exclusions verified by another), and any processes used to resolve disagreements between screeners (for example, referral to a third reviewer or by consensus).

  • Report any processes used to obtain or confirm relevant information from study investigators.

  • If abstracts or articles required translation into another language to determine their eligibility, report how these were translated (for example, by asking a native speaker or by using software programs).

Essential elements for systematic reviews using automation tools in the selection process

  • Report how automation tools were integrated within the overall study selection process; for example, whether records were excluded based solely on a machine assessment or whether machine assessments were used to double-check human decisions.

  • If an externally derived machine learning classifier was applied (such as Cochrane RCT Classifier), either to eliminate records or to replace a single screener, include a reference or URL to the version used. If the classifier was used to eliminate records before screening, report the number eliminated in the PRISMA flow diagram as “Records marked as ineligible by automation tools.”

  • If an internally derived machine learning classifier was used to assist with the screening process, identify the software/classifier and version, describe how it was used (such as to remove records or replace a single screener) and trained (if relevant), and what internal or external validation was done to understand the risk of missed studies or incorrect classifications. For example, authors might state that the classifier was trained on the set of records generated for the review in question (as may be the case when updating reviews) and specify which thresholds were applied to remove records.

  • If machine learning algorithms were used to prioritise screening (whereby unscreened records are continually re-ordered based on screening decisions), state the software used and provide details of any screening rules applied (for example, screening stopped altogether leaving some records to be excluded based on automated assessment alone, or screening switched from double to single screening once a pre-specified number or proportion of consecutive records was eliminated).

Essential elements for systematic reviews using crowdsourcing or previous “known” assessments in the selection process

  • If crowdsourcing was used to screen records, provide details of the platform used and specify how it was integrated within the overall study selection process.

  • If datasets of already-screened records were used to eliminate records retrieved by the search from further consideration, briefly describe the derivation of these datasets. For example, if prior work has already determined that a given record does not meet the eligibility criteria, it can be removed without manual checking. This is the case for Cochrane’s Screen4Me service, in which an increasingly large dataset of records that are known not to represent randomised trials can be used to eliminate any matching records from further consideration.

Explanation

Study selection is typically a multi-stage process in which potentially eligible studies are first identified from screening titles and abstracts, then assessed through full text review and, where necessary, contact with study investigators. Increasingly, a mix of screening approaches might be applied (such as automation to eliminate records before screening or prioritise records during screening). In addition to automation, authors increasingly have access to screening decisions that are made by people independent of the author team (such as crowdsourcing) (see box 3). Authors should describe in detail the process for deciding how records retrieved by the search were considered for inclusion in the review, to enable readers to assess the potential for errors in selection.12 34

Study selection methods

Several approaches to selecting studies exist. Here we comment on the advantages and disadvantages of each.

  1. Assessment of each record by one reviewer: Single screening is an efficient use of time and resources, but there is a higher risk of missing relevant studies12 3

  2. Assessment of records by more than one reviewer: Double screening can vary from duplicate checking of all records (by two or more reviewers independently) to a second reviewer checking a sample only (for example, a random sample of screened records, or all excluded records). This approach may be more reliable than single screening but at the expense of increased reviewer time, given the time needed to resolve discrepancies12 3

  3. Priority screening to focus early screening effort on most relevant records: Instead of screening records in year, title, author or random order, machine learning is used to identify relevant studies earlier in the screening process than would otherwise be the case. Priority screening is an iterative process in which the machine continually reassesses unscreened records for relevance. This approach can increase review efficiency by enabling the review team to start on subsequent steps of the review while less relevant records are still being screened. Both single and multiple reviewer assessments can be combined with priority screening45

  4. Priority screening with the automatic elimination of less relevant records: Once the most relevant records have been identified using priority screening, teams may choose to stop screening based on the assumption that the remaining records are unlikely to be relevant. However, there is a risk of erroneously excluding relevant studies because of uncertainty about when it is safe to stop screening; the balance between efficiency gains and risk tolerance will be review-specific45

  5. Machine learning classifiers: Machine learning classifiers are statistical models that use training data to rank records according to their relevance. They can be calibrated to achieve a given level of recall, thus enabling reviewers to implement screening rules, such as eliminating records or replacing double with single screening. Because the performance of classifiers is highly dependent on the data used to build them, classifiers should only be used to classify records for which they are designed56

  6. Previous “known” assessments: Screening decisions for records that have already been manually checked can be reused to exclude the same records from being reassessed, provided the eligibility criteria are the same. For example, groups that maintain registers of controlled trials to facilitate systematic reviews can avoid continually rescreening the same records by matching and then including/excluding those records from further consideration.

  7. Crowdsourcing: Crowdsourcing involves recruiting (usually via the internet) a large group of individuals to contribute to a task or project, such as screening records. If crowdsourcing is integrated with other study selection approaches, the specific platforms used should have well established and documented agreement algorithms, and data on crowd accuracy and reliability78

Example

“Three researchers (AP, HB-R, FG) independently reviewed titles and abstracts of the first 100 records and discussed inconsistencies until consensus was obtained. Then, in pairs, the researchers independently screened titles and abstracts of all articles retrieved. In case of disagreement, consensus on which articles to screen full-text was reached by discussion. If necessary, the third researcher was consulted to make the final decision. Next, two researchers (AP, HB-R) independently screened full-text articles for inclusion. Again, in case of disagreement, consensus was reached on inclusion or exclusion by discussion and if necessary, the third researcher (FG) was consulted.”9

For examples of systematic reviews using automation tools, crowdsourcing, or previous “known” assessments in the selection process, see supplementary table S1 on bmj.com

Training

The UK EQUATOR Centre runs training on how to write using reporting guidelines.

Discuss this item

Visit this items’ discussion page to ask questions and give feedback.

References

1.
Wang Z, Nayfeh T, Tetzlaff J, O’Blenis P, Murad MH. Error rates of human reviewers during abstract screening in systematic reviews. Bencharit S, ed. PLOS ONE. 2020;15(1):e0227742. doi:10.1371/journal.pone.0227742
2.
Waffenschmidt S, Knelangen M, Sieben W, Bühn S, Pieper D. Single screening versus conventional double screening for study selection in systematic reviews: A methodological systematic review. BMC Medical Research Methodology. 2019;19(1). doi:10.1186/s12874-019-0782-0
3.
Gartlehner G, Affengruber L, Titscher V, et al. Single-reviewer abstract screening missed 13 percent of relevant studies: A crowd-based, randomized controlled trial. Journal of Clinical Epidemiology. 2020;121:20-28. doi:10.1016/j.jclinepi.2020.01.005
4.
Shemilt I, Khan N, Park S, Thomas J. Use of cost-effectiveness analysis to compare the efficiency of study identification methods in systematic reviews. Systematic Reviews. 2016;5(1). doi:10.1186/s13643-016-0315-4
5.
O’Mara-Eves A, Thomas J, McNaught J, Miwa M, Ananiadou S. Using text mining for study identification in systematic reviews: A systematic review of current approaches. Systematic Reviews. 2015;4(1). doi:10.1186/2046-4053-4-5
6.
Marshall IJ, Noel‐Storr A, Kuiper J, Thomas J, Wallace BC. Machine learning for identifying randomized controlled trials: An evaluation and practitioner’s guide. Research Synthesis Methods. 2018;9(4):602-614. doi:10.1002/jrsm.1287
7.
Noel‐Storr A. Working with a new kind of team: Harnessing the wisdom of the crowd in trial identification. EFSA Journal. 2019;17(Proceedings of the Third EFSA Scientific Conference: Science, Food and Society Guest Editors: Devos Y, Elliott KC and Hardy A). doi:10.2903/j.efsa.2019.e170715
8.
Nama N, Sampson M, Barrowman N, et al. Crowdsourcing the citation screening process for systematic reviews: Validation study. Journal of Medical Internet Research. 2019;21(4):e12953. doi:10.2196/12953
9.
Bomhof-Roordink H, Gärtner FR, Stiggelbout AM, Pieterse AH. Key components of shared decision making models: A systematic review. BMJ Open. 2019;9(12):e031763. doi:10.1136/bmjopen-2019-031763

Citation

For attribution, please cite this work as:
Page MJ, Moher D, Bossuyt PM, et al. PRISMA 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews. BMJ. 372:n160. doi:10.1136/bmj.n160

Reporting Guidelines are recommendations to help describe your work clearly

Your research will be used by people from different disciplines and backgrounds for decades to come. Reporting guidelines list the information you should describe so that everyone can understand, replicate, and synthesise your work.

Reporting guidelines do not prescribe how research should be designed or conducted. Rather, they help authors transparently describe what they did, why they did it, and what they found.

Reporting guidelines make writing research easier, and transparent research leads to better patient outcomes.

Easier writing

Following guidance makes writing easier and quicker.

Smoother publishing

Many journals require completed reporting checklists at submission.

Maximum impact

From nobel prizes to null results, articles have more impact when everyone can use them.

Who reads research?

You work will be read by different people, for different reasons, around the world, and for decades to come. Reporting guidelines help you consider all of your potential audiences. For example, your research may be read by researchers from different fields, by clinicians, patients, evidence synthesisers, peer reviewers, or editors. Your readers will need information to understand, to replicate, apply, appraise, synthesise, and use your work.

Cohort studies

A cohort study is an observational study in which a group of people with a particular exposure (e.g. a putative risk factor or protective factor) and a group of people without this exposure are followed over time. The outcomes of the people in the exposed group are compared to the outcomes of the people in the unexposed group to see if the exposure is associated with particular outcomes (e.g. getting cancer or length of life).

Source.

Case-control studies

A case-control study is a research method used in healthcare to investigate potential risk factors for a specific disease. It involves comparing individuals who have been diagnosed with the disease (cases) to those who have not (controls). By analysing the differences between the two groups, researchers can identify factors that may contribute to the development of the disease.

An example would be when researchers conducted a case-control study examining whether exposure to diesel exhaust particles increases the risk of respiratory disease in underground miners. Cases included miners diagnosed with respiratory disease, while controls were miners without respiratory disease. Participants' past occupational exposures to diesel exhaust particles were evaluated to compare exposure rates between cases and controls.

Source.

Cross-sectional studies

A cross-sectional study (also sometimes called a "cross-sectional survey") serves as an observational tool, where researchers capture data from a cohort of participants at a singular point. This approach provides a 'snapshot'— a brief glimpse into the characteristics or outcomes prevalent within a designated population at that precise point in time. The primary aim here is not to track changes or developments over an extended period but to assess and quantify the current situation regarding specific variables or conditions. Such a methodology is instrumental in identifying patterns or correlations among various factors within the population, providing a basis for further, more detailed investigation.

Source

Systematic reviews

A systematic review is a comprehensive approach designed to identify, evaluate, and synthesise all available evidence relevant to a specific research question. In essence, it collects all possible studies related to a given topic and design, and reviews and analyses their results.

The process involves a highly sensitive search strategy to ensure that as much pertinent information as possible is gathered. Once collected, this evidence is often critically appraised to assess its quality and relevance, ensuring that conclusions drawn are based on robust data. Systematic reviews often involve defining inclusion and exclusion criteria, which help to focus the analysis on the most relevant studies, ultimately synthesising the findings into a coherent narrative or statistical synthesis. Some systematic reviews will include a meta-analysis.

Source

Systematic review protocols

TODO

Meta analyses of Observational Studies

TODO

Randomised Trials

A randomised controlled trial (RCT) is a trial in which participants are randomly assigned to one of two or more groups: the experimental group or groups receive the intervention or interventions being tested; the comparison group (control group) receive usual care or no treatment or a placebo. The groups are then followed up to see if there are any differences between the results. This helps in assessing the effectiveness of the intervention.

Source

Randomised Trial Protocols

TODO

Qualitative research

Research that aims to gather and analyse non-numerical (descriptive) data in order to gain an understanding of individuals' social reality, including understanding their attitudes, beliefs, and motivation. This type of research typically involves in-depth interviews, focus groups, or field observations in order to collect data that is rich in detail and context. Qualitative research is often used to explore complex phenomena or to gain insight into people's experiences and perspectives on a particular topic. It is particularly useful when researchers want to understand the meaning that people attach to their experiences or when they want to uncover the underlying reasons for people's behavior. Qualitative methods include ethnography, grounded theory, discourse analysis, and interpretative phenomenological analysis.

Source

Case Reports

TODO

Diagnostic Test Accuracy Studies

Diagnostic accuracy studies focus on estimating the ability of the test(s) to correctly identify subjects with a predefined target condition, or the condition of interest (sensitivity) as well as to clearly identify those without the condition (specificity).

Prediction Models

Prediction model research is used to test the accurarcy of a model or test in estimating an outcome value or risk. Most models estimate the probability of the presence of a particular health condition (diagnostic) or whether a particular outcome will occur in the future (prognostic). Prediction models are used to support clinical decision making, such as whether to refer patients for further testing, monitor disease deterioration or treatment effects, or initiate treatment or lifestyle changes. Examples of well known prediction models include EuroSCORE II for cardiac surgery, the Gail model for breast cancer, the Framingham risk score for cardiovascular disease, IMPACT for traumatic brain injury, and FRAX for osteoporotic and hip fractures.

Source

Animal Research

TODO

Quality Improvement in Healthcare

Quality improvement research is about finding out how to improve and make changes in the most effective way. It is about systematically and rigourously exploring "what works" to improve quality in healthcare and the best ways to measure and disseminate this to ensure positive change. Most quality improvement effectiveness research is conducted in hospital settings, is focused on multiple quality improvement interventions, and uses process measures as outcomes. There is a great deal of variation in the research designs used to examine quality improvement effectiveness.

Source

Economic Evaluations in Healthcare

TODO

Meta Analyses

A meta-analysis is a statistical technique that amalgamates data from multiple studies to yield a single estimate of the effect size. This approach enhances precision and offers a more comprehensive understanding by integrating quantitative findings. Central to a meta-analysis is the evaluation of heterogeneity, which examines variations in study outcomes to ensure that differences in populations, interventions, or methodologies do not skew results. Techniques such as meta-regression or subgroup analysis are frequently employed to explore how various factors might influence the outcomes. This method is particularly effective when aiming to quantify the effect size, odds ratio, or risk ratio, providing a clearer numerical estimate that can significantly inform clinical or policy decisions.

How Meta-analyses and Systematic Reviews Work Together

Systematic reviews and meta-analyses function together, each complementing the other to provide a more robust understanding of research evidence. A systematic review meticulously gathers and evaluates all pertinent studies, establishing a solid foundation of qualitative and quantitative data. Within this framework, if the collected data exhibit sufficient homogeneity, a meta-analysis can be performed. This statistical synthesis allows for the integration of quantitative results from individual studies, producing a unified estimate of effect size. Techniques such as meta-regression or subgroup analysis may further refine these findings, elucidating how different variables impact the overall outcome. By combining these methodologies, researchers can achieve both a comprehensive narrative synthesis and a precise quantitative measure, enhancing the reliability and applicability of their conclusions. This integrated approach ensures that the findings are not only well-rounded but also statistically robust, providing greater confidence in the evidence base.

Why Don't All Systematic Reviews Use a Meta-Analysis?

Systematic reviews do not always have meta-analyses, due to variations in the data. For a meta-analysis to be viable, the data from different studies must be sufficiently similar, or homogeneous, in terms of design, population, and interventions. When the data shows significant heterogeneity, meaning there are considerable differences among the studies, combining them could lead to skewed or misleading conclusions. Furthermore, the quality of the included studies is critical; if the studies are of low methodological quality, merging their results could obscure true effects rather than explain them.

Protocol

A plan or set of steps that defines how something will be done. Before carrying out a research study, for example, the research protocol sets out what question is to be answered and how information will be collected and analysed.

Source

Systematic_review

A review that uses explicit, systematic methods to collate and synthesize findings of studies that address a clearly formulated question.

Source

Statistical synthesis

The combination of quantitative results of two or more studies. This encompasses meta-analysis of effect estimates (described below) and other methods, such as combining P values, calculating the range and distribution of observed effects, and vote counting based on the direction of effect (see McKenzie and Brennan for a description of each method)

Meta-analysis of effect estimates

A statistical technique used to synthesize results when study effect estimates and their variances are available, yielding a quantitative summary of results.

Source

Outcome

An event or measurement collected for participants in a study (such as quality of life, mortality).

Result

The combination of a point estimate (such as a mean difference, risk ratio or proportion) and a measure of its precision (such as a confidence/credible interval) for a particular outcome.

Reports

Documents (paper or electronic) supplying information about a particular study. A report could be a journal article, preprint, conference abstract, study register entry, clinical study report, dissertation, unpublished manuscript, government report, or any other document providing relevant information.

Record

The title or abstract (or both) of a report indexed in a database or website (such as a title or abstract for an article indexed in Medline). Records that refer to the same report (such as the same journal article) are “duplicates”; however, records that refer to reports that are merely similar (such as a similar abstract submitted to two different conferences) should be considered unique.

Study

An investigation, such as a clinical trial, that includes a defined group of participants and one or more interventions and outcomes. A “study” might have multiple reports. For example, reports could include the protocol, statistical analysis plan, baseline characteristics, results for the primary outcome, results for harms, results for secondary outcomes, and results for additional mediator and moderator analyses.