10 Refining the intervention: interviewing authors to identify deficient intervention components
10.1 Introduction
Having defined intervention components and built a prototype (chapter 9), I wanted to refine the home page and SRQR guidance page (which I refer to as “the website” in this chapter) by getting feedback from authors.
The MRC guidance on complex interventions, the Person Based Approach, and the Behaviour Change Wheel all stress the importance of including service users in the design of complex interventions [1–3]. Service users can help identify deficiencies which, if addressed, would make the intervention more successful [1]. Involving service users can also help researchers better understand influences, and whether intervention components are functioning as intended.
In this study, I wanted to address limitations I had identified in my thematic synthesis (chapter 3), where I found testing reporting guidelines with users is rarely done, and often suffers from thin description and unrepresentative participants lacking diversity. In contrast, I wanted to obtain rich data from a diverse group of authors. In chapter 9 I positioned SRQR as a good guideline to test with users because its wide scope includes qualitative methods used by many researchers, from many fields, with varying levels of experience. Thus SRQR offered potential to recruit a diverse sample.
I did not aspire to perfect the website as I, like many software developers, did not believe an optimal design exists. Variation between users, evolving trends, and a changing technological and contextual landscape mean that a one-size-fits-all optimized design is unlikely [4]. However, although perfection does not exist, designs can nonetheless be improved through feedback, and many designers use an iterative process. The purpose of this study was to collect evidence to inform future iterations. Instead of asking authors to suggest improvements (I suspected this strategy would have led to superficial suggestions or blank faces), I was more interested in identifying deficiencies. I defined a deficiency as any website element that, if modified, could better facilitate the website’s intended target behaviour (for researchers to use reporting guidance as early as possible in their research pipeline). I used the term deficiency rather than barrier or facilitator because it encompasses both (if a facilitator could be improved, it is deficient). In the future, EQUATOR and I will be able to identify modifications to address these deficiencies.
10.2 Methods
The purpose of this study was to identify deficiencies in a redesigned reporting guideline and EQUATOR Network home page (“the website”).
My objectives were to:
- explore the experience of a diverse sample of authors, and
- identify and understand deficiencies.
Sampling strategy
My purposive sample of authors varied in their:
- years of academic experience,
- subject area,
- first language, and
- country of residence.
This variation was important because my thematic synthesis (chapter 3) suggested inexperienced authors may benefit the most from reporting guidelines, but also face the most hurdles. Inexperience may be due to early career stage, being new to a field or study design, or being new to academic writing. My synthesis also suggested language barriers could hinder adherence, and my service evaluation of EQUATOR’s existing website (chapter 5) revealed a highly international user base.
Authors were eligible to participate if they were currently engaged in research utilizing qualitative methods, and if they were able to attend an online interview conducted in English.
I recruited through four channels:
- I posted on X (called Twitter at the time).
- I advertised through Penelope.ai [5], the manuscript checker I created before starting my PhD. Many medical journals offer the manuscript checker to submitting authors. BMJ Open is the largest of these journals, and enjoys a large, international, author base.
- I invited researchers from a research consultancy in the Philippines.
- I wrote to Chinese researchers who had published qualitative research, and I asked them to share my recruitment advert. One of the researchers I contacted posted the advert on internet forums used by Chinese students.
My recruitment advert is in appendix R. The advert said I was “looking to speak with people performing qualitative research about a new website”. It did not specify what the website was about, who it was for, or whether it would help with their job, because I wanted authors to be naïve when first viewing the website. This mimics the real world, where authors might be sent to EQUATOR’s website by a journal with only minimal information on what to expect from it.
All recruitment channels invited authors to signal their interest by email. To check applicants’ eligibility as qualitative researchers, I asked them to describe their research methods in a few sentences over email. I excluded applicants if their descriptions made no reference to a qualitative method.
I sent all eligible applicants the participant information sheet and consent form. I used JISC Online Surveys [6] to obtain consent and ask the following demographic questions:
- How many years have you done research?
- Please describe your research in a couple of sentences
- What is your first language?
- What country do you work and live in?
I offered participants $50 as reimbursement in return for an expected 2-hour commitment. This was delivered as an Amazon voucher to UK participants, and a bank transfer to international participants. My email templates and information sheet are in Appendices S and T.
Time and money limited my target sample size to 10 participants. As argued by Nielsen and Landauer [7], small samples (fewer than 10) are often sufficient to identify the majority of deficiencies. In chapter 8 I introduced information power [8] as a concept to guide sample size in qualitative research, and I drew upon it again in this study. I maximised information power firstly by using methods to elicit rich information from each participant. Secondly, I used my table of intervention components (chapter 9) as an analysis framework. Hence I anticipated a sample of 10 to sufficiently inform at least one design iteration at the end of data collection.
Procedures
I wanted my study to resemble the way authors will experience the website in real life. Firstly, interview sessions took place online using Microsoft Teams. This meant participants could view the website on their own computer, using their normal browser, in their usual place of work. This allowed me to identify problems like slow loading over poor internet connections or display problems on different screen resolutions, whilst avoiding complications stemming from asking a participant to use an unfamiliar computer or browser.
Secondly, I wanted to replicate the experience of encountering a new website as a naïve user, gradually exploring content, and then reading and applying guidance to one’s own writing. I did this by using a variety of methods:
- 5 second test to capture initial reactions [9]
- Think aloud to capture real-time exploration within the interview session [10]
- Plus-minus task to capture exploration between interview sessions [11]
- A writing evaluation to explore interpretation [12]
- Semi-structured interviews throughout [13]
I have outlined the order of data collection methods in Table 10.1.
| STAGE | METHOD |
|---|---|
| Session 1 | Five second test; semi-structured interview 1 (prior experience with reporting guidelines); think aloud 1 (home page); semi-structured interview 2 (home page); think aloud 2 (SRQR guideline page); semi-structured interview 3 (SRQR guideline page) |
| At home, between sessions | Plus-minus task; writing task |
| Session 2 | Discussion of plus-minus annotations; writing evaluation; semi-structured interview 4 (closing thoughts) |
My interview schedule (appendix U) included the verbal instructions for each task and topic guides for each semi-structured interview. I tested the interview schedule by doing a mock interview with a student at Oxford University.
I began sessions by introducing myself as part of a team creating a new website. To encourage open and truthful feedback, I reassured participants the best way they could help was by being honest, not worrying about critiquing the website or offending me, and sharing positive and negative feedback. I asked participants to tell me a little about themselves, to help them relax into the interview and feel comfortable talking, before moving on to the first task: the 5 second test.
Five second test
Until this point, participants had no idea what the website was about. My recruitment materials and interview introduction made no mention of writing nor reporting guidelines, and so participants were unaware of the website’s purpose.
The five second test is an “in the moment” survey method [9]. By sharing my screen, I showed participants the top of the home page for five seconds before removing it and asking questions. The test limits exposure to five seconds because although a participant can absorb much information (colours, words, shapes), five seconds is rarely sufficient to make sense of everything as a whole. The aim is to capture participants’ immediate reactions to salient design elements (like images and large words) before they have a chance to consider the content more critically. Furthermore, this five second limit was relevant because my website service evaluation (chapter 5) found many authors leave EQUATOR’s website within five seconds without interacting with it.
This test was appropriate for the top of the home page because, as per best practice, the area has little text, all relevant content is visible in one frame, and I asked only a few questions:
- What do you think the website is about?
- How do you think this website may affect your work?
If participants answered the first question with “reporting guidelines”, I asked “what do you think reporting guidelines are?”. If participants’ answers did not mention writing, I asked at what stages of research they might use the website.
I designed these questions to explore three intervention ingredients: describing what reporting guidelines are, how they can best be used, and their benefits. These are the main ingredients featured at the top of the home page.
Semi structured interview 1 - prior experience with reporting guidelines
After the five second test it was no longer necessary to keep participants masked to the website’s purpose, and so I asked participants about their prior awareness of, or experience with, reporting guidelines. I asked which guidelines they had used and what they had used them for.
Think aloud 1 - home screen
Website designers often ask participants to “think aloud” as they complete a task or view a website, as a way of exploring participants’ thought processes [14,15]. Think aloud as a method was first described by cognitive psychologists Ericsson and Simon [16]. Their strict approach viewed verbalizations as “indicators of what information was heeded and in what order, a sort of time stamp of the contents of short-term memory” [10]. As user experience testers adopted the method, they used it more flexibly to additionally capture participants’ thoughts, feelings, and expectations [10]. Whereas cognitive psychologists use the method to understand cognitive processes, usability testers use it to “support the development of usable systems by identifying system deficiencies”. Because “building robust models of human cognition is not a central concern”, Ericsson and Simon’s strict approach is less appropriate, and usability testers use a more flexible and pragmatic approach to data collection and interpretation [10].
As per best practice [10], I began by explaining the task and instructing participants to continually verbalize their train of thought. I then demonstrated by sharing my screen, opening up a different website, and “thinking aloud” for a minute. Participants then shared their screen as they explored the home page. Whenever participants stopped talking, I would prompt them to continue by asking “What are you thinking?”. I acknowledged participants’ verbalizations with neutral sounds like “uh-huh” and “mmm”, which encourage further talk but do not show agreement or disagreement. These verbalizations and prompts are also considered best practice [10].
Semi-structured interview 2 - home page
Once participants had explored the entire home page, I asked about any intervention components they had not talked about in the think aloud, their overall opinions, and whether their understanding had changed since first viewing the page.
Think aloud 2 - SRQR guideline page
I asked participants to find the relevant guideline for reporting qualitative research, and then to continue thinking aloud as they explored it. The top of the SRQR page included information about the guideline, such as its scope, and the number of journals endorsing it. Participants thought aloud as they continued down the page. Because the SRQR guidance is long, I stopped participants from thinking aloud once they reached the first reporting item.
Semi-structured interview 3 - SRQR guideline page
I then used semi-structured interview questions to explore any intervention components in the introduction missed by the think aloud. Moving down to the reporting guideline itself, I asked questions to explore participants’ expectations of four key features within the guidance: defined words (signified by a dotted underline), footnotes (signified by a superscript number), links to discussion boards (signified by an icon), and drop-down content (signified by a chevron icon). I pointed to an example of each and asked participants what they expected to happen if they clicked on it.
This marked the end of the first interview session. I then explained the plus-minus and writing tasks participants needed to complete before the second session.
Plus-minus task
The full SRQR guideline is too long to cover in a single interview session, and I wanted to capture participants’ experience of reading and applying the guidance in a realistic context, so I looked for methods that would allow me to collect data whilst participants read and applied reporting items in their own time, as part of their normal work.
In their review of methods to solicit text evaluations from readers, de Jong and Schellens [11] distinguish between evaluation goals: selection (whether readers will engage with the text), comprehension, application (being able to apply information in a real world setting), acceptance (including credibility), appreciation, relevance, and completeness. I was not interested in selection (participants had no option other than to engage with the text), and my study scope did not extend to SRQR’s relevance nor completeness. I was interested in comprehension, acceptance, appreciation, and application.
de Jong and Schellens describe methods that target comprehension, acceptance, and appreciation in isolation, but because my interest spanned all three, I chose a nonspecific method that could explore them all: the plus-minus task. In this task, readers annotate a document with plus and minus signs to signify positive and negative reading experiences, and then discuss their annotations retrospectively.
I asked participants to select and annotate 2 or 3 reporting items relevant to whatever they happened to be writing up in the time between interviews. I created duplicates of the SRQR guidance page and gave participants unique URLs so they did not see each other’s annotations. I used a web annotation tool called Hypothes.is [17]. Participants could optionally add comments alongside their plus and minus signs. Participants explained their annotations in the second interview.
As de Jong and Schellens note [11], the plus-minus method is advantageous over other nonspecific methods (like thinking aloud whilst reading) because it collects data without disturbing participants’ natural reading process. Additionally, it was useful in this study as participants could make annotations in their own time, as part of their normal work pattern, and then discuss them retrospectively in the second interview.
Writing Evaluation
Although the plus-minus task can detect text that participants consider incomprehensible, it cannot detect whether participants comprehend guidance correctly or whether they are able to apply it to their writing. To address this, I used a writing evaluation. I asked participants to use the reporting items they had labelled in the plus-minus task when writing up their own research between interview sessions, and to send me what they wrote before the second interview. I read the excerpts and noted reporting items (and sub-items) as present or missing.
In the second interview, inspired by Davies et al.’s SQUIRE guidelines evaluation [12], I asked participants to identify parts of their writing pertaining to reporting items. When I considered an item (or sub item) to be missing, I asked the participant whether they had reported this information. If they felt they had, I asked them to point out where, and then explored any misinterpretations. If they had not reported information, I asked why.
Semi-structured interview 4 - closing thoughts
To end the second interview session I asked participants to describe their experience of using the reporting guideline, and to share any final thoughts.
Methods explored different intervention components
Each method targeted multiple intervention components. For example, in the 5 second test, participants could only see the top of the home page. The text, images, and design in this section are there to communicate what reporting guidelines are, when they can be used, and that they will benefit authors. These functions come from three intervention components defined in chapter 9:
- Describe what reporting guidelines are where they are first encountered,
- Clarify what tasks (e.g., writing, designing, or appraising research) guidelines and resources are designed for, and
- Describe personal benefits and benefits to others where reporting guidelines are introduced (home page, on resources, in communications).
Components that required participants to read text could be best explored in the plus-minus task, and I hoped the writing evaluation would reveal how participants interpreted and applied instruction. I hoped the think aloud would capture opinions on salient features, and the semi-structured interviews would allow me to explore remaining, unnoticed, features. In Table 10.2 I detail the intervention components I expected each method to explore. The intervention components are defined in chapter 9, where I also list the website elements related to each component.
| METHOD | INTERVENTION COMPONENTS (defined in previous chapter) |
|---|---|
| 5 Second Test | Describe what reporting guidelines are where they are first encountered; clarify what tasks guidelines and resources are designed for; describe personal benefits and benefits to others |
| Think Aloud | |
| Interview | |
| +/- test | |
| Writing Evaluation | |
| Not Explored | Search engine optimization; instructions about what to report when an item was not done, could not be done, or does not apply |
I did not attempt to explore two intervention components. I did not expect participants to know about or comment on search engine optimization, especially as a large amount of optimization occurs in the website metadata and is thus invisible. Secondly, because I did not want to edit the meaning of the SRQR guidelines (just their layout), I did not want to add instructions about what to report when an item was not done, could not be done, or does not apply.
Data processing and analysis
I recorded video and generated automatic audio transcriptions using Microsoft Teams. Because the automatic transcription was not always accurate, I corrected transcripts by rewatching the videos. I de-identified transcripts by replacing names with participant codes before importing them into NVivo for coding [18].
As with my focus group data in chapter 8, I used qualitative description to aggregate and summarise ideas [19,20]. I used my intervention component table (see chapter 9 and appendix P) as a framework to code transcripts line by line. I did this deductively: whenever a participant said anything about a component, I coded the text to that component. Because some website features implemented multiple components (for example, an image can both educate and persuade), I sometimes coded text to multiple components. In this way, I created categories of codes, where each category was an intervention component. I gave codes equal weight across all methods (5 second test, think aloud, interviews, etc.), and mapped data from all methods onto the same framework.
Once all transcripts were coded, I grouped codes within categories into deficiencies. If a single component was deficient in multiple ways, I created a code group for each deficiency. If there was disagreement about a deficiency (e.g., some people disliked a component, but others liked it), then I created sub-groups within each deficiency. Although positive feedback did not directly address my objective of identifying deficiencies, I kept these codes because they provided context and counter-evidence to deficiencies.
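The coding and grouping steps above can be thought of as building a two-level data structure: segments mapped onto intervention components, and codes within each component grouped into deficiencies. The sketch below is purely illustrative (the actual analysis was performed manually in NVivo); the component names, segments, and deficiency labels are hypothetical:

```python
from collections import defaultdict

# Hypothetical coded segments: (participant, quote, components mentioned).
# A segment may be coded to more than one component.
segments = [
    ("P1", "I don't know who made this site", ["foster_trust"]),
    ("P2", "The logo made it seem reputable", ["foster_trust"]),
    ("P3", "I thought it was methodology guidance",
     ["describe_guidelines", "foster_trust"]),
]

# Step 1: deductive coding — one category per intervention component.
categories = defaultdict(list)
for participant, quote, components in segments:
    for component in components:
        categories[component].append((participant, quote))

# Step 2: within each category, group codes into deficiencies
# (a manual judgement, represented here as labelled sub-groups).
deficiencies = {
    "foster_trust": {
        "developer_not_identified": [categories["foster_trust"][0]],
        "brand_lends_credibility": [categories["foster_trust"][1]],
    },
}

print(len(categories["foster_trust"]))  # segments coded to this component
```

The dual mapping (one segment to many components, one component to many deficiencies) mirrors why a single quote could contribute evidence to several parts of the framework.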
Some participants spontaneously suggested modifications. In these instances, I coded the proposed modification and the underlying deficiency. Because some participants spontaneously shared prior experiences using reporting guidelines I coded these using my list of influences from step 4 of chapter 7 as a framework. I decided to create new codes for any influences not previously identified.
In this way, I ended up with a list of deficiencies (my primary unit of analysis), and incidental lists of influences and possible modifications.
Reflexivity and Trust
As with my focus groups (chapter 8), I tried to remain objective during interviews, and I considered my research paradigm to be post-positivist: my role was to identify deficiencies from what participants said, but I acknowledged that my own experience, perspective, and opinions may affect what I observed, understood, and concluded.
As with previous chapters, I used a number of techniques to ensure credibility, transferability, dependability, and confirmability [21]. I describe these in Table 10.3.
| TECHNIQUE | IMPLEMENTATION |
|---|---|
| Techniques for establishing transferability | |
| Thick description | I aspired to report my results with context by indicating when ideas were common or rare, and who they originated from when I felt this was particularly relevant. I reported disagreements, provided quotes, and described relationships between ideas. Interview sessions were long and used multiple techniques to elicit rich data. I reported findings alongside relevant context, including participant demographics, and used context to reconcile disagreements. |
| Techniques for establishing confirmability | |
| Audit trail | I referred to video recordings when I needed to clarify parts of the transcript. I kept all raw data, and a record of modifications made. |
| Reflexivity | I kept a diary during data collection to note ideas and my own feelings. Because I created the website being tested, I felt it was important to reflect on any feedback that made me feel defensive, frustrated, or that I did not understand. In my experience, these moments of conflict are important as they often hint at a latent misunderstanding or deficiency. Receiving negative feedback does not always hurt. For example, sometimes I expect negative feedback because it reflects a limitation or trade-off that I already know about. Other times negative feedback can feel like an “aha” moment, as I discover a problem I immediately understand, agree with, and can see a solution to. In contrast, when feedback feels bad, in my experience it is because I have misunderstood something, and my internal model of the situation is off. Although I found it easy to remain professional and neutral within the interviews, having a “thick skin” is not enough. In these moments of friction I made sure to delve deeper into the issue with the same participant or future ones. |
| Techniques for establishing credibility | |
| Negative case analysis | I purposefully explored negative feedback that I found unexpected or challenging (see reflexivity). |
| Peer debriefing | CA acted as a disinterested peer throughout design, data collection, analysis, and reporting. She questioned my reasoning and helped me become aware of biases, potential flaws, and assumptions I was making. |
Ethics
Oxford University’s Medical Sciences Interdivisional Research Ethics Committee deemed this study to be a service evaluation, and so judged ethical approval unnecessary.
Reporting
I used SRQR [22] when outlining this chapter, and again to check my reporting during revision (see appendix Q).
10.3 Results
Recruitment
I recruited participants between 21/03/2023 and 09/08/2023. The numbers of people who expressed interest, were eligible, consented, and participated are shown in Table 10.4. Eleven people participated. Two dropped out before the second interview without giving a reason. Participants’ characteristics are summarized in Table 10.5 and included variety in research experience (from 1 to 10+ years), subject area, country of origin, and first language. Six participants had never heard of reporting guidelines before; one had, but did not remember which. Three others had used a reporting guideline before, and one had used many reporting guidelines. The first interview lasted between 45 minutes and 1.5 hours, and the second interview lasted 30–45 minutes.
| CHANNEL | No. PEOPLE CONTACTED | EXPRESSED INTEREST | ELIGIBLE | INVITED TO CONSENT | CONSENTED | COMPLETED INTERVIEW 1 | COMPLETED INTERVIEW 2 |
|---|---|---|---|---|---|---|---|
| Penelope.ai | Unknown | 144 (23 were excluded because they did not want to attend an online interview conducted in English; a further 78 did not describe using qualitative methods when asked to describe their research) | 43 | 43 (30 did not reply) | 13 (3 could not find a time for interview because of work commitments, 2 did not reply) | 8 (1 lost to follow up) | 7 |
| X | Unknown | 1 | 1 | 1 (1 did not reply) | 0 | 0 | 0 |
| Email invitation | Unknown (2 emails sent, but were forwarded to an unknown number of recipients) | 4 (1 did not reply) | 3 | 3 | 3 | 3 (1 lost to follow up) | 2 |
| Total | Unknown | 149 | 47 | 47 | 16 | 11 | 9 |
| ID | JOB TITLE | SUBJECT AREA | RESEARCH EXPERIENCE (YEARS) | FIRST LANGUAGE | COUNTRY OF ORIGIN | PREVIOUS EXPERIENCE WITH REPORTING GUIDELINES |
|---|---|---|---|---|---|---|
| 1 | Research consultant | Health policy | 4 | English | Philippines | None |
| 2 | Medical student | General qualitative medical research | 1 | English | Ghana | Had used PRISMA |
| 3 | Academic researcher | Clinical psychology and public health | 7 | Spanish | Ecuador | Had used COREQ |
| 4 | Academic researcher | Physiotherapy | 10+ | English | UK | Could not remember |
| 5 | Academic researcher | Medical ethics | 7 | English | India | None |
| 6 | Midwifery student | Sexual and reproductive health | 3 | Lango | Uganda | None |
| 7 | Academic researcher | Environmental Health | 10+ | English | South Africa | Had used JARS |
| 8 | Academic researcher | Physiotherapy | 10+ | English | Australia | Had used many reporting guidelines before |
| 9 | Pre PhD student | Public health | 1 | Chichewa | Malawi | None |
| 10 | PhD student | Child development | 7 | Chinese | China | None |
| 11 | PhD student | Child development | 4 | Chinese | China | None |
Design Iterations
I had originally planned to finish data collection before making any changes to the website. However, the first five participants consistently mentioned similar deficiencies. After reflecting and discussing with UK EQUATOR staff, we agreed these deficiencies would likely affect many authors and diminish the website’s success, and so we decided to iterate our design.
Briefly, the changes we made included:
- Editing the text at the top of the home page for clarity and to emphasise benefits. I edited it from “Writing research, made simple. Write confidently using guidelines created by the research community” to “Want help writing up research? Reporting guidelines help you describe research quickly, confidently, and completely.”
- Adding images and colour to make the home page more attractive and to convey meaning of the accompanying text (see Figure 10.1).
- Adding publisher logos to foster trust
- Reorganising the introduction to SRQR to make it appear shorter.
The results below include quotes and discussion pertaining to these changes, along with all other deficiencies.
Main findings
I identified 53 deficiencies. Appendix V lists the deficiencies identified for each intervention component, with supporting quotes. I have chosen to describe a few deficiencies in more detail, either because I deemed them important, because they were frequently mentioned (and, therefore, salient), or because they involved multiple parts of the website. I then provide a summary of the remaining deficiencies.
Deficient component: Describe what reporting guidelines are where they are first encountered
Relevant website features: Prominent definition on home page and guideline page.
Influences addressed: Researchers may not know what reporting guidelines are
Because it is important for website visitors to quickly realise the site contains resources for writing up research articles, as opposed to designing or appraising studies, I used the 5 second test to explore what participants understood the website to be about on first impression. This was the first time participants had seen the website; before then, they had no idea what it would be about.
In the first iteration the heading was “Research articles, made simple”, but some participants thought this was about reading or explaining research articles as opposed to writing them. In the second iteration, we changed this heading to “Want help writing up research?”.
On immediate impression, some participants quickly realised the website was about writing, but some did not, or thought it was about methodological guidelines
All participants realised the website was about research. Some researchers realised the website was about writing within 5 seconds:
“From what I have seen, I think probably the website should be about, uh, helping you try to discover….identify the guidelines that you will use for writing your study quickly.” (Midwifery student from Uganda)
“how to go about writing something” (ECR from India)
However, other participants gleaned only vague understandings like “support for doing research” (PhD student from China) or “guidelines of some sorts, I think in relation to research” (Pre-PhD student from Malawi), and one expected the website to be about methodological guidance:
“methodology guidelines one can use or follow when conducting research” (Researcher from the UK)
Participants with previous experience using reporting checklists realised the website might be about reporting guidelines:
“Well, I hadn’t seen the guide that I’m familiar with (that is the COREQ). But I think the other guides also are like COREQ. So that’s what comes to my mind. So I think it’s a [website] where you are going to find all the checklists or the guides for all the […] final stages of the research when we are, like, writing the paper, just to […] double check that everything has been included.” (ECR from Ecuador)
“That’s the EQUATOR guidelines, so that’s… consensus… expert consensus-developed guidelines for the reporting of different research.” (Researcher from Australia)
“[I read that] you can use reporting guidelines to try and help you [write research articles] more efficiently or quickly. And then I was thinking about, what the heck are reporting guidelines? And then I think it might be stuff like STROBE or like those checklists things or PRISMA, if you’re doing a systematic review or something. And that’s all I got” (Researcher from South Africa)
Given a few more seconds to explore the website on their own (during the think aloud task), all participants realised the guidelines were for writing, and some gained more insight into what to expect from reporting guidelines.
“after reading this sentence I think you want to give me a framework, a framework about writing. Is that right?”
“I think the guideline would, uh, would clearly state the different sections of the research report or the manuscript and then whatever is required under a section like maybe under methods. Like what are the nitty gritties required under method.” (Midwifery student from Uganda)
Some participants found later, longer descriptions more informative. Referring to content halfway down the landing page, one participant said:
“Why couldn’t this be further up? Why can’t that be at the top and then the that stuff here follow? Because then I’d have a better idea of what this is about.[…] I would have liked to have read this at the top and I would have known straight away what this whole website was about.” (Researcher from South Africa)
Deficient component: Include design, features, and language to foster trust
Relevant website features: Professional design. EQUATOR’s logo remains prominent. Citation metrics are presented at the top of the reporting guidance. Information about who developed the guidelines, how they were developed, and why the guidance is credible is still provided, and easily findable from the top of the guidance.
Influences addressed: Researchers may not believe stated benefits
When participants talked about trust, they mentioned whether the website came across as professional, credible, and believable. This intervention component is complex because content and design throughout the entire website influenced judgements regarding trust. In particular, participants wanted to know who made the website and why they could be trusted, and identified some design elements that could look more professional.
EQUATOR’s introduction could be more prominent
Apart from its logo, EQUATOR was not mentioned at the top of the page. Participants who already knew about EQUATOR said that its brand lent credibility:
“and then I picked up the top my top left hand corner with the EQUATOR logo so it seemed from reputable source.” (Researcher from the UK)
“I already trust the website because I saw that… like this is legit and I see the credentials from EQUATOR network” (ECR from Ecuador)
Participants unfamiliar with EQUATOR said they wanted to know who developed the website. Whilst looking at the top of the home page, one participant said:
“I really don’t get an idea of […] who’s responsible for the website […] would I trust the developers of the website?” (Pre-PhD student from Malawi)
The home page introduced EQUATOR at the very bottom. Participants recommended moving this introduction (or parts of it) up to the top, using an updated photo, and adding EQUATOR’s affiliations and awards.
“So most places put the about stuff at the bottom and I would have liked to have seen what [EQUATOR stands for] explained right at the top.” (Researcher from South Africa)
“And then this is the thing on the bottom that I want to look at on every website… because I want to see if they have an actual office. So usually I click this “about us” first” (Research consultant from the Philippines)
The site’s design could be more professional
I tried to create an aesthetic that would make the website appear simple. The first iteration was apparently too simple, and participants explained how its simplicity made it less trustworthy:
“it looks kind of like a blog […] it’s a basic website” (Research consultant from the Philippines)
“I wouldn’t say [it looks] particularly trustworthy, but not particularly suspicious either. Kind of in the middle […] something that will be more trustworthy will be something which is more sophisticated because I know, ‘OK, this is someone who actually took his time to do a lot of work… put a lot of work in designing it’. Most of the time if it is a fake website, it’s usually much more simple.” (Medical student from Ghana)
One participant viewed both the first and second iterations of the home page (their second interview session occurred after the iteration), and they described the second iteration as better because “It’s like more trustable […] Scientific. Evidence based. Yeah, of course, legit.” (ECR from Ecuador)
However, one participant still questioned the second iteration’s simplicity and trustworthiness, and drew a comparison with another website that she did trust:
“So I’m saying it’s kind of basic, that the format itself is kind of basic […] [When] I’m looking for information on PUB Med, just the outlet itself gives you the picture that, you know, somehow you can trust it. You know it looks as if there was more work put in it.” (Pre-PhD student from Malawi)
Logos lend credibility and could be more prominent
Participants noticed that the first iteration had no logos:
“I don’t know if this is just me, but I kinda want some logos. So I know who will vouch for [the website] right away. Like, usually […] there’s some, like, other medical societies that are, like, “We we are on the EQUATOR network”” (Research consultant from the Philippines)
I added logos to the second iteration’s home page to show publishers endorsing reporting guidelines. All participants liked these, but some suggested they could appear at the top of the home page so they would be immediately visible.
“Leading publishers….Wow, this is good…Nature. Really? Elsevier, BMJ. Yes, this is good. And this brings some sense of trust and authenticity in the website.” (Midwifery student from Uganda)
“[The publishers’ logos are] encouraging, because these are all publishing houses with mostly reputable journals, probably all reputable journals. […] You know, if these were higher up, then […] that would have made me feel a little bit more like ohh, this is good.” (Researcher from South Africa)
Numbers showing reporting guideline endorsements and citations lend credibility, but may not be intuitive
The top of the SRQR guideline page included widgets displaying the number of journal endorsements, and the number of times the reporting guideline had been cited. Some people commented that this information lent credibility:
“I think it is authentic. It’s robust. If it was endorsed by many journals and developed by experienced researchers, if I use it, maybe I’ll get a better quality work.” (Midwifery student from Uganda)
“I think citations here might be some people or some people’s work who has cited this page. So this this button […] might show people how many other words use this page.” (PhD student from China)
“…understand […] that the SRQR guidelines is something that’s already widely used.”
“I didn’t pay attention before, but I think I like it (the citation information) […] if it is more cited, I think, like, I will believe it. I will believe it, like, much better, and also like the journal endorsements” (PhD student from China)
However, not everybody understood what these numbers meant.
“Is this [widget] telling me [something], or is it what I am supposed to click on?” (Pre-PhD student from Malawi)
“I was a bit confused there, OK” (Pre-PhD student from Malawi)
“I think the citation tab over here… What is the relevance of it? I mean, why I’m seeing that?” (ECR from India)
Others were not sure whether the citation information pertained to the website or an underlying article.
“I’m not sure if [it is about], you know, the website or connected paper.” (PhD student from China)
“Has it been cited 4000 times? I don’t understand that.” (Researcher from South Africa)
Not everybody considered the image at the top of the home page to be trustworthy
For the second iteration, I added an image to the top of the home page comprising three icons to represent the process of writing a manuscript. One participant described this image as a “bit naff […] I actually think this [image] reduces [the website’s] score on the first impressions of trustworthiness kind of thing. Just because [the icons making up the image] are so, umm, ubiquitous, and, uh cheap?” (Researcher from Australia)
Deficient component: Describe personal benefits and benefits to others where reporting guidelines are introduced (home page, on resources, in communications)
Relevant website features: Benefits are prominently and consistently displayed across the home page and guidance pages. Descriptions prioritise personal benefits to the authors above hypothetical benefits to others.
Influences addressed: Researchers may not know what benefits to expect
Benefits are clear, but could be communicated more quickly
I wanted website visitors to immediately expect the website to benefit them as researchers and authors. The website headline is one of the first things visitors see, and so was an important feature for communicating benefits.
All participants talked about benefits or help. None talked about the opposite (e.g., rules, requirements, or red tape). In the five second test, participants generally talked about help in a general sense:
“I see research and writing. So i’m thinking. This is to help me with something with my job.” (Research consultant from the Philippines)
“so it’s going to assist me in research. It’s going to help me somehow. Make things easier for me.” (ECR from Ecuador)
Under the headline, I included a short statement: ‘reporting guidelines help you describe research quickly, confidently, and completely’. Two participants did not find this brief text meaningful in the five second test:
“I tried to read the sub headline just below the biggest one and it says help you blah blah confidently and blah blah. […] So I think this information maybe be meaningless for me […] because it sounds like it didn’t provide some concrete information. It’s just a sentence that tried to cheer me up.” (PhD student from China)
“And then there’s something to do with ‘reporting guidelines help you describe research quickly, confidently in completely’. This information is not really telling me much” (Pre-PhD student from Malawi)
However, they seemed to understand the reported benefits after reading the top of the home page in more detail, and after viewing the section below where benefits are stated more clearly.
“OK, I think this part is very great because when I when I see something like ”easy writing”, ”smoother publishing” I think ”Ohh, that’s great. That’s what I want.”” (PhD student from China)
“Participant: Now I’m getting a sense of what the website is about. Now looking at the things down here…
Interviewer: OK.
Participant: I’m getting that it might be a useful resource. That actually, umm, because these are, I think, for… early career researchers like me, I’d say I’d be very interested to come in and see this.” (Pre-PhD student from Malawi)
Participants seemed to understand how reporting guidelines might make publishing “smoother”.
“it gives me an impression that maybe this website will help me write my work easily and it will also help me increase the chance of my work getting published […] Umm, just aligning myself to this standard that already many people use. And hopefully, In doing that, I’ll be up to standard and then I won’t stress myself too much later.” (Midwifery student from Uganda)
Making a distinction between benefits to authors and readers may lead to confusion about the intended user
Further down the home page, the section title ‘Helping authors and readers’ made one participant believe the website also hosts resources for readers.
“So this is a bit weird. So is the point here that this one is for the writers. And now it’s saying, OK, but we can also help readers. OK, I suppose that’s interesting” (Researcher from South Africa).
The images depicting benefits could be clearer
One participant said the icons describing writing and impact were appropriate (a blank page and an award, respectively), but the image depicting “smoother publishing” was not intuitive.
“looking at that icon, it doesn’t really tell me anything about smoother [publishing].” (Researcher from Australia)
Deficient component: Clarify what tasks (e.g., writing, designing, or appraising research) guidelines and resources are designed for
Relevant website features: Clear instruction and differentiation of resources
Influences addressed: Researchers may not know what reporting guidelines are; Researchers may not know when reporting guidelines should be used
Tools for drafting and checking were mostly intuitive, but could be more prominent
Halfway down the home page, a section described how to use templates and checklists to draft and check manuscripts. Participants seemed to find this intuitive and appealing:
“I like that: different stages and different tools” (ECR from Ecuador)
“Yeah, writing templates is something I’ve recently come across, and I think that might be useful. I’ve tried it a bit when writing abstracts. And I guess, yeah, that would be something I’d be interested in looking into further.” (Researcher from the UK)
However, describing these tools further up the page might help visitors “get” what reporting guidelines are about:
“[it] would actually be very good to appear [higher up the page] because then it would now start opening up one’s understanding as to exactly where this kind of guidelines might be applied.” (Pre-PhD student from Malawi)
“If any of these things: writing research, checking manuscripts and planning research, if these can be consolidated on [the top of] your landing page somewhere […] it might be beneficial because my thought process is that I need to know what I’m doing and only then reporting guidelines can help me, right? So if I know that this website is gonna help me with writing the manuscripts, checking […] I think then, reporting guidelines can make a logical progression in that particular case?” (ECR from India)
Even though participants could not download templates or checklists (there were buttons, but clicking these did not trigger a download), they had expectations of what these resources might look like.
“The checklist could be a pre-populated document that I can go through… it may have a table that I could go through as a tick box exercise ticking which of the [guideline reporting items] my study includes.” (Researcher from the UK)
“[Regarding templates] I would like to adjust this template by myself. Just like a semi structured interview. I don’t want this template be a structured interview. I want it to be semi structured so I can have the space to adjust it.” (PhD student from China)
Using a reporting guideline “for planning” was not intuitive
The same section described how to use reporting guidelines when planning or conducting research. The SRQR page had a button to download a “log book” where researchers could document the decisions and data they would later need to report. However, in contrast to the checklist and template, no participants understood what this log book might be:
“it’s not immediately intuitive what a log book might be” (Midwifery student from Uganda)
“OK, what’s a log book? Don’t know.” (Researcher from South Africa)
“nothing has been mentioned about the log book overhead. How [am I] gonna use the log book? Maybe you have mentioned about the template checklist, but uh, maybe we can add [something about] the log book.” (ECR from India)
The word “planning” was not intuitive either. One thought this meant planning a manuscript, and so confused the purpose of the log book with that of the template. Another interpreted it as guidance for writing a research proposal.
“When I’m reading planning research, I think maybe I have already done this when I design my own outline.” (PhD student from China)
“To write can help you plan a study. Yeah, I guess it’s… I I would have thought this might be useful if you’re writing a research grant or a research proposal.” (Researcher from the UK)
I had ordered the tasks and tools as “drafting”, “checking” and then “planning”. I put planning at the end because it is the least conventional way to use a reporting guideline. Some participants questioned this ordering:
“Why is planning at the end? You have to plan first before you write.” (Researcher from South Africa)
Deficient component: For each item, provide examples of reporting in different contexts
Relevant website features: SRQR already included some examples; no further examples were added
Influences addressed: Researchers may not know how to report an item in practice
Many participants said they wanted more, varied examples
Each reporting item in SRQR comes with one or more examples from published literature. All participants stressed the usefulness of these examples. The attention examples received was notable because I did not ask about them; all comments about examples came spontaneously from participants during the think aloud and plus minus tasks.
“Oh, and you have examples that could be really helpful, yeah.” (ECR from Ecuador)
“the most important thing I would say is the examples” (Midwifery student from Uganda)
“I found [this section] very useful as they give more detailed explanations on each specific section, and particularly the examples.” (Medical student from Ghana)
However, many participants said they wanted more examples, and greater variation in style, length, and conciseness.
“need further explanation and examples” (Midwifery student from Uganda)
“it would have been useful or helpful for me to have more than just one example.” (Pre-PhD student from Malawi)
“illustrations of how to [report] this information in a concise manner, that would be very helpful as well.” (Pre-PhD student from Malawi)
“you may as well put a whole discussion in there, or at least just sort of three or four paragraph discussion” (Researcher from South Africa)
“And here if there could be more examples, because when I […] started to read through and understand the the PRISMA guidelines and use the official explanation file to try to understand exactly what I’m required to write about and the examples particularly helped me a lot.” (Medical student from Ghana)
Many participants wanted examples from their own field, or even entire publications that have used the guideline:
“So I want something more relatable, [because] when I was reading these examples they were not relevant to my work.” (Researcher from South Africa)
“…if you can list some […] papers who use the SRQR, you can put it here.” (PhD student from China)
“I’ve searched on pubmed to find an example of a research article that’s used these standards so I could copy or check how they’ve laid it out, which subheadings they’ve used.” (Researcher from South Africa)
Examples may be less credible if they are old or not referenced
In the original SRQR publication, all examples are referenced. I had not included references for the examples when putting them onto the website because of time constraints. This bothered one participant.
“Well, there’s no references there, so is that a very good example? No, I don’t know the source of these things. […] You know, I don’t know, where did it come from? Where’s the reference?” (Researcher from South Africa)
When I asked about hypothetically labelling examples as illustrative if they were made up, the participant said “Yeah, I guess that would be alright”.
The same participant also noted that an example was quite old (10 years).
Examples could be more useful if explained or annotated
Because some reporting items contain multiple sub-items, one participant said annotating examples may be helpful. Taking a discussion item about transferability and integrating findings, they suggested “if you could underline or maybe indicate [in the example] that this [sentence] is now how they are trying to say the result can be transferable, umm, this is another [sentence] trying to say how they’re trying to integrate…. something like that.” (Midwifery student from Uganda)
Other findings
Many intervention components focussed on structuring guidance to make it appear short, navigable and digestible. All participants liked the structure, and said it made the guidance “more easy to follow” (ECR from Ecuador) compared to the original reporting guideline. Nobody disliked content hidden in collapsible sections, but some felt the guidance still appeared very long. One suggested presenting items on separate web pages instead (Researcher from the UK) and another requested a summary (Research consultant from the Philippines). A few participants suggested making the item headings and side-navigation menu more prominent.
All participants realised that the website was aimed at researchers. This is perhaps not surprising, as my study advert said I wanted to speak with qualitative researchers “about a new website”. Even though I did not specify who the intended audience was, it was implied. Indeed, some participants had assumed the website was aimed at qualitative researchers (I had specifically advertised for qualitative researchers because of our test guideline), suggesting their perception of the target audience had been influenced by my recruitment materials. However, these same participants voiced that the website felt “more open” (Pre-PhD student from Malawi), like it was aimed at medical researchers more generally (which it is). Hence, although my recruitment materials primed participants to expect the website to be aimed at qualitative researchers, the fact that they correctly identified it as being for medical researchers in general suggests my intervention components were working as intended. This was further evidenced by participants who described the intended audience as “those who are just getting started in their career” (Pre-PhD student from Malawi) or “master research or higher degrees and also for some junior scholars” (PhD student from China).
In general, all participants were able to understand and use the interactive website elements. Only a few participants had difficulty with any features. For example, one thought the dotted lines representing pop-up definitions were “misspelled words” (ECR from India) (because that is how Microsoft Word highlights errors). A few voiced confusion about the discussion board for each item. Another described feeling frustrated when clicking a footnote caused the page to scroll unexpectedly, and preferred when notes were placed within each item instead of at the bottom of the page. Nobody was surprised by the drop-down expandable boxes or by the search button.
A few components required clarifying the relationships between the website, the EQUATOR Network, the guideline developers, and the original publications, and sometimes this clarification was unsuccessful. For example, a couple of participants were not immediately sure whether the guidance on the website was the same as the guidance within the publication. Some asked whether the citation and journal data (which were supposed to instil trust) pertained to the publication or the website, and which one they themselves should cite. When I asked participants where they might look for clarification, all referred to the FAQ and felt reassured after reading how the guidance was developed, but suggested this explanation be summarised and signposted earlier.
A couple of components involved adding quotes to the home page and guideline page. On the whole, participants liked the quotes or felt neutral about them. For example, one described how the quotes were
“practical from a different point of view. Like why exactly you need this [reporting item]. So now this person [in the quote] is telling from her own perspective how useful it is that you have [the item] described clearly, so it makes it such that if I’m trying to describe [this item], I’ll try to keep that in mind.” (Medical student from Ghana)
Another said
“I like it because each of them tells me why…umm…you know, kind of gives a plain language reason for […] why it’s a useful thing. […] That’s gives it, you know, humanity.” (Researcher from Australia)
Regarding quotes from academics who use reporting guidelines, they said
“That makes it relatable to a user, particularly a new user, because we can see that all of these people are, you know, they were first time users once.” (Researcher from Australia)
However, some participants questioned whether these quotes were from real people.
“Maybe they’re real…maybe it’s legit” (Research consultant from the Philippines)
A few others said they “don’t care what people think” (Researcher from South Africa) or “did not pay much attention to [the quotes]” (Pre-PhD student from Malawi).
Six intervention components received no mention. Two of these were purposefully not tested, three others were perhaps too subtle, and one was about removing aversive design, so it was good that no participants commented on the presence of ugly or judgemental design. Like all results discussed in this section, these unmentioned components are also in appendix V.
Influences
Participants naturally discussed influences they encountered when applying guidance, either during this study or in their previous experience. These influences were external to the website being tested and beyond the scope of my intervention components, so I did not code them as deficiencies. I had identified many of them in my previous work (chapters 3 - 5).
For example, in my thematic synthesis (chapter 3) I described how journal submission is a bad time to give reporting advice because authors lack the time and motivation to change their writing. When making the website, I acknowledged that most authors would encounter it during journal submission. None of my intervention components seek to alter that initial encounter context directly (doing so would require changing our acquisition channels by, for example, getting more funders to link to our website). Consequently, I expect many authors will arrive at the website in the busy mindset of journal submission, wanting to get things done quickly. One participant (Researcher from South Africa) articulated this concisely when reflecting on their first interview session: they felt “annoyed by stuff” because they were “working on something at the time”, and so instead of “exploring” the website they were “just trying to get to where [they] wanted to be so that [they] could finish the work [they were] doing”. They wanted the reporting guidance as a short checklist and “didn’t want all the additional stuff”, referring to the longer reporting item explanations, the guideline introductory text, and the persuasive home page content.
However, after using the reporting guideline in their own time, their opinion had changed completely by the second session: “when you specifically asked me to look at this and I used it [to write my] discussion, I was embracing it in a different way”. They found the guideline “really helpful” for writing, and then “enjoyed looking at all the different checklists and reporting guidelines”, ultimately deciding that “in a new journal that I’m a deputy editor [of], I’ve just said that in our in our scope and guidance for authors, we have to say that we require the use of reporting guidelines”. This change of heart came after a shift in context: whereas in the first session the participant was looking to get a job done quickly, by the second session, they had given the guideline time and used it in its intended way. The participant attributed this shift in context to being “specifically asked” to use the guideline for writing. Many components seek to achieve such a shift by convincing authors to come back and use the website earlier when writing up their next piece of research (see previous subsections on describing what reporting guidelines are, when they are best used, and what they are best used for).
Another influence I identified in my thematic synthesis but did not address in this website was how to incorporate reporting guidelines into one’s writing process, or what to do when you don’t have a writing process at all. Although some components communicated that reporting guidelines should be used when drafting or writing manuscripts, none explained how (although one component involved directing authors towards training already delivered by EQUATOR). One participant eloquently described the challenge of translating writing advice into their own practice.
“When I try to look at your guidance on your website, I really want to use it in my own writing, but it is very strange because when I try to, uh, connect the information on the website with my own writing, I found there there might be a great gap because I think everything on your website is very clear (actually they are very specific, those suggestions), but when I try to connect those information with my own writing, I found it just a little bit difficult to generate some specific ideas to start my writing.
So I’m thinking if that’s because the problem of my writing is not the lack of specific guidance but some other thing like my motivation or, I don’t know…it’s just a little bit strange.
And I also talk about this with my friends because lots of my friends, they are also PhD students and they are struggling at writing too. So ask them if they have some guidance, uh, if they have looked at some guidance and if [they] have put those guidelines in [their] own writing and their answers were, like, quite similar with me and they all talk about that, ‘yes, we look at lots of guidance we try to look at lots of those writing books to teach you how to write, to teach you how to structure your writing. But it’s still very hard’. When you really sit down and start writing, actually you couldn’t, uh, call up [the information].” (PhD student from China)
Participants echoed other influences I had identified in my thematic synthesis when reflecting on their prior writing experiences, including:
- Not having known what reporting guidelines were earlier in their career
- (Previously) finding the checklist, but not the full guidance
- Being limited by journal requirements and word limits
- Struggling to keep writing concise and fluid
- Needing more guidance
- Being unable to report an item because it is their colleague’s responsibility, or because they had not done what was being asked when designing their study or collecting data.
- Paywalls
- Not teaching students about reporting guidelines
- Reporting guidelines not existing for funding applications
- Funders not enforcing reporting guidelines
Participants also mentioned influences I had not identified previously. These included:
- When guideline author names appear western, some (non-Western) participants expected the guidance to be less relevant to them.
- The loading speed of websites (thankfully, this was not an issue for the website being tested)
- Not understanding reviewer feedback
- Not wanting to read on a screen
- Not understanding the relationship between the EQUATOR Network and the guidelines or guideline developers.
- Describing a guideline as “version 1.0” might make people feel the guidance is (too) new, and therefore less trustworthy. This influence was specific to my website, as no existing reporting guidelines describe themselves as “1.0”.
Comparisons between the website being tested and the old EQUATOR website and guideline publications
A few participants ended up exploring the original EQUATOR website and the original SRQR publication during their interviews. Participants instigated these unplanned explorations and comparisons for different reasons. One wanted to retrace their steps to show me the guideline they had used previously. Two others wanted to continue using reporting guidelines in the future and asked me where the original SRQR guidance could be found. Some others spontaneously reflected on their previous experience.
Recounting their experience of seeing the original EQUATOR website for the first time between interview sessions, one participant (ECR from Ecuador) said “Ohh no I didn’t like it. The [new] one is much, much better” because it looked more “trustworthy, more organised” and they preferred the font and colours. Another participant described the original website as “boring”, “outdated” and “text heavy” before recounting their experience of using it:
“Not that long ago I went on to the site because I was looking to complete a reporting checklist and it seemed clear to find the checklist that I wanted. But when I went through the checklist, it wasn’t appropriate. And then I just ended up feeling a bit unsure about what it is, which was the best one to go for.” (Researcher from the UK)
I witnessed another participant (Pre-PhD student from Malawi) experience similar confusion. They wanted to find the original SRQR guidance to continue using it after the study finished. Sharing their screen and thinking aloud, they started on the EQUATOR Network home page and tried to find the original SRQR guidance without my help. Although at first they thought EQUATOR’s home page looked “full” and “rich”, they were quickly “confused” by both EQUATOR’s website and the SRQR publication. After eight minutes and giving up three times, they eventually found the checklist but not the supplement containing the full guidance.
A second participant (Researcher from South Africa) achieved the same outcome a few minutes faster. Because examples appear only in the supplement (which they did not find), they instead looked through “the reference list to see if there was potentially an example paper” and then planned to “go back to PubMed and search for an article that used these guidelines”. The participant appeared to have little interest in the article’s text, saying they did not “care what [the guideline developers] did to come up with it”.
Another participant (ECR from India) echoed this opinion when comparing the original SRQR publication with the redesigned version. They said they “don’t need” to know how SRQR was made when they are trying to use it, and they felt the redesigned guidance is “a bit more precise and to the point”. When I showed them the original SRQR full guidance (the supplement), they said:
“Participant: That’s too heavy on the content.
Interviewer: So if the option was between this this version that you’re looking at now [the original supplement] and the website that you saw first, which do you think you prefer to use?
Participant: I think the website is far better than the [supplement]. Yeah, this website is far better.”
Another participant reflected on their previous experience of using the PRISMA guidelines and explanation document. They said:
“I rather prefer this form of guidance [the website] than the other one [the publication]. There can be a lot more information presented in this way. […] That’s better because it’s more (how can I say?) well presented, well laid out, so that where I need to go deeper, I can go easily. Where I need just surface information or the parts that I’m already familiar with, I can just scroll through […] So I think I’ll prefer something presented in this way than the than the document that I read” (Medical student from Ghana).
10.4 Discussion
The purpose of this study was to identify deficiencies in a website for disseminating reporting guidance. I interviewed 11 researchers and used multiple qualitative methods to identify 53 deficiencies. Most intervention components on the website’s home page aim to communicate what reporting guidelines are, that they are best used early in writing, and how they will benefit the author. The results demonstrated most of these components to be somewhat successful, but not yet optimal. For example, some participants needed more than 5 seconds to realise the website was about resources to help them write. Participants often found later, longer content more useful than the short text at the top of the page. In seeking to balance brevity and clarity, perhaps I had been too mean with my word count. If “easier writing” is vague, “faster first drafts” might be concrete. If “complete reporting” isn’t intuitive, perhaps “writing up research fully so that everyone can understand, repeat, apply, and synthesise your work” is.
I had sought a similar balance between clarity and brevity when trying to organise the full SRQR guidance (35 pages in its original form) onto a single webpage, in a way that made it appear shorter and less intimidating. Again, the current design was somewhat successful. Participants liked the web features I had used to make the guidance more digestible, like expandable content, navigation menus, subheadings and consistent structure. However, some still felt the guidance looked too long, whilst others wanted to add content that would make it longer still: more examples, more information, more definitions, more signposts to other help. One solution may be to display reporting items on separate pages, as the ARRIVE developers have done on their website [23]. Another may be to display a summary of the guidance at the very beginning.
Many participants commented on the website’s design. Whereas I had been somewhat successful in projecting simplicity, for some participants, this crossed the line to basic-ness, especially in the first iteration. Many intervention components use design as a way to persuade and communicate with authors: I wanted pictures to depict tools, benefits, and purpose, and layout and colours to convey a feeling of ease and openness. Sadly neither I nor my colleagues possess expert design skills. Images took a long time to create and, unlike text, are hard to iterate. This is a pity, as design often seemed more salient to participants than text, and bad design misled participants and put them off.
Design was also linked to another theme important to this study: credibility. For some participants, the website’s basic design eroded its trustworthiness. I mitigated this partially in the second iteration (e.g., by including logos), but future design iterations would ideally include professional design input.
Influences
Credibility rests on more than just design. Participants also wanted assurance that the guidance (text) could be trusted, which necessitated understanding the relationship between EQUATOR, guideline developers, the original guideline publications, and the content of associated resources. Understanding this relationship was one of six new influences participants mentioned that may affect whether they successfully adhere to reporting guidelines. In chapters 3 and 4 I argued the need for more in-depth qualitative exploration of influences. Although I did not aim to solicit influences in this study, that I found novel influences incidentally suggests I have contributed towards filling that gap. Participants also mentioned eleven influences that I had previously identified in my earlier work. Therefore, this study adds credibility to my previous findings whilst also building upon them.
Strengths
Finding novel influences is testament to the strengths of this study. I recruited authors with diverse backgrounds and writing experience. My methods solicited rich information. My thorough analysis used my intervention component table as a framework to draw as much information as possible from the data. In contrast, many studies I reviewed in chapters 3 and 4 recruited homogeneous samples, solicited thin description through surveys, and described their analysis techniques poorly. The few studies that elicited rich information focussed on content (e.g., PRISMA 2 [24]) or application (SQUIRE 2 [12]) of a reporting guideline but not the design or the website/publication hosting the guideline. By focussing on diverse recruitment, rich exploration of the guidance text and surrounding platform, and thorough analysis and reporting, I have strengthened my study and addressed limitations seen in others.
Limitations
However, other limitations remain. I will now discuss how 1) this study lacked contextual diversity and 2) not all intervention components were explored.
Context
My web audit (chapter 5) found only half of EQUATOR’s current visitors view the home page. Many arrive at the website directly on a reporting guideline page, often as a referral from a journal or a search engine. Because so many visitors never view the home page, many intervention components need to be placed on both the home page and the reporting guideline page. For instance, naïve visitors should be able to tell what reporting guidelines are whether they arrive on the home page or directly on a guideline page. Some participants noticed this duplication and a few suggested removing or minimising it. However, because all participants viewed the home page first, this study did not capture experiences representative of website visitors that never see the home page. Therefore, future studies should explore the experiences of participants viewing the guideline page without seeing the home page.
Many authors discover reporting guidelines as they are submitting to a journal, whereas authors in this study were not. Because authors described manuscript submission as an inconvenient moment to intervene (see chapter 3), this may influence how authors experience the website. Once the website is live and journals are directing traffic to it, future studies can explore the experiences of authors using the website in contexts that are more true-to-life, as part of their journal submission journey. Similarly, if funders or ethics boards begin asking applicants to use reporting guidelines this context should be explored too.
Not all intervention components were explored equally
Some intervention components received little to no discussion. The five second test, think aloud, plus minus test, and writing evaluation all examine salient intervention components and will not elicit discussion of unnoticed components.
Sometimes this was useful and expected. For example, one component was to remove patronising language. That nobody spontaneously described the website as patronising was a success. Similarly, another component was to use terms consistently. This component was only salient when it had not been applied properly, for instance, where I had used the terms “guidelines” and “reporting guidelines” interchangeably. For components like these, a good outcome is to go unmentioned. However, other components still deserve evaluation even though they are purposefully not salient. My semi structured interview questions addressed this limitation to some extent by asking participants directly about less salient features.
Some components could not be fully explored until the website is further developed. For example, although participants recognised the search button, they could not explore the search functionality. Although participants said they liked the links to related guidelines, I could not explore participants’ ability to find and select guidelines because the website only included SRQR. Once other guidelines are uploaded, future studies could use task based protocols [25] to explore how participants find, compare, and select appropriate guidelines.
Future studies
Whereas the limitations above affected my success in reaching my objectives (identifying deficiencies), my objectives were themselves limited and further work is needed to develop the website into a fully functional resource. I will now discuss potential future studies, including 1) prioritising deficiencies; 2) further iterations to address deficiencies; 3) extending the website with other guidelines, checklists, templates, examples, and resources; 4) evaluating components that could not be evaluated in this study; 5) comparing authors’ preference between the new and existing website and guidelines; 6) real world evaluations; and 7) evaluating reporting guideline content.
Prioritising deficiencies
I made no attempt to prioritise deficiencies. Although some were more commonly raised than others, this was because of saliency and because of the methods I chose. For example, by choosing to use the five second test, I encouraged participants to focus on components featured at the top of the landing page. Similarly, my semi structured interview questions drew attention to particular components. Therefore, code frequencies should not dictate deficiencies’ priority and I purposefully have not reported them.
Instead, de Jong and Schellens [26] suggest ranking deficiencies according to their likelihood and severity. Likelihood refers to the number of users that may be affected by the deficiency, and severity means the degree to which the deficiency will block the desired behavioural outcome. I made no attempt to estimate these factors systematically in this study. Instead, I judged them instinctively when deciding what I could feasibly change in the first iteration.
More iterations are needed to fix deficiencies
Once prioritised, the remaining deficiencies need addressing and it is my intention, funding permitting, to design and evaluate new iterations after my DPhil. Testing future iterations with an identical protocol would offer continuity. It may be more prudent, however, to adjust the study protocol to target particular components or contexts.
Future evaluations are needed after extending the website
Future evaluations will also be required after the website is extended with more guidelines, search functionality, and with checklists, templates, and links to training and resources. Because different reporting guidelines cater to different research communities, and because these communities may have their own nuances and needs, future evaluations should recruit participants from these communities. For example, CARE may be more commonly used by clinical academics, and ARRIVE authors may come from the life sciences and medical sciences. One reason I chose SRQR was for its diverse user base. As the website grows and its audience expands, recruitment should diversify further.
Once checklists and templates are added, future evaluations should explore participants’ experiences of using these resources with and without prior exposure to the website. Just as some authors will bypass the home page and land directly on a guideline page (see Context section within limitations), some authors may receive a checklist or template directly from a colleague or journal without first visiting the website. Therefore, these resources should be evaluated in isolation and within the context of the website.
Evaluating components not explored in this study
Some components could not be explored in this study. Optimising the website for search engines can only be assessed by an audit and by monitoring the website’s rankings using a tool like Google’s Search Console [27] once the website is live. Another component involved adding information to items to instruct authors what to do if a particular item was not, or could not be, done. This component was more applicable to reporting guidelines for quantitative research, many of which make assumptions about design choices. SRQR is fairly agnostic to design choices, and I only added information to one item (item 5, regarding qualitative approach). In the writing evaluation, I asked participants to describe what part of their manuscript they were working on and I then recommended 2 or 3 relevant reporting items. Item 5 was not relevant to any participants, and so no participants noticed or commented on the component.
Comparing preferences
This study did not aim to explore whether participants preferred the revised reporting guideline and website over the existing ones. The few participants who made this comparison spontaneously all expressed a preference for the redesign, but future studies could explore preferences in detail. Doing this qualitatively would reveal the reasons behind preferences. A larger survey could confirm whether authors prefer one version above another.
Real world evaluations
Once the new website and redesigned reporting guidelines are live, real world evaluations should continue to monitor, understand, and improve authors’ experiences. This will include using Google Analytics to monitor how authors use the website, online surveys and other feedback channels, and opportunistic recruitment of authors engaged in their day-to-day work.
Some important metrics include the proportion of authors that return to use the website, the proportion who access resources for drafting vs checking (I would hope to see more authors use the former), and the length of time authors engage with guidance.
Because journals will probably continue to be an important dissemination channel, one possibility would be a mixed methods feasibility study, in collaboration with a journal. Such a study could combine Google Analytics data with author interviews and writing evaluations of manuscript submissions.
Evaluating guideline content
This study did not attempt to evaluate the SRQR recommendations themselves, but rather the guideline’s presentation. I was interested in what participants thought of the structure, order, and layout of the guideline, but not of its content. I was trying to look at the guideline on a macro level, and I was not interested in whether participants took issue with particular instructions.
I hope that guideline developers will begin evaluating their content in more detail. They could make use of de Jong and Schellens’ advice which, as I mentioned earlier, suggests a range of methods to explore the criteria needed for text to be effective [11].
Conclusions
This study aimed to identify deficiencies in a redesigned version of the SRQR guideline and EQUATOR Network home page. Intervention components were deficient if they could more successfully drive authors towards our target behaviour: to use reporting guidance as early as possible in their research pipeline. In identifying 53 deficiencies, I met my objective, but this success is a double-edged sword. This is the final research chapter of my thesis, and it would have been satisfying to conclude with “I’ve done it! The website is perfect!”, but the results of this study prove otherwise. Unfortunately, the realities of iterative design and limited funding force me to end with unfinished business. Nevertheless, I have suggested further studies to continue and extend the work presented here. In the next chapter, I will discuss my thesis as a whole, directions for future work, and implications for guideline developers and other meta-researchers interested in changing the scholarly system.
10.5 Reflections on this chapter
The methods and results sections of this chapter were high and low points respectively. When writing the methods, I combined what I’d learnt from writing all previous chapters. As in chapter 3 I started by working through the reporting guideline item-by-item, but this time I wrote bullet points instead of prose. I then drew on what I had learnt about structure and drafting when writing chapter 9 to reorganise these bullet points into a coherent linear narrative. I then topped and tailed paragraphs with topic and linking sentences, before sandwiching evidence and examples in between. I was happy with the result. The first draft scanned better than other chapters, and I felt confident I’d covered the content the reporting guideline required.
The low came when writing my results section. I had intended to report results for each intervention component. If my thesis word count weren’t so limited this is what I would have done. Other departments have higher limits to better accommodate qualitative research with lengthy results. I found it impossible to condense my results further without losing details and nuances I wanted to retain, and so instead I decided to move them to an appendix and to construct a narrative around some highlights. But in selecting highlights I was giving saliency and importance to a subset of results. Some readers may interpret this as an additional subjective layer of analysis and may criticise me for picking these highlights, but the alternative would have been to reduce my results so much they lost their richness. Stuck between two bad options I chose the former. I have tried to be clear that selecting highlights was not another stage of analysis, and I refer readers to my full results in the appendix. Yet I feel like my word limit has forced me to add another filter to my results that I never wanted to apply.
When conducting these interviews there were moments I was aware of my emotional responses. Sometimes participants would validate my intentions, or confirm my hypotheses. For example, when one spoke about the challenges of incorporating writing guidance into their own practice, this resonated with my experience of writing my previous chapter. Other times participants criticised what I’d made or spoke against my expectations. I believe I dealt with these moments well, in part because of my experience in software development. When collecting feedback on something you’ve created, it is tempting to dwell on the positive and dismiss the negative. Over the years, I’ve learnt to pay most attention to the negative, the surprising, the friction felt when someone’s experiences are not what you expected. When I first started working in development these moments could feel like a personal attack, uncomfortable or disappointing, but I’ve learnt that they are often the most valuable. Nowadays I tend to lean into those moments and I felt myself doing that in these interviews. When participants said something unexpected or critical, I would ask more questions to try to fully understand their point of view.