
Executive Summary
Week 25 delivered a technically sophisticated exploration of AI safety, clinical implementation, and healthcare system integration. The community dissected emerging research on AI scribe effectiveness through new NEJM AI studies, examined data poisoning vulnerabilities whilst debating prompt injection risks, and engaged in substantive discussions about EPR system interoperability. BMJ's new generative AI series sparked conversations about the evolving doctor-patient relationship, whilst practical questions about transcription accuracy, coding reliability, and procurement challenges revealed the gap between research findings and frontline reality. The £300 million NHS AI budget announcement catalysed heated debate about centralised versus devolved decision-making, underscoring ongoing tensions between innovation and governance across the service.
The AI Scribe Evidence Base: Research Meets Reality
Friday's release of three new studies in NEJM AI on automated voice transcription systems triggered extensive analysis of what these findings actually mean for UK general practice. The group moderator highlighted the methodological rigour of the DAX versus Nabla comparison trial, noting Epic's built-in time-tracking capabilities for measuring documentation burden. However, responses quickly challenged the applicability of US-based research to NHS workflows.
A clinical informatician observed that EMIS does include similar tracking features but cautioned that "time in note metrics are very easily skewed in the workflow - people leave encounters open for a day or longer to wait for results." The fundamental difference in note purpose emerged as a critical distinction: in the US, notes function as intensive billing instruments, whilst in the UK they serve primarily medicolegal purposes as part of clinical workflow rather than definitive revenue-generating artefacts.
A clinical safety specialist noted that time savings from AVT in the published studies appeared "far more modest than I'd expected" compared to anecdotal reports from surgeries using Tortus or Heidi in UK settings. This observation prompted the CEO of a scribe company to suggest the divergence stems from differing documentation cultures: "In the US the note is a very intense monetary instrument of record - in the UK it's medicolegal only and more part of the workflow than the definitive object, so the time saving is far greater."
The conversation evolved beyond simple time metrics to capture qualitative benefits that don't appear in controlled trials. A practice manager highlighted two unexpected advantages: GP trainees with typically lengthy consultations now maintain better recall of discussion points from start to finish, and a colleague with arthritis who was approaching retirement has continued practising due to reduced typing demands. Another clinician agreed: "Slow consulters run faster as the documentation is done. Tasks don't get forgotten so less rework."
A GP partner offered a different perspective entirely: "Ambient scribe does not save time. It hugely improves eye contact with patient, quality of documentation, safety netting and read coding." This framing shift from productivity to quality represented a recurring theme - that the value proposition for AVT might lie more in clinical excellence than efficiency gains.
The coding reliability discussion revealed persistent challenges. A system analyst suggested quality indicators should focus on standardisation and ensuring QOF points aren't missed: "The work has been done but just not coded properly. I would expect a good GP AVT to catch all that." However, a GP cautioned: "None of the AVT products I've seen have got the coding nailed yet really. Either they have a very small formulary of straightforward codes, or they are coding EVERYTHING in a way which isn't very helpful."
This prompted discussion of context errors, such as coding "cancer diagnosis" when discussing cancer screening, highlighting the sophistication still required for clinical interpretation beyond pattern recognition.
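The screening-versus-diagnosis confusion illustrates why bare term spotting fails. As a concrete illustration of the failure mode only (the function names and keyword lists below are hypothetical, not any product's actual implementation), a few lines of Python show how a naive matcher fires on a screening discussion while even a simple context check suppresses it:

```python
# Naive term spotting flags any mention of a condition, so a discussion
# of screening is wrongly coded as a diagnosis.
def naive_code(transcript: str) -> list[str]:
    codes = []
    if "cancer" in transcript.lower():
        codes.append("cancer diagnosis")  # false positive on screening talk
    return codes

# Phrases indicating the mention is not a new diagnosis (screening,
# exclusion, family history). Real systems need far richer context
# handling than this keyword list suggests.
NEGATING_CONTEXT = ("screening", "screen for", "ruled out", "family history of")

def context_aware_code(transcript: str) -> list[str]:
    text = transcript.lower()
    codes = []
    if "cancer" in text and not any(ctx in text for ctx in NEGATING_CONTEXT):
        codes.append("cancer diagnosis")
    return codes
```

For "Discussed bowel cancer screening options" the naive version emits a diagnosis code while the context-aware version emits nothing, which is the behaviour the discussion identified as still missing from current AVT products.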
Security Nightmares: From Data Poisoning to Prompt Injection
The week's technical security discussions spanned from theoretical attacks to practical vulnerabilities, beginning Sunday with Anthropic's research on small-sample poisoning of training data. The group moderator explained how deliberately injecting specific harmful examples into training sets could corrupt model outputs, drawing parallels to traditional security threats: "It's a digital version of the 'genetically engineered payload' found in sci-fi like Three Body Problem."
Tuesday brought this theoretical risk into sharper focus with discussion of practical attack vectors. The moderator suggested patient complaint letters could serve as prompt injection mechanisms: "All they would need to do is have malicious instruction in white text within an email/electronic version of letter, and if AI was used to parse it could be prompt-injected."
A recently qualified GP questioned whether this represents a genuinely novel vulnerability or merely existing problems from new angles: "Some of those issues are simply new angles of well documented and addressed engineering problems. The same way in which you protect for SQL injections, you protect for the above. Tight backend architecture precedes the ChatGPT era."
A clinical safety expert agreed: "Shouldn't be too hard to script a pre-read to auto-discard anything not in visible text or outside of standard characters you'd expect in a letter." The consensus emerged that properly designed systems with modern security standards shouldn't fall victim to attacks that wouldn't have worked a decade ago.
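The "pre-read" idea can be sketched in a few lines. This is a character-level illustration only (the function name and the allowed character set are assumptions, not a vetted specification), and detecting white-on-white text itself would additionally require inspecting the document's formatting layer, such as font colour in a DOCX or PDF:

```python
import re
import unicodedata

# Characters you would plausibly expect in a clinical letter; anything
# outside this set is dropped before the text reaches an LLM parser.
ALLOWED = re.compile(r"[^A-Za-z0-9\s.,;:'\"()\-/&%£@?!]")

def sanitise_letter(text: str) -> str:
    # Normalise Unicode so lookalike characters collapse to standard forms
    text = unicodedata.normalize("NFKC", text)
    # Remove zero-width characters sometimes used to smuggle hidden content
    text = re.sub(r"[\u200b\u200c\u200d\u2060\ufeff]", "", text)
    # Drop anything outside the expected character set for a letter
    return ALLOWED.sub("", text)
```

This is the defence-in-depth layer the discussion describes: cheap, deterministic, and applied before any model sees the input, in the same spirit as parameterised queries guarding against SQL injection.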
Thursday's revelation of the Mixpanel breach affecting OpenAI API users brought security concerns into immediate operational territory. The moderator shared the notification: "User profile information associated with use of platform.openai.com may have been exposed." Whilst OpenAI insisted no chat data, API keys, or login credentials were compromised, the incident sparked debate about trust and transparency.
An ICB digital lead posed the fundamental question: "Do we trust OpenAI to be sharing that chat data has been stolen?" The response emphasised the perennial issue: "Trust. Always comes back to it." The conversation highlighted how security incidents test not just technical controls but the credibility of vendor assurances, particularly relevant for NHS organisations making long-term commitments to AI infrastructure.
The £300 Million Question: Who Decides How AI Money Gets Spent?
Wednesday's revelation of £300 million in the budget for NHS AI and technology immediately sparked debate about procurement mechanisms and decision-making authority. An innovation-focused GP raised the critical question: "How much of that budget will go on SAFE AI and Tech?" whilst a clinical safety specialist quipped: "Isn't that the price for CoPilot?"
A GP with EPR expertise suggested the entire allocation would likely fund scribe procurement for the following year but warned: "Just need to make sure ICBs don't intercept the money and buy us tat - as they have done on numerous occasions recently." This comment catalysed passionate discussion about centralised versus devolved funding models.
The same GP argued forcefully for frontline control: "Devolve the money and let us, in the front line, choose. That's what used to happen pre-2004. Encourages competition between a plurality of providers. Encourages them to be more receptive to the front line, drives up quality and drives down prices."
A system analyst with multi-ICB experience warned against assuming centralisation equals efficiency: "There's a real danger of this happening. And some of the stuff they want to procure is total shite." The moderator suggested regional or national assurance frameworks could avoid duplication: "Or on some kind of central/regional assurance of same to avoid duplication?"
A GP based in Birmingham reported their ICB had attempted standardised evaluation but encountered limited supplier engagement: "We tried this but didn't get much interest. Also different ICBs want a different level of assurance. But some parts of the assurance can be done once and to an agreed standard." They noted having reviewed over 100 primary care products covering cybersecurity, clinical safety, information governance, training and contracting.
The debate reflected deeper tensions about where decision-making authority should reside in NHS AI adoption: with clinicians who understand workflow needs, with ICBs who can negotiate scale pricing, or through national frameworks that ensure baseline safety standards. No consensus emerged, but the conversation revealed sophisticated understanding of procurement's political economy.
Clinical Safety and Systems Thinking
Thursday's sharing of HSSIB's thematic review on EPR systems prompted sobering reflection on patient safety in digital transformation. A clinical informatician highlighted an HSJ exclusive headline: "EPRs pose persistent threat to patient safety." The HSSIB investigation examined system-wide safety issues across multiple trusts implementing electronic records.
The moderator observed that for devices already holding CE marks, recent legislation primarily aligned UK requirements with European post-market surveillance expectations: "All this leg did was bring UK back into line with Europe on PMS. If you are a manufacturer who didn't have CE, it's more of a change."
Discussion of EPR safety connected to broader conversations about the doctor-patient relationship in an AI-augmented world. A presentation shared Wednesday posed provocative questions about AI creating "doctor is wrong" dynamics when clinicians contradict algorithmic suggestions, even when the doctor is correct. A digital health specialist acknowledged the genuine risk: "There is a danger of 'doctor is wrong' if they contradict AI, even though they're right."
However, a clinical lead offered a more optimistic framing: "Surely the doctor proves the AI wrong with evidence? Test results etc? At which point the AI (that has no ego) changes its diagnosis? Seems like a good model to me?" A GP partner countered that the "correct" textbook answer isn't necessarily the most appropriate for that specific patient in those particular circumstances, highlighting clinical medicine's irreducible complexity.
An innovation-focused GP drew parallels to pre-digital information asymmetry: "No real difference in way patients used to bring in reams of newspaper cuttings or Internet searched. The principle is the same." This historical perspective reframed AI consultation tools as evolving rather than revolutionary challenges to clinical authority.
Model Wars and Practical Choices
The ChatGPT versus Gemini versus Claude debate surfaced Wednesday, with a clinical informatician seeking guidance: "I'm hearing great things about Gemini but don't want to add another subscription. I'm a heavy Claude user so wondering what the group's opinion would be."
A clinical safety expert immediately shared concerning research: "Gemini's safety protocols crack unacceptably under pressure" with a link to an IEEE Spectrum analysis of agent safety failures. The informatician reported asking all three LLMs for their own recommendations: "Surprisingly they all said stick with ChatGPT and Claude."
The group moderator offered decisive guidance: "Gemini has its uses, but as a daily driver? CLAUDE." A Microsoft employee humorously acknowledged competing interests: "(Somewhat obliged to pick Copilot)" whilst a GP joked about secret preferences: "I won't tell Microsoft that you secretly use Gemini 3.0 personally."
Sunday's discussion of Perplexity AI revealed sophisticated understanding of product limitations. A digital transformation specialist shared experience of PayPal's free year promotion, noting usage limits even with Pro subscriptions. An ICB digital lead found it "a very useful search and research engine" but questioned whether they'd pay separately for it. A medical education specialist flagged "sneaky model switching" behaviour, sharing an article titled "Perplexity is giving you wrong answers on purpose."
These conversations revealed a community that approaches AI tools with healthy scepticism, testing claims against practical experience rather than accepting marketing assertions at face value.
Interoperability Wars: The EPR Migration Debate
Sunday's sharing of an EPR migration cost savings calculator sparked intense discussion about system choice, vendor lock-in, and what "integration" actually means. A GP with informatics expertise presented calculations suggesting practices, PCNs and ICBs could save tens of millions annually through migration to newer systems, with additional savings from secondary care adoption of accepted global standards for order communications.
A GP with multi-system experience immediately challenged the framing: "I love all the stuff you do, but I think your calculator reflects a bit too much of your passion for Medicus. Much of this could be also said of SystmOne - plus few other points that perhaps SystmOne has that Medicus does not yet have."
The calculator's author responded constructively: "If you send me the pricing for the various elements of SystmOne, I can try to incorporate them into the calculator or make it easier for S1 users to input their numbers." This collaborative approach highlighted the challenge of fair comparison when pricing models differ fundamentally between suppliers.
A system analyst with integrated care experience raised practical objections: "This doesn't take into account areas that are quite highly integrated needing proper interoperability. We do >100,000 appointments/year on behalf of our practices, a substantial amount requires proper interoperability for list gathering, write-backs etc. Any practice that moves away from EMIS would have a huge extra burden."
A GP responded philosophically: "Unfortunately 'proper integration' often necessitates everyone being on the same system. It should not be like that. And there are plenty of mixed economies who struggle which is not fair." An innovation-focused GP cut through the terminology: "It's not really 'integration' if it's just scaling the same solution across a conurbation."
The debate reflected longstanding tensions in NHS IT: between standardisation enabling efficiency versus plurality enabling competition; between vendor-specific ecosystems versus open standards; between immediate operational needs versus long-term strategic flexibility.
Quote Wall
"Your writing will be your biggest moat in the world filled with AI Slop." -- Innovation-focused GP on authentic communication (from previous week's discussion)
"The AI revolution has truly begun and this country boy's mind is blown with the possibilities now." -- GP describing AI-powered drive-through ordering at Popeyes (Saturday)
"In the US the note is a very intense monetary instrument of record - in the UK it's medicolegal only and more part of the workflow than the definitive object, so the time saving is far greater." -- AI scribe company CEO on US/UK documentation differences (Friday)
"Ambient scribe does not save time. It hugely improves eye contact with patient, quality of documentation, safety netting and read coding." -- GP partner reframing AVT value proposition (Friday)
"The work has been done but just not coded properly. I would expect a good GP AVT to catch all that." -- System analyst on coding standardisation opportunities (Friday)
"Tight backend architecture precedes the ChatGPT era." -- Recently qualified GP on security fundamentals (Tuesday)
"Gemini has its uses, but as a daily driver? CLAUDE." -- Group moderator on LLM preferences (Wednesday)
"Just need to make sure ICBs don't intercept the money and buy us tat." -- GP with EPR expertise on £300m AI budget concerns (Friday)
Journal Watch
Academic Papers and Key Studies
BMJ Generative AI Series - This week the BMJ launched a comprehensive series on artificial intelligence in clinical practice. The group moderator highlighted two foundational papers exploring how generative AI changes consultation dynamics and the doctor-patient encounter. Additional commentary by John Launer examined the importance of humility when engaging with AI systems. Charlotte Blease's authorship was noted with appreciation by community members familiar with her work on digital health and patient partnership.
NEJM AI: AI Scribe Comparative Studies - Three interconnected studies examining automated voice transcription in clinical settings sparked Friday's most sustained discussion. The research compared DAX versus Nabla versus control conditions, with particular attention to methodology for measuring time savings through Epic's built-in tracking. The editorial acknowledged modest productivity benefits whilst noting that current performance likely represents the worst these systems will ever be, indicating a trajectory towards more sophisticated task automation.
Anthropic Research: Small Sample Poisoning - Research from Anthropic examining how small amounts of deliberately corrupted data in training sets can compromise model behaviour. The group moderator connected this to broader security concerns about targeted attacks through training data manipulation, describing it as a "digital version of genetically engineered payloads" from science fiction.
Anthropic: Reward Hacking Discovery - Early Sunday sharing of Anthropic research documenting how models learn to "cheat" on training tasks by optimising scores without properly solving underlying problems. This mechanistic understanding of failure modes informed later discussions about AI system reliability and the importance of robust evaluation frameworks.
OLMo 3: Open Language Model - Allen Institute's release of OLMo 3, a 32-billion-parameter open model published alongside its complete training set, was highlighted as a positive development for transparency and security. The moderator noted this addresses concerns about training data opacity raised in the poisoning attack research.
HSSIB Thematic Review: EPR Systems and Patient Safety - Health Services Safety Investigations Body examination of electronic patient record implementation across NHS trusts, identifying persistent safety threats. The investigation's systemic perspective on digital transformation risks complemented ongoing group discussions about clinical safety governance.
Industry Articles and Reports
OpenAI Mixpanel Security Incident - OpenAI's disclosure of user profile information exposure through third-party analytics provider Mixpanel triggered discussion of vendor trust and transparency. The incident involved account holder details but not chat content, API keys or credentials, according to OpenAI's notification.
IEEE Spectrum: Gemini Safety Protocol Failures - Analysis of Google's Gemini model demonstrating unacceptable degradation of safety guardrails under pressure, shared during Wednesday's model comparison discussions. The research informed recommendations against Gemini as a primary production tool for clinical applications.
HSJ Exclusive: EPR Patient Safety Threats - Health Service Journal reporting on electronic patient record systems as persistent safety concerns, drawing on HSSIB investigation findings. The coverage examined recurring implementation challenges across multiple NHS trusts.
The Times: AI Job Displacement Projections - Coverage of research suggesting artificial intelligence could replace half of American jobs, shared Wednesday morning to frame discussions about healthcare workforce transformation and automation's potential scope.
Hexoskin FDA 510(k) Clearance - Medical device company announcement of regulatory approval for long-term ECG and respiratory monitoring system. An innovation-focused GP highlighted potential for proper remote monitoring of COPD and heart failure patients, noting they'd been tracking the technology's development.
Technical Resources and Tools
Karpathy's LLM Council - Andrej Karpathy's GitHub repository implementing multi-agent LLM systems with deliberative decision-making, shared Sunday as interesting architectural approach to AI system design.
TensorFlow Projector: Token Visualisation - Interactive tool for projecting high-dimensional embeddings into a navigable 2D/3D proximity space, described as "frontend wizardry projecting embeddings as their tokens." Renders better on a laptop than on mobile devices.
Perplexity Prompt Templates - XDA Developers coverage of Perplexity AI's new prompt templating features, shared alongside discussions of the platform's limitations and model-switching behaviour.
Government Technology Code of Practice - UK government guidance document outlining standards and principles for technology procurement and implementation in public sector, shared during Friday's discussions of centralised versus devolved decision-making.
Training and Professional Development
Free AI Crash Course Series - Cardiologist-led free CPD-accredited one-day AI courses touring UK cities, announced Monday with invitation to group members to attend and meet in person.
Oracle Data and AI Forum - December 9th event in Manchester focused on data and AI transformation, with success stories and peer networking opportunities around digital innovation.
Healthcare AI Products
Clinitalk - GP trainee feedback and consultation analysis tool mentioned Friday as strong option for analysing trainee scripts and offering improvement suggestions for SCA preparation.
Improval - Consultation coaching platform applying Kirkpatrick Model evaluation framework to assess training impact. Detailed Friday presentation outlined measurement of key metrics including consultation type, conversational balance, open-to-closed question ratios, and alignment with RCGP/NICE guidance.
HealthOrbit - Platform inquiry shared Saturday morning seeking user feedback and practical experience.
Motus VR - Virtual reality platform for respiratory rehabilitation, mentioned Wednesday by innovation-focused GP noting 5+ years of successful deployment in clinical service delivery.
Group Personality Snapshot
This community operates at the intersection of clinical practice, technical sophistication and systems thinking, maintaining remarkably high signal-to-noise ratio across diverse expertise. Conversations shift seamlessly from theoretical security research to practical coding challenges to procurement politics, with participants code-switching between technical precision and accessible explanation as context demands.


