Startups · Healthcare · May 6, 2026 • 11 min read

Ship a Drug Interaction Alert With a Local LLM in 7 Steps

For: A CTO at an early-stage e-pharmacy or digital health startup who needs to add drug interaction alerts to a prescription workflow but cannot send patient medication data to a third-party API due to data residency or HIPAA concerns — and has never run an LLM entirely on-premise before

If you run an e-pharmacy and a regulator asks where patient medication lists are processed, "an OpenAI endpoint in us-east-1" is not the answer you want to give. But every drug interaction tutorial on the internet assumes you have an API key and a willingness to ship PHI offsite. This guide is the version that doesn't.

We'll build a drug interaction alert layer that runs entirely on your own hardware, using a local Mistral 7B model as a structured extraction layer over a drug interaction dataset you version and audit. The model never has to "know" an interaction it cannot cite back to a row in your database.

The design decision that makes this safe

Most LLM-for-medicine tutorials use the model as the source of truth: "Does warfarin interact with ibuprofen?" The model answers from its weights. That's malpractice waiting to happen. Training cutoffs are stale, hallucinations are silent, and there is no audit trail when a regulator asks why your system flagged (or missed) an interaction.

The fix: treat the LLM as an NLP component, not a knowledge base. It does two jobs:

  1. Extract normalized drug names from messy prescription text
  2. Generate a clinician-readable explanation only when given a retrieved interaction record

The interaction itself comes from a structured dataset (DrugBank, RxNorm + DDI sources, or a curated subset) you control and version. Every alert is traceable to a row. The model is just glue.

Prerequisites

This tutorial assumes you understand why running locally matters for HIPAA-compliant LLM inference: the BAA surface area shrinks to your own infrastructure, and PHI never leaves your VPC. You'll also need a Linux host (8GB+ RAM for the quantized model, 16GB+ VRAM for fp16), Python 3, Postgres, and Docker.

Step 1: Install Ollama and pull Mistral 7B

Ollama is the simplest way to run Mistral for medical NLP locally without wrestling with vLLM or llama.cpp build flags on day one.

curl -fsSL https://ollama.com/install.sh | sh
ollama pull mistral:7b-instruct-q4_K_M

Expected output:

pulling manifest
pulling 8934d96d3f08... 100%
verifying sha256 digest
writing manifest
success

The q4_K_M quantization runs in about 4.5GB of RAM with acceptable quality for extraction tasks. If you have 16GB+ VRAM, use mistral:7b-instruct-fp16 for better extraction accuracy on edge cases like compound drug names.

Verify it works:

ollama run mistral:7b-instruct-q4_K_M "Reply with just OK"
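For automation, it also helps to have a programmatic health check before the pipeline starts taking traffic. Here's a minimal sketch using only the standard library, assuming Ollama's default port and its /api/tags endpoint:

```python
import json
import urllib.request
import urllib.error

def ollama_ready(base_url="http://localhost:11434", model_prefix="mistral"):
    """Return True if the Ollama server is reachable and a matching model is pulled."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
            tags = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return False
    models = [m.get("name", "") for m in tags.get("models", [])]
    return any(name.startswith(model_prefix) for name in models)
```

Wire this into your service's startup or readiness probe so a missing model fails loudly instead of at the first prescription.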

Step 2: Build your interaction dataset

For the tutorial, create a minimal CSV. In production, replace this with a normalized dump from your licensed source.

# interactions.csv
drug_a,drug_b,severity,mechanism,citation_id
warfarin,ibuprofen,major,"Increased bleeding risk via platelet inhibition",DDI-0001
warfarin,aspirin,major,"Additive anticoagulant effect",DDI-0002
simvastatin,clarithromycin,major,"CYP3A4 inhibition raises statin levels",DDI-0003
metformin,contrast_iodinated,moderate,"Lactic acidosis risk post-contrast",DDI-0004
sertraline,tramadol,major,"Serotonin syndrome risk",DDI-0005

Load it into Postgres with a generated reverse-pair index so lookups are commutative:

CREATE TABLE interactions (
  id SERIAL PRIMARY KEY,
  drug_a TEXT NOT NULL,
  drug_b TEXT NOT NULL,
  severity TEXT NOT NULL,
  mechanism TEXT NOT NULL,
  citation_id TEXT NOT NULL,
  dataset_version TEXT NOT NULL DEFAULT 'v2024.01'
);
CREATE INDEX idx_pair ON interactions (LEAST(drug_a, drug_b), GREATEST(drug_a, drug_b));
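A short loader script covers the CSV-to-Postgres step. This is a sketch: the function and table names match the schema above, and the psycopg2 import is deferred so the parsing half stays testable without a database:

```python
import csv

def read_interaction_rows(csv_path, dataset_version="v2024.01"):
    """Parse the interaction CSV and stamp every row with the dataset version."""
    with open(csv_path, newline="") as f:
        return [{**row, "dataset_version": dataset_version} for row in csv.DictReader(f)]

def load_interactions(rows, dsn):
    """Insert parsed rows into the interactions table in one transaction."""
    import psycopg2  # deferred so parsing works without a DB driver installed
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.executemany(
            "INSERT INTO interactions "
            "(drug_a, drug_b, severity, mechanism, citation_id, dataset_version) "
            "VALUES (%(drug_a)s, %(drug_b)s, %(severity)s, "
            "%(mechanism)s, %(citation_id)s, %(dataset_version)s)",
            rows,
        )
```

Passing dataset_version explicitly at load time, rather than relying on the column default, keeps the version decision in your ingestion pipeline where it can be reviewed.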

The dataset_version column is non-negotiable. When a clinician asks "why did the system not flag this last month?", you need to know exactly which version of the rules was active.

Step 3: Drug name extraction prompt

This is the LLM's first job. Given prescription text, return normalized drug names as JSON. No interpretation, no advice.

EXTRACTION_PROMPT = """You are a drug name extractor. Given prescription text, return ONLY a JSON array of lowercase generic drug names. Convert brand names to generics. If unsure, include the original token.

Example input: "Pt on Coumadin 5mg daily, started Advil 400mg PRN"
Example output: ["warfarin", "ibuprofen"]

Return ONLY the JSON array. No prose.

Input: {text}
Output:"""

Test it:

import requests, json

def extract_drugs(text):
    r = requests.post("http://localhost:11434/api/generate", json={
        "model": "mistral:7b-instruct-q4_K_M",
        "prompt": EXTRACTION_PROMPT.format(text=text),
        "stream": False,
        "options": {"temperature": 0.1}
    })
    raw = r.json()["response"].strip()
    return json.loads(raw)

print(extract_drugs("Pt taking Lipitor 20mg, prescribed Biaxin for bronchitis"))
# Expected: ['atorvastatin', 'clarithromycin']

Low temperature is critical. You want deterministic extraction, not creativity.

Step 4: The interaction lookup

Pure SQL. No LLM involved. This is where the actual safety logic lives.

def find_interactions(drugs, conn):
    pairs = []
    for i, a in enumerate(drugs):
        for b in drugs[i+1:]:
            lo, hi = sorted([a, b])
            cur = conn.execute(
                "SELECT severity, mechanism, citation_id, dataset_version "
                "FROM interactions WHERE LEAST(drug_a, drug_b) = %s "
                "AND GREATEST(drug_a, drug_b) = %s",
                (lo, hi)
            )
            for row in cur.fetchall():
                # rows must be dict-like (psycopg2 RealDictCursor) for ** unpacking
                pairs.append({"drug_a": lo, "drug_b": hi, **row})
    return pairs

If the lookup returns nothing, the system says nothing. The model is never given the chance to invent an interaction the database doesn't know about.
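One practical wrinkle: extraction can return the same drug twice (a prescription often repeats a name), which produces self-pairs and redundant lookups. A small normalizer in front of find_interactions, assuming the dataset stores lowercase generics, handles it:

```python
def normalize_drug_list(drugs):
    """Lowercase, strip, and de-duplicate extracted names before pairing.

    Prevents self-pairs (the same drug listed twice) and redundant lookups.
    """
    seen = []
    for d in drugs:
        name = d.strip().lower()
        if name and name not in seen:
            seen.append(name)
    return seen
```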

Step 5: Grounded explanation generation

Now the LLM's second job: turn a retrieved row into a clinician-readable alert. Note the prompt structure — the model gets the data and is told to paraphrase, not augment.

EXPLAIN_PROMPT = """You are writing a clinical alert. Use ONLY the facts in the INTERACTION_DATA block. Do not add information not present. Keep to 2 sentences.

INTERACTION_DATA:
Drug A: {drug_a}
Drug B: {drug_b}
Severity: {severity}
Mechanism: {mechanism}
Citation: {citation_id}

Write the alert:"""

def explain(interaction):
    r = requests.post("http://localhost:11434/api/generate", json={
        "model": "mistral:7b-instruct-q4_K_M",
        "prompt": EXPLAIN_PROMPT.format(**interaction),
        "stream": False,
        "options": {"temperature": 0.2}
    })
    return r.json()["response"].strip()

Expected output for the warfarin + ibuprofen pair:

MAJOR interaction: Concurrent use of warfarin and ibuprofen significantly increases bleeding risk through platelet inhibition. Review and consider an alternative analgesic. (Ref: DDI-0001)

Step 6: Wire it into a FastAPI endpoint

from fastapi import FastAPI
from pydantic import BaseModel
import psycopg2.extras

app = FastAPI()

class RxRequest(BaseModel):
    prescription_text: str
    patient_id: str  # for audit log only, never sent to LLM

@app.post("/check")
def check(req: RxRequest):
    drugs = extract_drugs(req.prescription_text)
    conn = get_db()  # psycopg2 with RealDictCursor
    interactions = find_interactions(drugs, conn)
    alerts = [{**i, "message": explain(i)} for i in interactions]
    audit_log(req.patient_id, drugs, interactions)  # write to separate audit DB
    return {"drugs_detected": drugs, "alerts": alerts}

Two things to notice. First, patient_id never enters either prompt — the LLM has no reason to see it. Second, every call writes an audit row with the dataset version, the extracted drugs, and the rule IDs that fired. That audit log is your compliance lifeline.
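The audit_log call is referenced but not shown. Here's a minimal sketch of what the record might contain; the table name and JSON-payload column are illustrative, and in production DATASET_VERSION would come from config, not a constant:

```python
import json
from datetime import datetime, timezone

DATASET_VERSION = "v2024.01"  # assumption: read from config in production

def build_audit_record(patient_id, drugs, interactions):
    """Assemble the audit row: who, what was detected, which rules fired, under which ruleset."""
    return {
        "patient_id": patient_id,
        "ts": datetime.now(timezone.utc).isoformat(),
        "drugs_detected": drugs,
        "rule_ids": [i["citation_id"] for i in interactions],
        "dataset_version": DATASET_VERSION,
    }

def audit_log(patient_id, drugs, interactions, conn=None):
    """Write the record to the audit DB; falls back to stdout JSON for dev only."""
    record = build_audit_record(patient_id, drugs, interactions)
    if conn is None:
        print(json.dumps(record))
        return record
    with conn.cursor() as cur:
        cur.execute("INSERT INTO audit_log (payload) VALUES (%s)", (json.dumps(record),))
    return record
```

Keeping the record builder separate from the write makes the "which rules fired under which version" logic unit-testable without a database.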

Step 7: Containerize and lock down the network

Run Ollama and the API in a Docker network with no egress. This is the step that converts "runs on our server" into "PHI cannot leak."

# docker-compose.yml
services:
  ollama:
    image: ollama/ollama
    volumes: ["./models:/root/.ollama"]
    networks: [internal]
  api:
    build: .
    depends_on: [ollama, db]
    networks: [internal]
  db:
    image: postgres:16
    networks: [internal]
networks:
  internal:
    internal: true  # no external access

Front it with an nginx reverse proxy on a separate network for inbound traffic only. Confirm with docker exec api curl https://api.openai.com — it should fail. That failure is the feature.

Common errors and fixes

JSON parse fails on extraction

Mistral occasionally wraps output in markdown fences or adds prose despite the prompt. Add a sanitizer: strip everything before the first [ and after the last ], then json.loads. If it still fails after retry, log the prescription for human review and return an empty drug list — never silently approve.
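A sanitizer along those lines, kept as a separate function so it can be regression-tested on its own:

```python
import json

def parse_drug_json(raw):
    """Strip markdown fences and prose around the model output, then parse the array.

    Returns a list of drug names, or None if no valid JSON array can be recovered.
    The caller should retry once, then route to human review with an empty list.
    """
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end == -1 or end <= start:
        return None
    try:
        parsed = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, list) else None
```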

Brand names not converting

The 7B quantized model misses regional brand names (especially India, Middle East). Pre-process with an RxNorm-derived brand-to-generic map before the LLM sees it. The LLM is a fallback, not the primary normalizer.
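A preprocessor can be as simple as a regex substitution over a brand-to-generic dict. The map entries below are illustrative placeholders; the real one should be derived from RxNorm plus your regional formulary:

```python
import re

# Hypothetical mini-map for illustration; derive the real one from RxNorm
# (brand name -> ingredient) plus your regional formulary.
BRAND_TO_GENERIC = {
    "coumadin": "warfarin",
    "advil": "ibuprofen",
    "lipitor": "atorvastatin",
    "biaxin": "clarithromycin",
}

_PATTERN = re.compile(r"\b(" + "|".join(BRAND_TO_GENERIC) + r")\b", re.IGNORECASE)

def preprocess_brands(text):
    """Replace known brand names with generics before the text reaches the LLM."""
    return _PATTERN.sub(lambda m: BRAND_TO_GENERIC[m.group(0).lower()], text)
```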

Ollama OOM under concurrency

Ollama serializes requests by default. For real traffic, switch to vLLM or TGI with proper batching. Ollama is for getting started; it's not your production inference server at scale.

Slow CPU inference

If you're getting 10+ seconds per call on CPU, that's expected for a 7B model. Either get a GPU, switch to a 3B model (Phi-3-mini works for extraction), or add Redis caching keyed on the normalized prescription text. Most prescriptions in an e-pharmacy workflow repeat heavily.
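The caching idea can be sketched with an in-process dict; swap the dict for Redis (SETEX with a TTL) in production. Hashing the normalized text also keeps raw prescription content out of cache keys and logs:

```python
import hashlib

class ExtractionCache:
    """Cache keyed on a hash of the normalized prescription text (in-process sketch)."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def key(text):
        # Collapse whitespace and case so trivially-different inputs share a key
        normalized = " ".join(text.lower().split())
        return hashlib.sha256(normalized.encode()).hexdigest()

    def get_or_compute(self, text, compute):
        k = self.key(text)
        if k not in self._store:
            self._store[k] = compute(text)
        return self._store[k]
```

Usage: `cache.get_or_compute(prescription_text, extract_drugs)` in place of the direct call in the endpoint.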

Model flags interactions not in the database

If you see this, your explain prompt is leaking parametric knowledge. Tighten it: enumerate the exact fields and add "If a fact is not in INTERACTION_DATA, do not include it." Run a regression test set of 50 known prompts after every prompt change.
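One way to make that regression test concrete is a post-generation groundedness check. This is a heuristic sketch, not part of the pipeline above: it catches gross drift (missing citation, wrong drugs), not subtle fabrication, and assumes the dataset stores lowercase generics:

```python
def alert_is_grounded(alert_text, interaction):
    """Heuristic guardrail: the alert must cite the record and name both drugs."""
    text = alert_text.lower()
    if interaction["citation_id"].lower() not in text:
        return False
    if interaction["drug_a"] not in text or interaction["drug_b"] not in text:
        return False
    return True
```

Run it over the explain() output in your regression suite, and optionally in the request path, falling back to a fixed template ("MAJOR interaction between X and Y, see {citation_id}") when the check fails.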

What this approach is bad at

Honest tradeoffs:

  1. Coverage is bounded by the dataset. An interaction missing from your licensed source is invisible to the system; there is no parametric safety net, by design.
  2. No dose, timing, or route reasoning. A row either matches a drug pair or it doesn't; "major at high doses, acceptable at low doses" needs richer rules than this schema supports.
  3. Extraction is the weakest link. Regional brand names, misspellings, and compound products will slip past a quantized 7B model without the preprocessing map.
  4. You own the ops. Local inference means you handle latency, GPU capacity planning, and model upgrades that an API vendor would otherwise absorb.

How CodeNicely can help

We built the prescription and drug interaction layer for HealthPotli, an e-pharmacy serving Indian markets where data residency and pharmacist workflow integration both mattered. The work that's directly relevant to your situation: setting up the local NLP pipeline, mapping brand-to-generic across regional formularies, and building the audit log structure that pharmacist supervisors actually use during dispensing review.

If you're a CTO trying to decide between "ship fast on OpenAI and deal with compliance later" vs "build it on-prem from day one," we've done the second path and can tell you where it gets expensive in engineering hours and where it doesn't. Our AI studio page has more on the LLM-specific work, and our startups practice covers how we scope this kind of build for early-stage teams.

Frequently Asked Questions

Is running Mistral locally actually HIPAA compliant?

The model itself is neither compliant nor non-compliant — compliance is a property of your overall system. Running inference on infrastructure under your BAA, with PHI never leaving your environment, removes the third-party data processor problem that comes with API-based LLMs. You still need access controls, audit logging, encryption at rest and in transit, and signed BAAs with any infrastructure vendors (AWS, your colo, etc.).

Why not just use a hosted compliant LLM service like Azure OpenAI with a BAA?

That's a valid path if your data residency requirements allow it. The local approach matters when you're operating in jurisdictions where Azure's compliant regions don't help (some Middle East and India deployments), when contractual obligations with hospital partners forbid any cloud LLM, or when your security review board simply will not approve PHI leaving your VPC. Pick based on actual constraints, not aspirational ones.

Can a 7B model really be accurate enough for medical use?

For the two narrow tasks in this design — drug name extraction and paraphrasing a retrieved record — yes, with a regression test set and the brand-name preprocessor. For open-ended clinical reasoning, no, and you shouldn't use any LLM for that without clinician oversight. The architecture in this post is specifically designed so the model is never the source of safety-critical knowledge.

How do we keep the drug interaction dataset current?

License a maintained source (DrugBank, First Databank, Lexicomp) and build a versioned ingestion pipeline that bumps dataset_version on every update. Run your regression test suite against the new version before promoting. Never edit interaction rows in place — append a new version and migrate.
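The append-and-promote pattern can be sketched in two statements. This assumes a staging table for the incoming dump and a one-row active_ruleset pointer table, neither of which appears in the schema above:

```sql
-- Load the new ruleset alongside the old one (never UPDATE existing rows)
INSERT INTO interactions (drug_a, drug_b, severity, mechanism, citation_id, dataset_version)
SELECT drug_a, drug_b, severity, mechanism, citation_id, 'v2024.07'
FROM staging_interactions;

-- Promote only after the regression suite passes against 'v2024.07'
UPDATE active_ruleset SET dataset_version = 'v2024.07';
```

For this to work, the lookup in Step 4 would also filter on the active dataset_version, so old rows stay queryable for audits without ever firing in production.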

What does it cost and how long to build production-ready?

That depends heavily on dataset licensing, your existing infrastructure, hardware choices, and how much of the pharmacist workflow you need to integrate. Talk to CodeNicely for a personalized assessment based on your specific stack and compliance footprint.

Building something in Healthcare?

CodeNicely partners with founders and tech teams to ship AI-native products that move metrics. Tell us about the problem you're solving.

Talk to our team