Geek Partners Labs
February 14, 2026 · 7 min read · Geek Partners Labs

Anonymizing PII before it reaches an LLM: lessons from a Kazakhstan financial holding

A reversible proxy that anonymizes incoming messages, calls the upstream model, and restores values for the user. What worked, what we underestimated, and when this kind of project is infrastructure rather than 'AI practice'.

LLM Proxy · PII · KZ_IIN · FastAPI · Presidio

When the holding decided to give analysts and support a usable interface to LLMs, we hit one wall fast: nothing containing a real IIN (ИИН, Kazakhstan's individual identification number), card number, or customer account could go to a third-party API. The ban wasn't 'cautious'; it was absolute. InfoSec and compliance wanted to see exactly how we'd guarantee it.

The answer turned out to be boring, which is why it works: a reversible proxy that anonymizes incoming messages, calls the upstream model, and restores the original values on the way back. Stack — Python 3.10, FastAPI, Presidio analyzer/anonymizer, one container next to the app. Nothing magical — nothing a five-year-old banking gateway wouldn't do. Just on a new channel.

Four things that public tutorials usually leave out.

Off-the-shelf libraries don't know what an IIN is

Out of the box, Presidio finds emails and credit cards. That's it. No IIN, no BIN (the business identification number), no local phone formats. We wrote our own recognizers: kz_iin, kz_bin, kz_phone, kz_id_card, ru_inn, ru_snils, ru_passport, bank_card, iban. Each one is a pattern plus context words ('иин', 'жеке сәйкестендіру нөмірі', and so on).

First mistake, immediately: a 12-digit pattern matches order numbers, SKUs, payment references, anything 12 digits long. The fix: check-digit validation against the official algorithm (six date digits, a century/gender digit, weights 1–11, and a second pass with shifted weights on collision). Without that check the system was a hyperactive anonymizer that nuked the business meaning of every message.
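The check itself is small. A minimal sketch of the two-pass mod-11 validation (date-plausibility checks omitted for brevity):

```python
def is_valid_iin(iin: str) -> bool:
    """Two-pass mod-11 check digit, per the published IIN algorithm.

    Sketch only: a production version should also validate the first six
    digits as a date and the seventh as a century/gender digit.
    """
    if len(iin) != 12 or not iin.isdigit():
        return False
    digits = [int(c) for c in iin]
    # First pass: weights 1..11 over the first 11 digits.
    check = sum(d * w for d, w in zip(digits[:11], range(1, 12))) % 11
    if check == 10:
        # Collision: second pass with shifted weights 3..11, 1, 2.
        weights = [3, 4, 5, 6, 7, 8, 9, 10, 11, 1, 2]
        check = sum(d * w for d, w in zip(digits[:11], weights)) % 11
        if check == 10:
            return False  # no valid check digit exists for this prefix
    return check == digits[11]
```

A random 12-digit order number has roughly a 1-in-11 chance of passing, so the check cuts false positives by an order of magnitude on its own.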

Per-request vault, not a global one

Anonymization is useless if you can't reverse it: the model's answer needs to come back to the user with real IINs and real names, otherwise the whole thing degenerates into an offline nonsense generator. Mappings (original ↔ placeholder) live in a thread-safe in-memory dict with a 3600-second TTL. The session is bound to the request id; nothing long-lived sticks around.

Important detail: the first version used a global placeholder counter, and under concurrent load user A occasionally got back a response containing user B's IIN. An obvious lesson, but we walked into it because our synchronous tests never exercised the race.
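The fix, sketched (names hypothetical): placeholder counters scoped per session rather than shared, so '[PERSON_1]' can safely mean different people in different requests.

```python
import itertools
import threading

class PlaceholderFactory:
    """Session-scoped placeholder numbering.

    The buggy version kept one global counter, so two concurrent requests
    mapped their values onto the same shared placeholder sequence, and
    restoration could pick up the other session's original.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._counters: dict[tuple[str, str], itertools.count] = {}

    def next(self, session_id: str, entity_type: str) -> str:
        with self._lock:
            counter = self._counters.setdefault(
                (session_id, entity_type), itertools.count(1)
            )
            return f"[{entity_type}_{next(counter)}]"
```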

OpenAI compatibility was the architectural decision that mattered

We exposed /v1/chat/completions with the same contract as the upstream provider. Internal teams that already had code on openai-sdk changed their base_url — that was it. No new client libraries, no training, no 'let's discuss the integration architecture'.

Sounds trivial, saved weeks. If we'd invented our own REST contract, we'd be sitting on a clean, unused service.
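Stripped of the FastAPI plumbing, the round trip behind that endpoint can be sketched like this (the upstream call is injected, and a single regex stands in for the full recognizer set; all names hypothetical):

```python
import re

IIN_RE = re.compile(r"\b\d{12}\b")  # stand-in for the full recognizer set

def proxy_chat_completions(payload: dict, call_upstream) -> dict:
    """Anonymize -> forward -> restore, preserving the OpenAI chat schema."""
    mapping: dict[str, str] = {}

    def swap(match: re.Match) -> str:
        placeholder = f"[KZ_IIN_{len(mapping) + 1}]"
        mapping[placeholder] = match.group(0)
        return placeholder

    # Only message content is rewritten; the rest of the schema passes
    # through untouched, which is what keeps existing SDK clients working.
    clean = {
        **payload,
        "messages": [
            {**msg, "content": IIN_RE.sub(swap, msg["content"])}
            for msg in payload["messages"]
        ],
    }
    response = call_upstream(clean)  # placeholders are all the upstream sees
    for choice in response.get("choices", []):
        text = choice["message"]["content"]
        for placeholder, original in mapping.items():
            text = text.replace(placeholder, original)
        choice["message"]["content"] = text
    return response
```

Because the contract is unchanged end to end, a client's only migration step is pointing its `base_url` at the proxy.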

What we underestimated

Anonymization loses context. 'Ivan Petrov is overdue on a payment' and '[PERSON_1] is overdue on a payment' are not equivalent inputs to an LLM. Answer quality on some tasks dropped by 5–10%. For support that was tolerable; for analytics, where the model leans on the names of our own brands and products, we needed a whitelist: a list of things that must not be anonymized (our companies, products, cities). Otherwise the assistant stops understanding what company it's even working for.
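One way to sketch the whitelist is as a post-filter over recognizer results, dropping any detected span whose surface text is explicitly protected (all names and example terms here are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class Span:
    """Minimal stand-in for a recognizer result (character offsets)."""
    start: int
    end: int
    entity_type: str

def drop_whitelisted(spans: list[Span], text: str, whitelist: set[str]) -> list[Span]:
    # Our own brands, products, and cities must survive anonymization,
    # or the model loses the context it needs to answer at all.
    allowed = {term.lower() for term in whitelist}
    return [s for s in spans if text[s.start:s.end].lower() not in allowed]
```

Recent Presidio versions also accept an `allow_list` argument on `AnalyzerEngine.analyze` for the same purpose; a post-filter like this just makes the policy explicit and testable in isolation.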

Performance. Presidio adds 50–200 ms per request. Invisible in chat; painful in bulk log processing. We had to build a separate path with its own constraints.

And the big one — a proxy doesn't replace data classification at the source. If the CRM stores passport scans in 'comment' fields, no Presidio fixes that. The proxy closes one leak channel. Not more.

When it's actually worth it

Worth doing if the LLM receives real customer data, if regulation requires demonstrable absence of PII in outbound traffic, or if multiple internal teams find it easier to share a gateway than to negotiate the policy with each one separately. Don't oversell it if the team works on synthetic data, all traffic stays on an on-prem model inside the same VPC, or volume is a dozen requests a day and it's easier to teach people 'don't paste IINs by hand'.

An LLM proxy on its own is not an 'AI practice' — it is the piece of infrastructure that lets an AI practice exist without ongoing pain with InfoSec. And that's probably the most honest framing for projects like this: you're not buying answer quality, you're buying the right to receive answers at all.