ai security engineer at arxi

arxi is a personal ai agent that lives inside your messenger. one agent, one user, one isolated micro-vm that reads, decides and acts on your behalf. we're a tiny team that ships to production multiple times a day. our product holds people's real data, calendars, credentials and browsers, so security is not a feature, it's the substrate.

you'll own the security of a system that runs autonomous agents on behalf of real people, with access to their data, their accounts and a live browser. the threat surface is unusual: it's not just our infrastructure, it's the agent itself. that means prompt injection, tool-call abuse, data exfiltration between tenants, and a host root user that we want to be unable to read user data even if it tried. this is a build-and-defend role, not a policy role. you'll write code, model threats, run red-team exercises against the live product, and harden the runtime, at our shipping pace.

how we work

accuracy, then speed, then cost, in that order. accuracy never trades down. if the agent is wrong about the real world, it breaks trust, and trust is the whole product.
ship very fast. no staging theater, no quarterly roadmaps. hear it in the morning, ship it in the afternoon, roll back in one command if it's wrong.
automate everything that runs twice. builds, deploys, tests and reviews run as scripts, not as rituals.
no mvps, no half-measures. if a problem can be solved properly in code, even if it's hard, we do it properly. cheap code means correct and thorough, not quick and trimmed.
tiny team, enormous leverage. you own surfaces end-to-end and your decisions hit real users the same day.

our stack

typescript (grammy, fastify), python (fastapi), next.js with trpc and prisma, gemini via vertex ai behind our own llm proxy, firecracker micro-vms (one per user), self-hosted linux (hetzner, nginx, systemd), sqlite, prometheus and grafana, polar for payments. heavily automated tooling and a fast deploy pipeline, so engineers spend their time on the hard problems rather than the plumbing.

in this role you will

threat-model the full agent loop (prompt injection, jailbreaks, tool-call abuse, cross-tenant exfiltration, untrusted web content flowing into a privileged agent) and turn the findings into shipped controls, not slide decks.
harden the micro-vm runtime: isolation between tenants, sandbox escape surface, the trust boundary between host and vm, and secrets custody (oauth tokens, api keys, keyring).
build production security tooling: adversarial test harnesses against the live agent, injection and exfiltration detectors, abuse and anomaly monitoring on per-turn telemetry, and automated regression checks that run in the deploy pipeline.
push the privacy ceiling: design toward "host root cannot read user data" with at-rest encryption, crypto-shred and off-host anchoring, and be honest about what code alone can and cannot guarantee without a trusted execution environment (tee).
own incident response for the agent and the infra: detect, contain, root-cause, and ship the fix the same day, then encode the control so the class of bug closes for good.
secure the llm data path: the key and usage proxy, provider egress, grounding and tool calls, and the cost and abuse circuit breakers that keep a single user, or a single jailbreak, from burning the bill.
embed security into how we ship: review risky changes (agent prompts, auth, billing, infra), define the guardrails, and make the secure path the fast path so nobody routes around it.

you might be a great fit if you have

strong hands-on security engineering experience. you ship code, not just findings, across application, infrastructure and host-level security.