arxi is a personal ai agent that lives inside your messenger. one agent, one user, one
isolated micro-vm that reads, decides and acts on your behalf. we're a tiny team that ships to
production multiple times a day. our product holds people's real data, calendars, credentials
and browsers, so security is not a feature, it's the substrate.
you'll own the security of a system that runs autonomous agents on behalf of real people, with
access to their data, their accounts and a live browser. the threat surface is unusual: it's
not just our infrastructure, it's the agent itself. that means prompt injection, tool-call
abuse, cross-tenant data exfiltration, and a host root that should be unable to read user data
even if it wanted to. this is a build-and-defend role, not a policy role. you'll write code,
model threats, run red-team exercises against the live product, and harden the runtime, at our
shipping pace.
how we work
- accuracy, then speed, then cost, in that order. accuracy is never traded for speed or cost.
if the agent is wrong about the real world, it breaks trust, and trust is the whole product.
- ship very fast. no staging theater, no quarterly roadmaps. hear it in the morning, ship it in
the afternoon, roll back in one command if it's wrong.
- automate everything that runs twice. builds, deploys, tests and reviews run as scripts, not
as rituals.
- no mvps, no half-measures. if a problem can be solved properly in code, even if it's hard, we
do it properly. cheap code means correct and thorough, not quick and trimmed.
- tiny team, enormous leverage. you own surfaces end-to-end and your decisions hit real users
the same day.
our stack
typescript (grammy, fastify), python (fastapi), next.js with trpc and prisma, gemini via vertex
ai behind our own llm proxy, firecracker micro-vms (one per user), self-hosted linux (hetzner,
nginx, systemd), sqlite, prometheus and grafana, polar for payments. heavily automated tooling
and a fast deploy pipeline, so engineers spend their time on the hard problems rather than the
plumbing.
in this role you will
- threat-model the full agent loop (prompt injection, jailbreaks, tool-call abuse,
cross-tenant exfiltration, untrusted web content flowing into a privileged agent) and turn the
findings into shipped controls, not slide decks.
- harden the micro-vm runtime: isolation between tenants, the sandbox-escape surface, the
host-vm trust boundary, and secrets custody (oauth tokens, api keys, keyring).
- build production security tooling: adversarial test harnesses against the live agent,
injection and exfiltration detectors, abuse and anomaly monitoring on per-turn telemetry, and
automated regression checks that run in the deploy pipeline.
- push the privacy ceiling: design toward "host root cannot read user data" with at-rest
encryption, crypto-shred and off-host anchoring, and be honest about what code alone can and
cannot guarantee without a tee (trusted execution environment).
- own incident response for the agent and the infra: detect, contain, root-cause, and ship the
fix the same day, then encode the control so the class of bug closes for good.
- secure the llm data path: the key and usage proxy, provider egress, grounding and tool calls,
and the cost and abuse circuit breakers that keep a single user, or a single jailbreak, from
burning the bill.
- embed security into how we ship: review risky changes (agent prompts, auth, billing, infra),
define the guardrails, and make the secure path the fast path so nobody routes around it.
you might be a great fit if you have
- strong hands-on security engineering experience. you ship code, not just findings, across
application, infrastructure and host-level security.