arxi is a personal ai agent that lives inside your messenger. one agent, one user, one isolated
micro-vm: it reads, decides and acts on your behalf. we're a tiny team that ships to production
multiple times a day. our product holds people's real data, calendars, credentials and
browsers, so security is not a feature, it's the substrate.

we ship to production many times a day with no staging theater. that only works if quality is
engineered, not hoped for. we're looking for a qa engineer who can break things by hand and in
code today, and who wants to build the qa function from zero: the strategy, the harnesses, the
agent-driven test fleets, the bar that keeps a same-day-deploy culture safe. this is a founding
qa role. you won't inherit a test suite and a process, you'll define them. and because our
product is an autonomous ai agent, testing here goes beyond deterministic asserts: you'll
evaluate non-deterministic agent behavior, accuracy, and regressions, and you'll wield ai agents
themselves as testers.

how we work
- accuracy, then speed, then cost, in that order. accuracy never trades down. if the agent is
wrong about the real world, it breaks trust, and trust is the whole product.
- ship very fast. no staging theater, no quarterly roadmaps. hear it in the morning, ship it in
the afternoon, roll back in one command if it's wrong.
- automate everything that runs twice. builds, deploys, tests and reviews run as scripts, not
as rituals.
- no mvps, no half-measures. if a problem can be solved properly in code, even if it's hard, we
do it properly. code is cheap to write, so we spend it on being correct and thorough, not quick and trimmed.
- tiny team, enormous leverage. you own surfaces end-to-end and your decisions hit real users
the same day.

our stack
typescript (grammy, fastify), python (fastapi), next.js with trpc and prisma, gemini via vertex
ai behind our own llm proxy, firecracker micro-vms (one per user), self-hosted linux (hetzner,
nginx, systemd), sqlite, prometheus and grafana, polar for payments. heavily automated tooling
and a fast deploy pipeline, so engineers spend their time on the hard problems rather than the
plumbing.

in this role you will
- test by hand and in code: explore real user flows manually, then turn what you find into
automated coverage: unit, integration and end-to-end tests across the bot, the services, the admin
console and the telegram mini-apps.
- build and run agent-driven testing: use ai agents to generate cases, drive the product,
reproduce bugs, and expand coverage faster than a human fleet could. judge their output
critically rather than trusting it.
- own agent-quality evaluation: design eval harnesses for a non-deterministic ai product that
measure accuracy, behavior regressions, latency, and failure modes like refusals, loops or garbled
output, so we can prove a change is better, not just hope it is.
- stand up the qa function from scratch: test strategy, what's automated versus exploratory, the
release-quality bar, flake control, and how qa plugs into a deploy-from-main pipeline without
slowing it down.
- make quality fast, not bureaucratic: gates that catch real regressions in the deploy path and
stay out of the way otherwise, because our first principle is accuracy but our culture is
shipping fast.
- turn every escaped bug into a permanent test: reproduce it, cover it, and close the class so
it can't come back.
- partner with engineering and the founder on what good means for an agent that acts on real
people's data, and hold that line.

you might be a great fit if you have
- real hands-on qa range: you test manually and you write test code. you're not manual-only, and
not "i just configure a tool".