qa engineer at arxi

arxi is a personal ai agent that lives inside your messenger: one agent, one user, one isolated micro-vm, reading, deciding and acting on your behalf. we're a tiny team, and our product holds people's real data, calendars, credentials and browsers, so security isn't a feature; it's the substrate.

we ship to production many times a day with no staging theater. that only works if quality is engineered, not hoped for.

we're looking for a qa engineer who can break things by hand and in code today, and who wants to build the qa function from zero: the strategy, the harnesses, the agent-driven test fleets, and the bar that keeps a same-day-deploy culture safe. this is a founding qa role. you won't inherit a test suite or a process; you'll define them. and because our product is an autonomous ai agent, testing here goes beyond deterministic asserts: you'll evaluate non-deterministic agent behavior, accuracy and regressions, and you'll use ai agents themselves as testers.

how we work

accuracy, then speed, then cost, in that order. we never trade accuracy away for speed or cost. if the agent is wrong about the real world, it breaks trust, and trust is the whole product.
ship very fast. no staging theater, no quarterly roadmaps. hear it in the morning, ship it in the afternoon, roll back in one command if it's wrong.
automate everything that runs twice. builds, deploys, tests and reviews run as scripts, not as rituals.
no mvps, no half-measures. if a problem can be solved properly in code, even if it's hard, we do it properly. cheap code means correct and thorough, not quick and trimmed.
tiny team, enormous leverage. you own surfaces end-to-end and your decisions hit real users the same day.

our stack

typescript (grammy, fastify), python (fastapi), next.js with trpc and prisma, gemini via vertex ai behind our own llm proxy, firecracker micro-vms (one per user), self-hosted linux (hetzner, nginx, systemd), sqlite, prometheus and grafana, polar for payments. heavily automated tooling and a fast deploy pipeline, so engineers spend their time on the hard problems rather than the plumbing.

in this role you will

test by hand and in code: explore real user flows manually, then turn what you find into automated unit, integration and end-to-end coverage across the bot, the services, the admin console and the telegram mini-apps.
build and run agent-driven testing: use ai agents to generate test cases, drive the product, reproduce bugs and expand coverage faster than a team of humans could, while judging their output critically instead of trusting it.
own agent-quality evaluation: design eval harnesses for a non-deterministic ai product that measure accuracy, behavior regressions, latency, and failure modes such as refusals, loops or garbled output, so we can prove a change is better instead of hoping it is.
stand up the qa function from scratch: test strategy, what's automated versus exploratory, the release-quality bar, flake control, and how qa plugs into a deploy-from-main pipeline without slowing it down.
make quality fast, not bureaucratic: build gates in the deploy path that catch real regressions and otherwise stay out of the way. accuracy is our first principle, but shipping fast is our culture.
turn every escaped bug into a permanent test: reproduce it, cover it, and close the whole class of bug so it can't come back.
partner with engineering and the founder on what good means for an agent that acts on real people's data, and hold that line.

you might be a great fit if you have

real hands-on qa range: you test manually and you write test code. you're not manual-only, and not "i just configure a tool".