
Ramp: Lessons from Building a New AI Product - The Pragmatic Summit

The Pragmatic Engineer
With Nik Koblov (EVP Engineering), Veeral Patel (Director, Applied AI and Spend), Will Koh (Staff Engineer, Applied AI), and Ian Tracey (Staff Software Engineer).
Recorded at The Pragmatic Summit: www.pragmaticsummit.com
Watch the session with Q&A also included: https://newsletter.pragmaticengineer.com/p/the-pragmatic-summit-recordings

Chapters:
0:00 Intro
3:31 Pivot: from many agents to one agent with many skills
7:17 Policy Agent
12:21 Tooling lessons
22:46 Infrastructure + culture
26:13 Tool catalogs, sandboxing, and reliability at scale
Speakers: Nik, Veeral, Will, Ian
📅March 09, 2026
⏱️00:36:45
🌐English

Disclaimer: The transcript on this page is for the YouTube video titled "Ramp: Lessons from Building a New AI Product - The Pragmatic Summit" from "The Pragmatic Engineer". All rights to the original content belong to their respective owners. This transcript is provided for educational, research, and informational purposes only. This website is not affiliated with or endorsed by the original content creators or platforms.

Watch the original video here: https://www.youtube.com/watch?v=NMs8C2_3M0w

00:00:05 Nik

Today we're going to talk about AI at RAMP, and I'm going to give a quick introduction to what RAMP is. Really briefly, we're going to walk through the simplest possible expense use case, one that should resonate with all of you because I see everybody's drinking coffee. And then we're going to talk quickly about a lesson we learned this year while we were building a gazillion agents, and the pivot in the paradigm that's happening, especially after February 6.

00:00:36 Nik

And then we're going to double-click on how we built one of our most popular agents, the Policy Agent. And then finally, we'll dig into the infrastructure build-out this requires on our side and, in my mind most importantly, the culture shift that needs to happen on everyone's teams in order to operate in a way that delivers products into the hands of your customers in the fastest and most impactful way.

00:01:06 Nik

So without further ado, a quick intro about RAMP. We are the number one finance platform for modern businesses. We have 50,000 plus customers, and we're in the business of saving you time and money. I've seen some of those names on the name tags here, so thank you for being customers. Really exciting.

00:01:32 Nik

Really quickly. So a cup of coffee usually takes about 15 minutes of your time because you got to do these three simple things, which unfortunately take minutes. This compounds through the company, and what RAMP does in the simplest possible way, we just condense time and return money back.

00:01:54 Nik

So a simple story of a transaction from tapping the card, to writing a memo, to classifying the transaction according to your GL, to sourcing the receipt, attaching the receipt, normalizing the merchant to your inventory of merchants, is all done agentically at RAMP. And this was our first foray, probably by now—you guys still hear me? Yeah. Probably by now, about three years ago, we started doing these one-shot things with AI: normalize merchant, write a memo. And it's been working really, really well as the models get better.

00:02:29 Nik

What else is going on at the company? Well, literally every persona at the company is wasting time on a lot of manual work. So from AP clerks to your finance team, from your purchasing teams... keep going to more finance work, your data teams. At RAMP, we used to have a channel called "help data" where somebody would ask for a CSV, and a poor person would go and write a SQL query. We replaced it about a year and a half ago.

00:03:02 Nik

So a lot of time is being spent, and the complexity has a ramp shape. It only increases as you go through different jobs to be done. So if you guys watched the Super Bowl, you might be familiar with Brian, our agent. So we've been writing a lot of agents, literally for every job to be done, to cover the entirety—in the end state—the entirety of what admins, employees, and finance teams are doing that is not directly related to making the money. We want you all to be making money and focus on your customers, not on how to close the books.

00:03:36 Nik

But what's been happening for the past few weeks is that we're living through the most exciting paradigm shift in software, and it requires a complete rethink, and with rethink, simplification of your stack. So what we learned is you don't need to build a thousand agents. We intentionally, last year, allowed each individual team to go and experiment, and we ended up maybe with four different ways of doing the same thing, both for synchronous agents as well as for background agents. But instead, you want to drive your framework towards a single agent with a thousand skills.

00:04:20 Nik

So let's talk about what software traditionally used to focus on. Every process, especially in the modern AI stack, boils down to five things: (1) an event, so you receive an invoice and you want to pay it; (2) prompt instructions for what you want to do with it; (3) guardrails, like an expense policy or a payables policy; (4) context, the data the agent should consider; and (5) tools, the APIs and actions you can take. And traditionally, software would focus on only four and five: context and tools.
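The five components above can be captured in a simple data structure. This is an illustrative Python sketch, not Ramp's actual schema; every name in it is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical sketch of the five components named in the talk:
# event, instructions, guardrails, context, and tools.
@dataclass
class AgentTask:
    event: dict                       # e.g. an incoming invoice
    instructions: str                 # prompt: what to do with it
    guardrails: list[str]             # e.g. expense/payables policy rules
    context: dict                     # data the agent should consider
    tools: dict[str, Callable] = field(default_factory=dict)  # APIs/actions

task = AgentTask(
    event={"type": "invoice.received", "amount_usd": 1200},
    instructions="Pay this invoice if it complies with the payables policy.",
    guardrails=["Invoices over $10,000 require human approval."],
    context={"vendor": "Acme Corp", "net_terms": 30},
    tools={"pay_invoice": lambda invoice_id: f"paid:{invoice_id}"},
)
```

Traditional software hard-codes items four and five; the agent paradigm described here hands all five to an autonomous system.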

00:04:54 Nik

In the new paradigm, software is doing everything. So you want to focus on building an autonomous system of action that can react, reason, and act without a human or with very little human supervision.

00:05:11 Nik

So what does it mean in terms of what we're building? So first, we decided we're going to consolidate the verbal interactions with the agents to a single conversational UX. We literally, at the end of last year, had about five different conversational UXs. We now have consolidated it into what we call an OmniHat. Omni meaning omnipresent. It is now being deployed to every surface of the product. And it works well with the traditional UX, because you still need tables and buttons, and you don't always want to be talking to your software.

00:05:50 Nik

But this is a good example of what OmniHat looks like: "Please onboard a new employee." OmniHat can resolve an employee to an employee ID and look up through an HRIS tool their corporate structure. And it found a workflow, an agentic workflow that we created previously, called the new hire playbook. And the agent is asking, "Would you like me to onboard the person using this playbook?"

00:06:14 Nik

How is this possible? We built an in-house lightweight agent framework that provides orchestration with tools that engineers are very quickly building. And most recently, we had one product manager vibe-code about 20 tools, so engineers are no longer needed to build those tools. And sometimes your workflows are more involved; employee onboarding, for example, consists of four steps.

00:06:38 Nik

So you can just go on RAMP and describe what you want to happen when a new employee joins: give them a card, make sure they get receipts for every transaction, congratulate them on Slack, and check in with them in two weeks. We're now able to compile this into a runnable, deterministic workflow and then give it to the agent to execute. Playbooks make use of tools, and that's how this all comes together.
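A natural-language playbook compiled into an ordered, deterministic step list might look like this minimal sketch; the compiler and step wording are hypothetical, not Ramp's implementation.

```python
# Illustrative: turn a numbered "new hire playbook" into an executable
# step list an agent can run deterministically.
PLAYBOOK = """
1. Issue a corporate card
2. Require receipts for every transaction
3. Congratulate the new hire on Slack
4. Schedule a two-week check-in
"""

def compile_playbook(text: str) -> list[str]:
    """Parse numbered playbook lines into an ordered list of steps."""
    steps = []
    for line in text.strip().splitlines():
        number, _, step = line.partition(". ")
        if number.strip().isdigit():
            steps.append(step.strip())
    return steps

steps = compile_playbook(PLAYBOOK)
```

The point of compiling to a fixed step list is that the agent executes a deterministic plan rather than re-deciding the workflow on every run.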

00:07:05 Nik

This is an example which Veeral is going to double-click on next. Upon swiping the card, there's a real-time policy review happening directly in the software, and Policy Agent enforces your company requirements with regard to spend. Therefore, it's very safe to give RAMP cards to literally every employee in your company.

00:07:28 Nik

And there's a handoff happening with an accounting coding agent that classifies this transaction and applies the rules of your back-office team, of your finance team. As an employee, I have no idea how a certain transaction should map to our GL, and that's what typical traditional products would do—they would expose it to you. The agent is much better at doing this because it has the full context of your chart of accounts. It understands your ERP, and then it can either auto-approve or, in the worst-case scenario, involve a human in the loop to review materiality or notify that there is out-of-policy spend. With that, please welcome Veeral, who'll dive deeper into the Policy Agent.

00:08:12 Veeral

Thanks, Nik.

00:08:19 Veeral

Awesome. So, a lot of finance teams are looking at receipts like this basically every day, and maybe they might have hundreds or thousands of these. If you told me to look at this and decide if I should approve or reject this transaction, I'm probably going to make a mistake.

00:08:34 Veeral

So, Policy Agent basically reasons over this image and all the transaction data that we have, and it told me that there were eight guests on the receipt. I could barely see that when I was looking at it. It was below the $80-a-person cap that we have internally. They were going to a team welcome dinner, and because the amount and the merchant were verified as well, Policy Agent told me to approve this transaction.

00:08:58 Veeral

Similarly for this OpenAI transaction, Anand was testing out some ChatGPT features, and so Policy Agent told me this was a valid business expense and told me to approve it. And then this $3 bakery charge was rejected because it wasn't part of an overtime purchase and it didn't happen on the weekend.

00:09:21 Veeral

So really, we looked at this as an opportunity to rethink how RAMP was set up. Controllers and finance teams are looking at transactions like these and making these decisions every day. And a Fortune 500 company that is one of our customers was coming to us and saying, "Hey, can you make sure that you approve these types of expenses and reject these types of expenses?" And they basically had a list of all the rules that RAMP should follow.

00:09:48 Veeral

And we saw this as an opportunity not to add more of the incremental deterministic rules that had defined our product—and I worked on some of the first versions of these—but to take a page from Andrej Karpathy, who says English is the new programming language, and turn the expense policy into the rules themselves.

00:10:09 Veeral

So you can see RAMP's expense policy on the left (this is a screenshot from our production environment), and we are seeing really great use of our Policy Agent product. It needed to start really organically, so we operated like an early-stage startup. We're already very iterative and fast at RAMP, but we found some design partners like that Fortune 500 company, we iterated really quickly, and we had weekly meetings with all of them to understand exactly what their feedback was and what we could improve.

00:10:45 Veeral

I think one of the most important things we realized across RAMP is that we really needed to lean into the fact that AI products cannot be one-shotted. You need to start with something simple. And so as long as everyone on your team—PMs, designers, engineers—is aligned that you're not going to have perfection on day one... I think that was actually one of the main cultural learnings.

00:11:10 Veeral

And so we dogfooded a lot of this work internally and started with an even more constrained problem: trying to decide whether our coffee-with-a-colleague transactions should be approved or rejected. These are small dollar-amount transactions that are low risk according to our finance team. And one of the early learnings, especially as we released this into production, was that a lot of the time Policy Agent was wrong, it was less about the models themselves and more about the context that we were giving to the LLMs.

00:11:45 Veeral

So we could have sat down and thought about all the context in the beginning before we even kicked off any engineering work. But we realized actually the best thing would be to learn from some of our live internal data. And so, for example, we learned that the role and the title of an employee is super important when looking at expense policy docs. Certain levels, C-suite for example, might have higher limits, maybe they can fly on first class for certain flights.

00:12:08 Veeral

And so we started extracting more information from receipts, started pulling in information from HRIS fields that are already on RAMP. And so Will is going to kind of talk you through exactly the iterations that we went through to implement Policy Agent and some of the learnings along the way.

00:12:34 Will

Yeah. All right. Cool.

00:12:38 Will

Awesome. So when we first started building the Policy Agent internally, we dreamed, we went big. We're like, "Hey, let's automate all of finance. Let's automate all reviews." But when it came down to it, we actually had to start small. Is that cup of coffee, you know, in your expense policy?

00:12:54 Will

And the reason we did that was because even though the problem sounds simple to automate (is this a simple question: is this in policy or not?), it was going to grow to be complex. Like Veeral said, we could have sat down and figured out what context we have, how we can add it, and how we can put it all together in a way that an LLM can understand, all from the get-go. But we knew that even if we aimed and got everything right the first time, it was probably going to be wrong once you applied it, generalized it, and went to another business.

00:13:26 Will

So the simpler the system, the easier it is to iterate on top of it. And once you iterate, you know what's going to work and what's not, and you can layer complexity on top of that. I think that's pretty important to keep in mind when you're starting to build an LLM or agent product.

00:13:42 Will

So for us, we started really simple. Very, very, kind of the classic, you know, we have an expense come in, retrieve the context around it, we pass it through a series of LLM calls that are very well defined of like, "Hey, is this in policy? Why is it in policy? How can we show the user that it's in policy?" and then give an output that makes sense in this way to the user.
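The simple pipeline described here, retrieve context, chain a few well-defined LLM calls, emit a decision, can be sketched as below. `call_llm` is a deterministic stub standing in for a real model client; the prompts and field names are illustrative.

```python
# Sketch of the classic "expense in → context → chained LLM calls → output" shape.
def call_llm(prompt: str) -> str:
    # Stand-in for a real model call: a real implementation would hit a
    # provider API here. This stub keys off the prompt text.
    return "in_policy" if "coffee" in prompt and "$4" in prompt else "needs_review"

def retrieve_context(expense: dict) -> str:
    """Gather the context around an incoming expense."""
    return f"{expense['merchant']} charge of ${expense['amount']}"

def review_expense(expense: dict, policy: str) -> dict:
    context = retrieve_context(expense)
    # Two well-defined calls: decide, then explain the decision to the user.
    decision = call_llm(f"Policy: {policy}\nExpense: {context}\nIn policy?")
    reason = call_llm(f"Explain the status of: {context}")
    return {"decision": decision, "reason": reason}

result = review_expense(
    {"merchant": "coffee shop", "amount": 4},
    "Coffee under $10 is allowed.",
)
```

Each stage is a plain function with a fixed prompt, which is exactly what makes this first version easy to reason about and iterate on.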

00:14:00 Will

Eventually we learned that each expense is different. We can classify an expense based on: is it travel, is it a meal, is it entertainment? Then do conditional prompting and retrieve context based on that, pass it through a series of LLM calls, and give it some tools so that it can also autonomously decide, "Hey, I actually need flight information, or I need this employee's level," and layer that on top.

00:14:21 Will

And a few iterations later, we came to a full-on agentic workflow. We ended up with complex tools to read across all of our platform, and these tools are shared across all of our agents. It's not just for Policy Agent; we have a company internal toolbox that all of our agents can easily reach into. And we gave it the capability to write as well. So it's now writing decisions, it's writing reasoning, it's auto-approving expenses on users' behalf. And it goes in a loop.
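The agentic loop over a shared toolbox described above can be sketched like this. The model's tool choices are stubbed out, and the tool names and loop shape are illustrative, not Ramp's framework.

```python
from typing import Any

# A shared toolbox any agent can reach into: read tools and write tools alike.
TOOLBOX = {
    "get_policy_snippet": lambda topic: "Meals are capped at $80 per person.",
    "approve_expense": lambda expense_id: {"expense_id": expense_id,
                                           "status": "approved"},
}

def model_step(history: list) -> dict:
    # Stand-in for the LLM choosing its next tool call. A real agent would
    # send the history to a model and parse a tool-call response.
    if not history:
        return {"tool": "get_policy_snippet", "args": ["meals"]}
    return {"tool": "approve_expense", "args": ["exp_1"], "final": True}

def run_agent(max_steps: int = 5) -> Any:
    """Loop: model picks a tool, we execute it, feed the result back."""
    history = []
    for _ in range(max_steps):
        step = model_step(history)
        result = TOOLBOX[step["tool"]](*step["args"])
        history.append((step["tool"], result))
        if step.get("final"):
            return result
    raise RuntimeError("agent did not finish within max_steps")
```

The `max_steps` cap is one small concession to the "black box" trade-off discussed next: the loop is autonomous, but at least it is bounded.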

00:14:52 Will

So now it's more of a black box. And that's kind of the trade-off you get. As you go from simple to complex systems, your capability goes up, your autonomy goes up, your agents are able to do more, your AI can do more, your AI seems smarter. But in exchange, you're losing traceability and explainability. We look at it now, we can kind of look at the reasoning tokens that the LLM gives us, but in the end, we have no control over it. It's going to do what it thinks is right. It's going to make the tool calls. It's going to tell you it's right or wrong. So a smaller black box becomes a bigger black box as the system becomes more complex.

00:15:28 Will

So one thing that is really important when doing something like this is from the beginning, you need really good auditability. Assume even if you know how it works, assume that your inputs and outputs are all you know, and make sure that it's correct. So if it was a black box system and you only saw the input-output, can you verify that it did the right thing? And even if that black box changes, you should be able to reason about whether the output is correct.

00:15:54 Will

As with many products that we built at RAMP and across other companies, we thought that the users would be correct. You know, if the user says approve, the agent should approve. If the user says reject, the agent should reject. But turns out the users are actually incorrect. They're wrong. They are sometimes, you know, they don't know the expense policy. You know, they trust their employees. They're lazy. It's a Sunday. Who knows? So turns out we can't always do what the users are doing because sometimes that's where finance teams come back to you and are like, "Hey, this is wrong. This shouldn't be on the company card."

00:16:25 Will

So, we had to define our own definition of correctness. And to do that, we held a weekly labeling session with the cross-functional people working on this product. That had two really good outcomes. One was that we had a ground-truth dataset we could always test against, and we knew it was correct. And two was that everyone was on the same page. If our agent got something wrong, everyone knew it got it wrong. If our agent was missing context, everyone knew it was missing that context. So there was less back-and-forth, everyone was on the same page, and they could focus on what's really the priority and have alignment on that.

00:17:02 Will

Initially, getting all those people together in a room every week and giving them homework to label a hundred data points is expensive. Everyone has things to do, and sometimes they don't come back with their homework done. It almost becomes tedious, even though it's so important. So we wanted to make it as simple as possible. We looked for third-party vendors that could provide us the tools to label and collect the data, but it turns out some tools are too specific to a use case and some are too general. We could have spent weeks trying out different tools, but we decided to just build our own.

00:17:34 Will

So we used Claude Code to build it with Streamlit. We basically one-shotted all of this. And the greatest part of it all is that it's low maintenance and low risk. It's in a part of the codebase where, if it breaks, we can fix it right away. Deploys happen almost instantly. And non-engineers can go and personalize it; they can vibe-code it with Claude Code. And this was with Opus 4, so now with Opus 4.6, I expect it's even better. With something like that, it's definitely easier and sometimes cheaper to do something one-off like this.
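The core of a labeling tool like this (theirs was a Streamlit app; this sketch keeps only the data layer) is just appending reviewer labels to a ground-truth file. The field names are assumptions, not Ramp's schema.

```python
import json
import pathlib
import tempfile

# Minimal data layer for a labeling tool: one JSON record per line.
def record_label(path: pathlib.Path, expense_id: str, label: str, reviewer: str):
    """Append one reviewer's label for one expense to the ground-truth file."""
    entry = {"expense_id": expense_id, "label": label, "reviewer": reviewer}
    with path.open("a") as f:
        f.write(json.dumps(entry) + "\n")

def load_labels(path: pathlib.Path) -> list[dict]:
    """Read all labels back, one dict per line."""
    return [json.loads(line) for line in path.read_text().splitlines()]

# Demo using a temporary file.
path = pathlib.Path(tempfile.mkstemp(suffix=".jsonl")[1])
record_label(path, "exp_1", "approve", "finance")
record_label(path, "exp_1", "reject", "engineering")
labels = load_labels(path)
```

Keeping multiple labels per expense, as here, is what surfaces the cross-team disagreements the weekly session exists to resolve.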

00:18:05 Will

And with that ground-truth dataset, we were able to make quick iterations. We were able to find out, "Hey, we need employee levels; add that. How does that work? Run it against this dataset. Does it actually catch it and now say approve or reject?" Being able to iterate really quickly was a key point in developing this. We had really early confidence that this could actually work, and we were able to get a lot of buy-in, get a lot of customers onboarded as design partners, and try it out.

00:18:35 Will

And as part of doing that iteration with the dataset, you have evals. Obviously everyone in this room now knows about evals and what they mean, but it's pretty important to have them early on. Don't let perfectionism get in the way: you don't need a full dataset of a thousand data points to test against on every iteration. We started with five, and we knew we were not going to fail those five.

00:18:59 Will

We kept adding and adding and adding, and make sure it's easy to run. Anyone could go and just run that command, and then make sure that the results are really easy to understand. They are able to look at it, get instant output, and understand like, "Hey, this is what the model's doing. This is good, this is bad." And if you run it as part of your CI, everyone now can safely merge in code.
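A tiny eval harness of the kind described, a handful of labeled cases, one command, pass/fail output usable in CI, might look like this. The agent here is a stub standing in for the real Policy Agent, and the cases are made up.

```python
# Minimal offline eval harness: a small labeled dataset, one function to run,
# human-readable output, and an accuracy number a CI job can threshold on.
GROUND_TRUTH = [
    ({"merchant": "coffee shop", "amount": 4}, "approve"),
    ({"merchant": "casino", "amount": 500}, "reject"),
]

def agent_decide(expense: dict) -> str:
    # Stand-in for the real agent; approves small expenses.
    return "approve" if expense["amount"] < 100 else "reject"

def run_evals() -> float:
    """Run every case, print a summary, return accuracy in [0, 1]."""
    passed = sum(agent_decide(e) == expected for e, expected in GROUND_TRUTH)
    accuracy = passed / len(GROUND_TRUTH)
    print(f"{passed}/{len(GROUND_TRUTH)} eval cases passed")
    return accuracy

accuracy = run_evals()
```

In CI you would fail the build when `accuracy` drops below a chosen threshold, which is what lets everyone merge prompt and context changes safely.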

00:19:21 Will

Because whenever you think you're doing something right for the LLMs or agent, giving more context, giving tools, more likely than not, it's probably gonna have some kind of bad consequence that you didn't see happening: context rot. Whether it be the tool instructions are wrong, or maybe the docstring was a little confusing and conflicting. So it might have consequences. You just want to make sure you're catching against those.

00:19:42 Will

And then I'll touch on it briefly, but online evals are also great. So these are offline: you have a dataset, it's historical, you're testing against it. Online evals can be a little more confusing and harder to measure, but if you can measure anything as your users are interacting with the system, definitely set that up as a leading metric. For us, part of that was looking at the rates of each decision. We had an "unsure" decision, which just meant that the agent didn't have enough information, so we could measure that online. It's a much simpler eval, but it gave us a pretty good health check as our system was running.
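The "unsure rate" health check described can be computed in a few lines; the decision labels mirror the talk, but the monitoring threshold is illustrative.

```python
from collections import Counter

# Online eval sketch: track the fraction of live decisions where the agent
# said it didn't have enough information.
def unsure_rate(decisions: list[str]) -> float:
    """Fraction of decisions labeled 'unsure'."""
    counts = Counter(decisions)
    return counts["unsure"] / len(decisions)

live_decisions = ["approve", "approve", "unsure", "reject", "approve"]
rate = unsure_rate(live_decisions)
# A monitor might alert if, say, rate > 0.3 (threshold is illustrative).
```

A rising unsure rate is a leading indicator that the agent is missing context, long before offline evals would catch it.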

00:20:17 Will

Another great part about evals is that you can make confident model changes. Whenever a new model comes out—Opus 4.6, GPT-5.3—you want to make sure you can leverage it, because sometimes that can be the difference between your system getting one part of the problem right or wrong. But it could also mean the opposite: the new model could actually be worse without any prompt changes or changes to how your system works. So having evals set up and being able to benchmark really helps you make confident model changes.

00:20:47 Will

Cool. So now Policy Agent, which we've been developing for a while, is available for everyone on the RAMP platform. One of the things we learned along the way is that, as engineers, Claude Code is very exciting. We have full control. We get to modify our claude.md. We get to tell it not to leave comments, and it won't leave comments, hopefully.

00:21:04 Will

Turns out it's not just us. Finance people also really like to modify their claude.md, which is their expense policy. So if something went wrong with a decision, we just tell them, "Hey, go update your policy doc," which to them is a little scary as a concept to begin with. This is a document you don't mess with; you have to go through a lot of hoops if you want to change it. But it turns out that if you get them excited about the feedback loop—"Hey, change that and you'll see it right away"—they'll be really excited to do this.

00:21:30 Will

And then trust builds over time. Some of our earlier customers were some of the Fortune 500s. We actually started with the really big enterprise customers because we thought they would get the most value. They have the most expenses coming in and the most time spent on reviewing coffee expenses. So we rolled out to them and let them build trust. We didn't take any autonomous action; we just said, "Hey, we're going to give you a suggestion." That's how we phrased it: suggestions.

00:21:58 Will

And eventually, they came to us and were like, "Okay, you know what? I want to go from suggestions to auto-approvals. Like anything under $20, you guys are mostly right. I don't care about this. Let me just go auto-approve it." So we gave them the autonomy slider, we gave them a way to turn it on, and then they actually could do it themselves.

00:22:15 Will

And then last but not least, similar to LLMs, users thrive in in-product feedback loops. So, you know, when you're building an AI product and you have a full way of like, LLMs can test if its code was right and it's able to iterate, users are the same way. We gave them in-product ways to improve the expense policy doc, improve the agent and how it operates, and they're more than excited to kind of take it over themselves and kind of improve it and personalize it for them.

00:22:42 Will

So from here I'll pass it on to Ian, who's going to kind of talk about the infrastructure and the culture that we have at RAMP that led us to building the Policy Agent.

00:22:54 Ian

Hey everybody.

00:22:55 Ian

So you've heard a little bit about how we're kind of getting leverage to all of the different finance teams as we operate on top of their financial infrastructure and really try to get leverage for our customers. But I think a big thing that we also spend a lot of time thinking about is how can we get leverage for RAMP itself, the engineers, our XFN orgs, all the people that we work with every single day. And this section is pretty intentionally named "AI infrastructure and culture" because we think that this is both a really challenging infrastructure problem, but it's also a really challenging culture problem, and changing how you work as well is a big part of the story.

00:23:35 Ian

And so to kind of start on the infrastructure side, the core of how most of applied AI happens at RAMP is our applied AI service. And at like a 10,000-foot view, this looks something kind of like an LLM proxy or something like LiteLLM. But there's really three main extensions that we've invested in to make this a lot more powerful for a lot of our use cases.

00:23:56 Ian

The first is structured output and consistent APIs and SDKs across different model providers. This can be pretty tricky to do, especially with how quickly the APIs are changing, but it's a problem we don't want downstream product teams to have to think about. So if you have an idea like "I want to switch from GPT-5.3 to Opus," or "I want to try Gemini 3 Pro," you should be able to do that with a config change and really quickly iterate, whether that's semantic similarity checks, code sandboxing, or structured output calls.
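A config-driven proxy of this shape can be sketched as below. The provider clients are stubs standing in for real vendor SDKs; only the model names come from the talk, and the registry layout is an assumption.

```python
# Sketch of an LLM-proxy layer: one call signature for all downstream teams,
# with the provider chosen by config.
CONFIG = {"model": "gpt-5.3"}  # a one-line change switches every caller

# Stub clients; real ones would wrap each vendor's SDK and normalize
# structured output into a consistent shape.
PROVIDERS = {
    "gpt-5.3": lambda prompt: f"[openai] {prompt}",
    "opus": lambda prompt: f"[anthropic] {prompt}",
    "gemini-3-pro": lambda prompt: f"[google] {prompt}",
}

def complete(prompt: str) -> str:
    """The single SDK surface product teams call, regardless of provider."""
    return PROVIDERS[CONFIG["model"]](prompt)

first = complete("classify this expense")
CONFIG["model"] = "opus"        # the "config change" described in the talk
second = complete("classify this expense")
```

Because every caller goes through `complete`, switching providers never touches product code, which is what makes the later "one-line upgrade to a new model" possible.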

00:24:27 Ian

The other thing we've spent a ton of time thinking about is batch processing and workflow handling. This is really useful for evals or if you're doing bulk document or data analysis. It's something we also don't want teams to have to spend a bunch of time on: how do you batch this and handle rate limits, and do we run this as an offline or online job with a provider like Anthropic? We just want to handle that for downstream consumers so they can focus on providing value for downstream customers.
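Batching with a crude rate cap might be sketched like this; the batch size and rate are illustrative, `call_llm` is a stub, and real providers additionally offer dedicated offline batch APIs that a layer like this could route to.

```python
import time

def call_llm(prompt: str) -> str:
    # Stand-in for a real provider call.
    return f"ok:{prompt}"

def run_batch(prompts: list[str], batch_size: int = 2,
              requests_per_second: int = 10) -> list[str]:
    """Process prompts in fixed-size batches, pausing to respect a rate cap."""
    results = []
    for i in range(0, len(prompts), batch_size):
        for prompt in prompts[i:i + batch_size]:
            results.append(call_llm(prompt))
        # Crude rate limiting: sleep long enough that each batch stays
        # under the requests-per-second budget.
        time.sleep(batch_size / requests_per_second)
    return results

out = run_batch([f"doc {n}" for n in range(5)])
```

Hiding this behind a shared service means product teams never decide batch sizes or retry policies themselves.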

00:24:53 Ian

And then the last, which is a pretty big deal, is the ability to trace costs across teams and products. This allows us to identify the Pareto frontier of model performance versus cost, see how these evolve over time, and spot which teams are building something that's not going to be sustainable long term for different product services. And this can be really, really important, because it removes all of this work from internal teams having to think about it.
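Per-team cost attribution of the kind described comes down to tagging every proxied call; the prices, team names, and ledger layout here are made up for illustration.

```python
from collections import defaultdict

# Sketch of cost tracing: every call through the proxy is tagged with the
# owning team so spend can be rolled up later. Prices are illustrative.
PRICE_PER_1K_TOKENS = {"gpt-5.3": 0.01, "opus": 0.015}

ledger = defaultdict(float)  # team -> accumulated dollars

def record_call(team: str, model: str, tokens: int):
    """Attribute the cost of one model call to a team."""
    ledger[team] += tokens / 1000 * PRICE_PER_1K_TOKENS[model]

record_call("policy-agent", "opus", 4000)
record_call("policy-agent", "opus", 2000)
record_call("spend-insights", "gpt-5.3", 1000)
```

With a ledger like this, plotting cost against eval accuracy per model is what surfaces the performance-versus-cost Pareto frontier mentioned above.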

00:25:19 Ian

And the last thing, which is kind of funny to think about, is that it allows us to stay at the frontier; we often joke that our customers are actually using a newer frontier model than they may even know is out yet. When a new model comes out, it's a one-line config change that impacts every single SDK downstream. So rather than teams having to learn a new SDK or go into dozens of different call sites, they can change it in one place for their specific team, and they now get the benefit of being on the latest and greatest models that we've vetted and built into the rest of the system.

00:25:52 Ian

Our product, as you've heard, works on a lot of very sensitive data and very sensitive workflows. And something I often hear from engineers in the space is this concern about hallucination and safety: how are you actually going to ship these things so they benefit downstream finance teams?

00:26:12 Ian

And we're pretty big believers that it all comes down to the catalog of tools that teams are building and integrating with on a daily basis. And so what you're seeing here is our internal tool catalog. So an example would be like "get a policy snippet" or "per diem rate" or "recent transactions." And these are built alongside of product teams to really understand a lot of the nuances in the data and the use case.

00:26:34 Ian

And what's really cool about this is not only can you see where there's gaps in our offering—that oh, we actually don't have a tool for this specific use case—these can be used both in internal repos and our core product. And so if you have an idea of "I want to do a cool reimbursement agent idea," here are the different ways to integrate the tools, the different APIs and systems that they integrate with. And now you can prototype that on a totally new product in a vibe-coded surface area without having to worry about learning all of these things from scratch or building the tools on your own. We're up to like many hundreds of these tools today, and we, as Nick mentioned earlier, think that this could be like multiple thousands over time.
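A shared tool catalog can be as simple as a registry that tools join once with a description. The registry itself is a hypothetical sketch; only the tool names mirror the examples in the talk.

```python
# Sketch of an internal tool catalog: tools register once with a description,
# and any agent or prototype can discover and call them.
CATALOG: dict[str, dict] = {}

def register_tool(name: str, description: str):
    """Decorator that adds a function to the shared catalog."""
    def wrap(fn):
        CATALOG[name] = {"fn": fn, "description": description}
        return fn
    return wrap

@register_tool("get_policy_snippet", "Fetch the policy text relevant to a topic")
def get_policy_snippet(topic: str) -> str:
    return f"policy text about {topic}"

@register_tool("per_diem_rate", "Look up the per diem rate for a city")
def per_diem_rate(city: str) -> int:
    return 75  # placeholder value, not a real rate

# Discoverability is the point: browsing the catalog shows what exists
# and where the gaps are.
tool_names = sorted(CATALOG)
```

A catalog like this is also where gaps become visible: if a use case has no matching entry, that is a tool someone needs to build.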

00:27:13 Ian

On the topic of context, another big thing we think about is context for our customers: how do we actually integrate the financial stack and allow them to be a lot more productive? But we noticed a very similar problem internally on our engineering team. And I think something that's not always obvious is that, even if you're using something like Claude Code or Codex, there's all this fragmentation in what you actually do on a daily basis to get work done in your company that those tools aren't integrated with.

00:27:37Ian

There's logs in Datadog. There's a production database that has a bunch of things going on. There's different alerting systems. There's Incident.io. There's a Slack message you have to pull in. There's a Notion doc. And then there's a lot of knowledge that those actual specific product teams have of how they actually need to get work done as well.
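One way to picture pulling those fragmented sources together is a registry of per-source fetchers fanned out into a single context bundle. This is an illustrative sketch only: the fetchers return canned strings standing in for real Datadog, Incident.io, or Notion integrations.

```python
# Illustrative sketch of merging fragmented context into one bundle
# for an agent. Each source registers a fetcher; real integrations
# would call the respective APIs instead of returning canned text.
from typing import Callable

FETCHERS: dict[str, Callable[[str], str]] = {}

def context_source(name: str):
    """Decorator registering a fetcher under a source name."""
    def wrap(fn):
        FETCHERS[name] = fn
        return fn
    return wrap

@context_source("logs")
def fetch_logs(query: str) -> str:
    return f"ERROR rate spiked for '{query}' at 14:02 UTC"

@context_source("incidents")
def fetch_incidents(query: str) -> str:
    return f"INC-204: open incident mentioning '{query}'"

def gather_context(query: str) -> dict[str, str]:
    """Fan out to every registered source and merge the results."""
    return {name: fetch(query) for name, fetch in FETCHERS.items()}

bundle = gather_context("reimbursements")
print(sorted(bundle))   # ['incidents', 'logs']
```

The point of the pattern is that adding a new source (a Slack fetcher, a Notion fetcher) is one decorated function, not a change to every agent.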

00:27:54Ian

And so at the end of last year, we decided to start out and try to solve this problem of how can we actually integrate all this context and build our own internal background coding agent, which we've called RAMP Inspect. You may have seen this on LinkedIn or X. We actually have open-sourced the blueprint of how we built this, and at the end I can definitely show you guys a link of where to find that.

00:28:15Ian

And the progress has been pretty phenomenal in turning this into a background agent that runs autonomously while people are in meetings, as bug fixes come up, and so on. This month, RAMP Inspect is responsible for over 50% of the PRs we merge to production.

00:28:32Ian

We're really big nerds about stats and numbers, so we have this dashboard to create some subtle, healthy competition, but also to inspire people that they can use this too. You can see engineering has a huge lead in the number of sessions, but you also have product, design, risk, legal, corporate finance, and even marketing and CX teams using RAMP Inspect. They're doing simple copy changes, logic fixes, and responding to incidents or bugs.

00:29:05Ian

And what's been really cool to see as this has evolved is how we've designed a few of these things around core principles to make it really powerful. What you're seeing here is a RAMP Inspect session; this example is a query we were trying to fix. It spins up a really fast Modal sandbox in the background, which lets us resume, spin up, and spin down these containers in an isolated environment matching what you'd have developing at RAMP. There's a series of tasks to keep it on track, and it creates a GitHub branch and integrates with all of the context: our Datadog, our read replica so it can actually write queries, and the context documents that product teams have put together.
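The session lifecycle described here (spin up an isolated environment, work through a task checklist on its own branch, suspend and resume, spin down when done) can be sketched as a small state machine. This is a local simulation under assumed names, not Ramp's implementation; a real version would wrap a Modal sandbox.

```python
# Minimal local simulation of a background-agent session lifecycle:
# isolated session, task checklist, own branch, suspend/resume.
from enum import Enum

class State(Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUSPENDED = "suspended"
    DONE = "done"

class InspectSession:
    def __init__(self, prompt: str, tasks: list[str]):
        self.prompt = prompt
        self.tasks = list(tasks)          # checklist that keeps the agent on track
        self.completed: list[str] = []
        self.branch = f"inspect/{abs(hash(prompt)) % 10_000}"  # illustrative branch name
        self.state = State.PENDING

    def spin_up(self):
        self.state = State.RUNNING        # container boots with the dev environment

    def step(self):
        # Work one task off the checklist; finish when none remain.
        if self.state is State.RUNNING and self.tasks:
            self.completed.append(self.tasks.pop(0))
        if not self.tasks:
            self.state = State.DONE

    def suspend(self):
        self.state = State.SUSPENDED      # container can be paused mid-run...

    def resume(self):
        self.state = State.RUNNING        # ...and picked back up later

s = InspectSession("Fix the slow reimbursements query",
                   ["reproduce", "patch", "run tests"])
s.spin_up()
s.step(); s.suspend(); s.resume()         # sessions survive interruption
s.step(); s.step()
print(s.state, s.completed)
```

The suspend/resume pair is what makes "kick it off and go to a meeting" workable: progress is not lost when the container is paused.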

00:29:48Ian

And what's really subtle about how we've designed this is that it's multiplayer first. That means that as you pair with a designer or somebody on the PM team, you can actually help them level up their own prompting skills. They can give us feedback like, "Hey, click on this link; this actually failed in a way I wasn't expecting." So it can be a really great source of cross-functional collaboration. That was a very subtle design choice that ended up having a really big impact for the company. Sessions can be kicked off via the Kanban UI, via an API, or from a Slack thread. When one is kicked off from Slack, we take the full context of the thread, so you don't have to reprompt it with the conversation that happened earlier.

00:30:32Ian

What you see here is that we also have a full VS Code environment; we run VNC inside a Modal sandbox as well. That gives us Chrome DevTools and MCP, so it can do full-stack work, which is pretty cool. And it has access to the 150,000-plus tests that we have, so it knows when things are broken, can respond to CI inside GitHub, and can patch fixes before it pings you that the PR is done.
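That "patch before pinging" behavior can be sketched as a retry loop: run the suite, let the agent attempt a fix on failure, and only notify the requester once CI is green or attempts run out. `run_tests` and `attempt_patch` here are stand-in callables, not real Ramp internals.

```python
# Hedged sketch of a patch-before-pinging loop around a CI run.
from typing import Callable

def patch_until_green(run_tests: Callable[[], bool],
                      attempt_patch: Callable[[], None],
                      notify: Callable[[str], None],
                      max_attempts: int = 3) -> bool:
    for _ in range(max_attempts):
        if run_tests():
            notify("PR is ready: CI is green.")
            return True
        attempt_patch()                 # agent reads the failure and pushes a fix
    green = run_tests()                 # final check after the last patch
    notify("PR is ready." if green else "CI still failing; needs a human.")
    return green

# Simulate a suite that passes after two patch attempts.
failures = {"left": 2}
def run_tests():     return failures["left"] == 0
def attempt_patch(): failures["left"] -= 1

log = []
ok = patch_until_green(run_tests, attempt_patch, log.append)
print(ok, log)      # True ['PR is ready: CI is green.']
```

The bounded `max_attempts` is the important part: the agent gets a few self-repair cycles, then escalates to a human rather than looping forever.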

00:30:56Ian

The link for this is builders.ramp.com. I think it's one of the first blog posts that we have, or the most recent blog post that we have, and we open-sourced the whole blueprint of how to build this and put this together as well. I think there's also a GitHub repo called Open Inspect, which is an open-source implementation of this as well.

00:31:17Ian

So it's been pretty interesting to see the impact RAMP Inspect has had: over 50% of the PRs we merge on a weekly basis go through the system. And with all this time no longer spent on low-level firefighting tasks and small fixes or tweaks that can be democratized across the company, we're really rethinking how our engineering teams operate, how they think about their job, and how they can be really impactful in this new AI-native future.

00:31:47Ian

So as a thought experiment, let's pretend we have two different teams. I'm sure everyone in this room has worked with a handful of extraordinary teams, and maybe some teams that are still finding their footing. And you'll notice a couple of qualities that may resonate. We have Team A on the left here. Let's say they really care about impact. They handle ambiguous problems. They understand the product, the business, and the data. They adopt new tools. They find creative solutions, and they obsess over the user experience.

00:32:16Ian

And then Team B may also resonate with some people. They debate libraries. They add process when things start to feel chaotic. They constantly complain about headcount. They bikeshed the details instead of focusing on the user experience: "Hey, should we use a functional programming paradigm here? What version of this TypeScript library do we want?" They build before understanding the problem: "We're just going to vibe code this, bro, don't worry." Or they focus on performative code quality and nitpicks that are very much a matter of subjective taste.

00:32:49Ian

I've worked on both of these teams, and the argument I'm going to make today is that there's going to be a divergence depending on which side of that aisle you land on. This is a study from Harvard that came out at the end of last year, very much geared towards what's actually happening with junior versus senior hiring trends in engineering since AI tools have accelerated.

00:33:10Ian

And what I think this glosses over is that it's not just a years-of-experience problem. It's really all of those qualities I listed for Team A versus Team B that make it apparent that coding was never the hardest part of the job. There are all these other engineering principles that become far more important than raw coding speed.

00:33:32Ian

So when you think about a Staff or Staff-plus engineer, you're really compensating those people for the judgment they bring to the table: the context, the ability to see around corners, all the learning they have, the actual scar tissue. If you ask Opus 4.6 to do something, those engineers will have the knowledge to recognize when it's not going to work or it's actually a bad idea.

00:33:54Ian

And one thing I think a lot of the media narratives about coding agents get wrong is that they don't acknowledge you can still build the wrong thing, just a lot faster, and you can build bigger messes. Having the Team A skills, really focusing on the context and the reason behind what you're building, will only become more important as AI advances.

00:34:16Ian

So what does that actually look like? We hit on some of these things: figuring out what to build and understanding users well enough. Selling an idea to skeptical stakeholders; when we decided to build a background coding agent, it was not obvious we should be spending time on it. Making good design decisions with incomplete information, and maintaining momentum through the long middle of a project, which can be really gnarly.

00:34:41Ian

And on this last bit: everyone in this room, I'm sure, is painfully aware of the conversation around SaaS and the stock market. A big element that conversation glosses over is that yes, it's easy to vibe code something, but actually getting through that long middle is why you need really good engineers to ship something with product-market fit that people are really excited about. I don't think enough people recognize that.

00:35:09Ian

And so where does that leave us? Personally, I think there's a lot of doomerism and scariness around the AI narratives, but it's also a really exciting time to be building. Unlike maybe factory work or farming, software is never done. We have this meme internally where we say, "Job's not finished." You've probably seen it in the marketing as well. And I think software is perpetually not finished.

00:35:33Ian

So with all this extra capacity, with people focusing less on low-level work and more on high-leverage engineering, I think four things are going to happen. First, companies are going to chase opportunities they couldn't previously afford to pursue. I don't know if we would be chasing these agentic workflows and thinking about bigger problems in the financial stack if this technology didn't exist.

00:35:57Ian

Second, people are going to enter adjacent markets and stitch together more value for customers; it's not going to be that because everyone's 2x more productive, you need half the people. Third, you're going to rebuild systems that were too expensive to touch. Building an internal background coding agent at a financial operations software company probably sounded like a pretty crazy idea, but now it makes a ton of sense. And fourth, companies are going to raise the bar for what good enough means.

00:36:22Ian

Being able to build more mind-blowing experiences for users and provide a lot more value is going to be the narrative of the next decade. I'm super excited to build some of these things and to see what everyone in this room is going to build, too. So, thank you.
