May 14, 2026

DevGuild and Other Stuff I've Been Thinking About

agentic-engineering · ai · ralph-loop · events · efficiency

I recently went to HeavyBit's DevGuild event, the "write-only-code-summit." The name refers to agents writing code that no human reviews, with humans increasingly removed from the code generation process. It was the first event I've been to in a while. I'm kind of antisocial and tend to avoid these things, but I was excited to meet Geoffrey Huntley, who I've written about. He invented the Ralph Loop, a super useful pattern for building applications autonomously. The event turned out to be really cool, and I'm glad I went. There were a lot of really smart people building all sorts of things with AI, for AI. Everyone's building. Everyone's got an idea. I felt energized by it, validated by it, inspired. It made me think I should probably be attending more of these and connecting with like minds. I came away with a lot of ideas, thoughts, and insights.

I created a YouTube video on the Ralph Loop recently. It's long and it's messy; I'm still learning to do YouTube (and finding out it's a lot of work). But I follow the whole process of building an app, from generating the idea with AI to spec'ing it and building it with the Ralph Loop. I keep running into token rate limits when I run a long code generation loop, like overnight, so I explored using both Codex / Claude and local LLMs when I hit those limits. The files and file structure of a Ralph Loop are important enough that I created a scaffolding CLI to generate a template for new projects. I still have some things I want to add to it, like routing to whichever service has capacity when one hits a limit. But it's already useful for understanding the general pattern.
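To make the routing idea concrete, here is a minimal Python sketch. It is not what the scaffolding CLI does today. It assumes the loop amounts to feeding the same prompt to an agent over and over until some completion check passes, and every name in it (`Provider`, `RateLimited`, `run_with_fallback`, `ralph_loop`) is hypothetical, made up for illustration rather than taken from any real SDK.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


class RateLimited(Exception):
    """Hypothetical error a provider wrapper raises when it is out of capacity."""


@dataclass
class Provider:
    name: str
    run: Callable[[str], str]  # takes a prompt, returns the agent's output


def run_with_fallback(prompt: str, providers: List[Provider]) -> Tuple[str, str]:
    """Try each provider in order; return (provider_name, output) from the first that works."""
    for provider in providers:
        try:
            return provider.name, provider.run(prompt)
        except RateLimited:
            continue  # this one is out of capacity, so try the next one
    raise RuntimeError("every provider is rate limited")


def ralph_loop(
    prompt: str,
    providers: List[Provider],
    is_done: Callable[[], bool],
    max_iterations: int = 10,
) -> List[str]:
    """Run the same prompt repeatedly until is_done() is true or the cap is hit.

    Returns the name of the provider that served each iteration.
    """
    used = []
    for _ in range(max_iterations):
        if is_done():
            break
        name, _output = run_with_fallback(prompt, providers)
        used.append(name)
    return used
```

In practice each `run` would wrap a real call (a hosted model, a local LLM) and translate that service's rate limit error into `RateLimited`. Listing the local model last means the loop keeps going overnight instead of stalling when the hosted plans run dry.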

I've assumed that, like any useful pattern right now, the Ralph Loop would become productized in some way, and it has. Anthropic released the Ralph Wiggum plugin for Claude Code (which doesn't quite get the pattern right), and then just the other day OpenAI released a slash command called /goal (which Anthropic has now copied) that more closely captures a true Ralph Loop. I haven't experimented with it yet, but I've watched others use it on YouTube. It still requires some knowledge of the files required, and of what's involved in the whole process, to get a good result, and there are already plugins and apps to help with that. There's a lot of value in understanding the concept and building it yourself. Otherwise you can't reach past the product when it doesn't do what you need. The same goes for OpenClaw. Even if it's lost some of its trendiness on X lately, because some of its core features have been productized by Anthropic, it's still extremely useful to understand how it works. If you're an engineer, try building it yourself, configuring it, contributing to the repo. Agents that can do more than chat are the future. Geoff says that building agents and understanding how they work is the new technical baseline for an engineer. And I agree.

Something else I overheard at the event: there are some founders, or "tech celebrities," who get free inference, or reduced-rate inference, from the frontier providers. To me that points to almost a new class divide. Inference is capital. People with access to it can build at a scale that others can't. And I'm finding this out firsthand, as Anthropic and, to some extent, OpenAI have continued scaling back unlimited use on their Max plans. I can no longer run a Ralph Loop overnight with OpenAI or Anthropic Max plans without hitting rate limits, and I wake up to a loop that didn't finish as a result. There are definitely ways to engineer around that, and local LLMs can supplement. There's open source and productized agent routing too. I'm seeing a lot of new innovation around that problem, and I met several founders who have solutions for it.

And even as I'm writing this, Anthropic announced new limits, a change to their plans meant to limit programmatic, Ralph-Loop-type use of their models and services. They're forcing users to pay for it. Which, on some level, makes a lot of sense. We've been enjoying subsidized inference for a long time, and I think Anthropic and OpenAI are finally starting to realize they can't afford it long term, given their current rate of growth, their spend, and their capacity limits. But the way Anthropic in particular has gone about implementing the changes has angered a lot of devs, myself included. It's hard to tell how much of that is just us reacting to having our free lunch taken away, and how much comes from the dishonest way Anthropic has gone about it: positioning themselves as being designed for engineers, then abandoning us at the drop of a hat.

I recently discovered another YouTuber, Theo, and I've been enjoying catching up on his videos. Among the other things he covers, this video was, I think, really relevant to my point above. I've been thinking a lot about how the next phase of innovation will be about making these products more efficient: token usage, latency, data, structure. There's a lot of opportunity for better engineering in the mechanics between your tool or product and the base models it calls. For example, and Theo does a good job of explaining this in his video, the current pattern is to send the entire conversation history over the internet every time you send a question or any other text. And don't forget, you're charged for those tokens. So if at the end of your conversation you tell ChatGPT "thank you," the app sends "thank you" plus your entire conversation history. There's the cost of latency, given it's all going over the internet, and there's the cost of the model processing that text, among other things. There is some caching that helps with that, but it doesn't mitigate the issue entirely, and it seems like it may not always work or be used. When ChatGPT came out, I wrote a React chat app that used the Azure models, so that's something I've always understood. Again, I think really understanding how these tools work, and being original, starts with building it yourself. Because when you understand the shortcomings, therein lies the opportunity.
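A rough back-of-the-envelope sketch of that resend cost, in Python. It assumes a client that resends the full history on every request, and it uses a deliberately crude stand-in for a tokenizer (one token per whitespace-separated word). It ignores system prompts, caching, and provider-specific billing, so treat the numbers as illustrative only. The function names are made up for this example.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def input_tokens_per_turn(turns):
    """turns is a list of (user_message, assistant_reply) pairs.

    Returns how many input tokens are sent on each request when the
    whole conversation so far is resent along with the new message.
    """
    history_tokens = 0
    sent = []
    for user_message, assistant_reply in turns:
        history_tokens += count_tokens(user_message)
        sent.append(history_tokens)  # everything so far, plus the new message
        history_tokens += count_tokens(assistant_reply)
    return sent


turns = [
    ("explain websockets please", "word " * 200),    # a 200-word reply
    ("now compare to http polling", "word " * 300),  # a 300-word reply
    ("thank you", "you are welcome"),
]
per_turn = input_tokens_per_turn(turns)
print(per_turn)       # [3, 208, 510]
print(sum(per_turn))  # 721
```

Under these assumptions the two-word "thank you" goes out as a 510-token request, because the 508 tokens of earlier conversation ride along with it. Input cost grows with every turn even when the new message is tiny.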

In the same video, Theo also explains something I've been thinking about too: the frontier model providers are hitting capacity issues, which is why they've started cramping everyone's style. They, and the other players, are going to have to start exploring every option they have for freeing up model capacity. So there's opportunity there. In that same video he reveals that OpenAI recently added WebSocket support to their Responses API, which is a good thing and indicates the trend toward efficiency. It opens up a lot of new options for reducing API calls, reducing the size of those calls, improving memory, etc.

To that end, I'm also thinking about token usage optimization from other angles, like the agent instruction files and the way the loops are designed. Do the agents really need to read through all the files every time? Are there ways to optimize that process? And consider the verbosity of models like Opus 4.7, which again was designed for humans. Does another agent need all that explanation? Or can it get by with terse and direct? Maybe a different language altogether.

As an engineer who's written a chatbot, I think about this a lot: have you ever noticed that Claude and ChatGPT won't let you delete parts of a chat? You have to start over completely. If there were, say, one or two really long responses from Claude that weren't useful and went in the wrong direction (or maybe told you to go to sleep, lol iykyk), you can't just delete them and reframe the direction of the convo. ChatGPT did create the threads/branch feature, which addresses part of that. But yeah, one of the first features I added to my chatbot was deleting messages at any point in the conversation. It was especially useful at the time because context windows were a lot smaller. But it occurs to me it would still be useful now, especially given how verbose some of the models can be. And that's just one area for improvement. There are many. The more you build the tools, the more you have to offer to their progress.
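For illustration, here is a minimal Python sketch of what deleting a message from the middle of a conversation could look like. This is not the code from my chatbot, just one way to do it. It assumes the client owns the history and sends it as a list of role/content dictionaries, which is a common chat API shape; the `Conversation` and `Message` classes are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List

_ids = count(1)  # simple in-memory id generator for this sketch


@dataclass
class Message:
    role: str      # "user" or "assistant"
    content: str
    id: int = field(default_factory=lambda: next(_ids))


class Conversation:
    def __init__(self) -> None:
        self.messages: List[Message] = []

    def add(self, role: str, content: str) -> int:
        """Append a message and return its id so it can be deleted later."""
        message = Message(role, content)
        self.messages.append(message)
        return message.id

    def delete(self, message_id: int) -> None:
        """Drop one message from anywhere in the history."""
        self.messages = [m for m in self.messages if m.id != message_id]

    def to_payload(self) -> List[Dict[str, str]]:
        """Build the list that would be sent with the next request."""
        return [{"role": m.role, "content": m.content} for m in self.messages]


chat = Conversation()
chat.add("user", "Help me plan the refactor.")
detour = chat.add("assistant", "A very long answer that went the wrong way...")
chat.add("user", "Let's focus on the database layer only.")

chat.delete(detour)       # the detour no longer costs tokens on later requests
print(chat.to_payload())
```

Because the history is just data the client resends, removing a message means it is never sent again. Some APIs may expect user and assistant turns to alternate, so in a real app it may be safer to delete a prompt and its reply as a pair.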

The other big takeaway from the Guild was about GitHub, and whether it has a future. And even git and source control altogether. A lot of people think source control needs to adapt or die: it was designed for humans, not for agents that work autonomously for days and push commits at a far higher rate than humans do. GitHub definitely was not made for that, and its infrastructure has struggled to support the new pace of work. Never mind that the whole workflow and UX were designed for humans to work on code together, to communicate and collaborate. But will humans be involved anymore? It's an open question, I think. Theo has a video about leaving GitHub. And there are a lot of founders doing cool things around source control lately. I haven't played with many of them yet, but yeah, like jj.

Anyway, the DevGuild event sparked a desire to do more events. And I think it's an exciting time for tech and innovation. I'm an optimist about AI and jobs. The City feels different right now. People are buzzing. Startups abound. Grab a rail and hang on tight.

More from me: YouTube · create-ralph-loop on GitHub
