The Hidden Economics of “Free” AI Tools: Why the SaaS Premium Still Matters

This post discusses the hidden costs of DIY solutions in SaaS, emphasizing the benefits of established SaaS tools over “free” AI-driven alternatives. It highlights issues like time tax, knowledge debt, reliability, support challenges, security risks, and scaling problems. Ultimately, it advocates for a balanced approach that leverages AI to enhance, rather than replace, reliable SaaS infrastructure.

This is Part 2 of my series on the evolution of SaaS. If you haven’t read Part 1: The SaaS Model Isn’t Dead, it’s Evolving Beyond the Hype of “Vibe Coding”, start there for the full context. In this post, I’m diving deeper into the hidden costs that most builders don’t see until it’s too late.

In my last post, I argued that SaaS isn’t dead, it’s just evolving beyond the surface-level appeal of vibe coding. Today, I want to dig deeper into something most builders don’t realize until it’s too late: the hidden costs of “free” AI-powered alternatives.

Because here’s the uncomfortable truth: when you replace a $99/month SaaS tool with a Frankenstein stack of AI prompts, no-code platforms, and API glue, you’re not saving money. You’re just moving the costs somewhere else, usually to places you can’t see until they bite you.

Let’s talk about what really happens when you choose the “cheaper” path.

The Time Tax: When Free Becomes Expensive

Picture this: you’ve built your “MVP” in a weekend. It’s glorious. ChatGPT wrote half the code, Zapier connects your Airtable to your Stripe account, and a Make.com scenario handles email notifications. Total monthly cost? Maybe $20 in API fees.

You’re feeling like a genius.

Then Monday morning hits. A customer reports an error. The Zapier workflow failed silently. You spend two hours digging through logs (when you can find them) only to discover that Airtable changed their API rate limits, and now your automation hits them during peak hours.

You patch it with a delay. Problem solved.
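
That "patch" is rarely just a delay; it usually grows into a hand-rolled retry with exponential backoff, the first piece of infrastructure you now own. A minimal Ruby sketch of what you end up writing (`RateLimitError` is a hypothetical stand-in for whatever your HTTP client raises on a 429):

```ruby
# Hypothetical error class standing in for your HTTP client's 429 exception.
class RateLimitError < StandardError; end

# Retry a block with exponential backoff: wait 1s, 2s, 4s, 8s ... between
# attempts, and give up after max_attempts.
def with_backoff(max_attempts: 5, base_delay: 1)
  attempts = 0
  begin
    yield
  rescue RateLimitError
    attempts += 1
    raise if attempts >= max_attempts
    sleep(base_delay * 2**(attempts - 1))
    retry
  end
end
```

And once this exists, you also own its bugs: jitter, retry budgets, and making sure the retried call is idempotent.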

Until Wednesday, when three more edge cases emerge. The Python script you copied from ChatGPT doesn’t handle timezone conversions properly. Your payment flow breaks for international customers. The no-code platform you’re using doesn’t support the webhook format you need.

Each fix takes 30 minutes to 3 hours.
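
The timezone bug in particular is a classic, and the fix is the same in any language: normalize to UTC on the way in, do all arithmetic in UTC, and convert only at the display edge. A sketch in Ruby (stdlib only; a real Rails app would use ActiveSupport time zones):

```ruby
require "time"

# Normalize to UTC on the way in.
created_at = Time.parse("2024-03-09 20:30:00 -0500").utc

# Arithmetic stays in UTC, so DST transitions can't shift the result.
due_at = created_at + (24 * 60 * 60)

# Convert only when displaying, with an explicit offset.
puts due_at.getlocal("-05:00").strftime("%Y-%m-%d %H:%M")
```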

By Friday, you’ve spent more time maintaining your “free” stack than you would have spent just using Stripe Billing and ConvertKit.

This is the time tax. And unlike your SaaS subscription, you can’t expense it or write it off. It’s just gone, stolen from building features, talking to customers, or actually running your business.

The question isn’t whether your DIY solution costs less. It’s whether your time is worth $3/hour.

The Knowledge Debt: Building on Borrowed Understanding

Here’s a scenario that plays out constantly in the AI-first era:

A developer prompts Claude to build a payment integration. The AI generates beautiful code: type-safe, well-structured, edge cases handled. The developer copies it, tests it once, and ships it.

It works perfectly for two months.

Then Stripe deprecates an API endpoint. Or a customer discovers a refund edge case. Or the business wants to add subscription tiers.

Now what?

The developer stares at 200 lines of code they didn’t write and don’t fully understand. They can prompt the AI again, but they don’t know which parts are safe to modify. They don’t know why certain patterns were used. They don’t know what will break.

This is knowledge debt, the accumulated cost of using code you haven’t internalized.

Compare this to using a proper SaaS tool like Stripe Billing or Chargebee. You don’t understand every line of their code either, but you don’t need to. They handle the complexity. They migrate your data when APIs change. They’ve already solved the edge cases.

When you build with barely-understood AI-generated code, you get the worst of both worlds: you’re responsible for maintenance without having the knowledge to maintain it effectively.

This isn’t a knock on AI tools. It’s a reality check about technical debt in disguise.

The Reliability Gap: When “Good Enough” Isn’t

Let’s zoom out and talk about production-grade systems.

When you use Slack, you get 99.99% uptime. That’s not luck, it’s the result of on-call engineers, redundant infrastructure, automated failovers, and millions of dollars in operational excellence.

When you stitch together your own “Slack alternative” using Discord webhooks, Airtable, and a Telegram bot, what’s your uptime?

You don’t even know, because you’re not measuring it.

And here’s the thing: your customers notice.

They notice when notifications arrive 3 hours late because your Zapier task got queued during peak hours. They notice when your checkout flow breaks because you hit your free-tier API limits. They notice when that one Python script running on Replit randomly stops working.

Reliability isn’t a feature you can bolt on later. It’s the foundation everything else is built on.

This is why companies still pay for Datadog instead of writing their own monitoring. Why they use PagerDuty instead of email alerts. Why they choose AWS over running servers in their garage.

Not because they can’t build these things themselves, but because reliability at scale requires obsessive attention to details that don’t show up in MVP prototypes.

Your vibe-coded solution might work 95% of the time. But that missing 5% is where trust dies and customers churn.

The Support Nightmare: Who Do You Call?

Imagine this email from a customer:

“Hi, I tried to upgrade my account but got an error. Can you help?”

Simple enough, right?

Except your “upgrade flow” involves:

  • A Stripe Checkout session (managed by Stripe)
  • A webhook that triggers Make.com (managed by Make.com)
  • Which updates Airtable (managed by Airtable)
  • Which triggers a Zapier workflow (managed by Zapier)
  • Which sends data to your custom API (deployed on Railway)
  • Which updates your database (hosted on PlanetScale)

One of these broke. Which one? You have no idea.

You start debugging:

  • Check Stripe logs. Payment succeeded.
  • Check Make.com execution logs. Ran successfully.
  • Check Airtable. Record updated.
  • Check Zapier. Task queued but not processed yet.

Ah. Zapier’s free tier queues tasks during high-traffic periods. The upgrade won’t process for another 15 minutes.

You explain this to the customer. They’re confused and frustrated. So are you.

Now imagine that same scenario with a proper SaaS tool like Memberstack or MemberSpace. The customer emails them. They check their logs, identify the issue, and fix it. Done.

When you own the entire stack, you own all the problems too. And most founders don’t realize how much time “customer support for your custom infrastructure” actually takes until they’re drowning in it.

The Security Illusion: Compliance Costs You Can’t See

Pop quiz: Is your AI-generated authentication system GDPR compliant?

Does it properly hash passwords? Does it prevent timing attacks? Does it implement proper session management? Does it handle token refresh securely? Does it log security events appropriately?

If you’re not sure, you’ve got a problem.
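
For context on what “properly hash passwords” even means, here is a minimal sketch of the two ingredients a correct implementation needs: a salted, deliberately slow key-derivation function and a constant-time comparison. It uses only Ruby’s OpenSSL stdlib; in production you would reach for bcrypt or argon2 through a maintained gem, or let a vendor own this entirely:

```ruby
require "openssl"
require "securerandom"

ITERATIONS = 200_000 # deliberately slow; tune to your hardware

# Derive a salted digest from the password. Store both salt and digest.
def hash_password(password, salt = SecureRandom.bytes(16))
  digest = OpenSSL::KDF.pbkdf2_hmac(password, salt: salt,
                                    iterations: ITERATIONS,
                                    length: 32, hash: "SHA256")
  [salt, digest]
end

# Re-derive and compare in constant time to avoid timing attacks.
def password_valid?(password, salt, stored_digest)
  candidate = OpenSSL::KDF.pbkdf2_hmac(password, salt: salt,
                                       iterations: ITERATIONS,
                                       length: 32, hash: "SHA256")
  OpenSSL.fixed_length_secure_compare(candidate, stored_digest)
end
```

Every question in the list above maps to a detail like this, and Auth0 or Clerk has already answered all of them.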

Because when you use Auth0, Clerk, or AWS Cognito, these questions are answered for you. They have security teams, penetration testers, and compliance certifications. They handle GDPR, CCPA, SOC2, and whatever acronym-soup regulation applies to your industry.

When you roll your own auth with AI-generated code, you own all of that responsibility.

And here’s what most people don’t realize: security incidents are expensive. Not just in terms of fines and legal costs, but in reputation damage and customer trust.

One breach can kill a startup. And saying “but ChatGPT wrote the code” isn’t a legal defense.

The same logic applies to payment handling, data storage, and API security. Every shortcut you take multiplies your risk surface.

SaaS tools don’t just sell features, they sell peace of mind. They carry the liability so you don’t have to.

The Scale Wall: When Growth Breaks Everything

Your vibe-coded MVP works perfectly for your first 10 customers. Then you get featured on Product Hunt.

Suddenly you have 500 new signups in 24 hours.

Your Airtable base hits record limits. Your free-tier API quotas are maxed out. Your Make.com scenarios are queuing tasks for hours. Your Railway instance keeps crashing because you didn’t configure autoscaling. Your webhook endpoints are timing out because they weren’t designed for concurrent requests.

Everything is on fire.

This is the scale wall, the moment when your clever shortcuts stop being clever and start being catastrophic.

Real SaaS products are built to scale. They handle traffic spikes. They have redundancy. They auto-scale infrastructure. They cache aggressively. They optimize database queries. They monitor performance.

Your vibe-coded stack probably does none of these things.

And here’s the brutal part: scaling isn’t something you can retrofit easily. It’s architectural. You can’t just “add more Zapier workflows” your way out of it.
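
One concrete example of that architectural gap: a webhook endpoint that does all its work inline will time out under concurrent load, and no amount of extra workflows fixes that. The standard architecture is to acknowledge immediately and process asynchronously, which in Rails is exactly what Sidekiq or Active Job provide. A plain-Ruby sketch of the pattern using a stdlib queue (names like `handle_webhook` are illustrative):

```ruby
require "json"

# In-memory job queue; a real system uses Sidekiq/Active Job backed by Redis
# so jobs survive restarts.
JOBS = Thread::Queue.new

# Worker drains the queue independently of request handling.
worker = Thread.new do
  while (event = JOBS.pop)
    # ... slow work: update records, call third-party APIs, send email ...
  end
end

# The "endpoint": parse, enqueue, return in milliseconds.
def handle_webhook(raw_body)
  event = JSON.parse(raw_body)
  JOBS << event
  [200, "ok"]
end
```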

At this point, you face a choice: either rebuild everything properly (which takes months and risks losing customers during the transition), or artificially limit your growth to stay within the constraints of your fragile infrastructure.

Neither option is appealing.

The Integration Trap: When Your Stack Doesn’t Play Nice

One of the biggest promises of the AI-powered, no-code revolution is that everything integrates with everything.

Except it doesn’t. Not really.

Sure, Zapier connects to 5,000+ apps. But those integrations are surface-level. You get basic CRUD operations, not deep functionality.

Want to implement complex business logic? Want custom error handling? Want to batch process data efficiently? Want real-time updates instead of 15-minute polling?

Suddenly you’re writing custom code anyway, except now you’re writing it in the weird constraints of whatever platform you’ve chosen, rather than in a proper application where you have full control.

The irony is thick: you chose no-code to avoid complexity, but you ended up with a different kind of complexity, one that’s harder to debug and impossible to version control properly.

Meanwhile, a well-designed SaaS tool either handles your use case natively or provides a proper API for custom integration. You’re not fighting the platform; you’re using it as intended.

The Real Cost Comparison

Let’s do some actual math.

Vibe-coded stack:

  • Zapier Pro: $20/month
  • Make.com: $15/month
  • Airtable Pro: $20/month
  • Railway: $10/month
  • Various API costs: $15/month
  • Total: $80/month

Your time:

  • Initial setup: 20 hours
  • Weekly maintenance: 3 hours
  • Monthly debugging: 5 hours
  • Customer support for stack issues: 2 hours
  • Monthly time cost: ~20 hours

If your time is worth even $50/hour (a modest rate for a technical founder), that’s $1,000/month in opportunity cost.

Total real cost: $1,080/month.

Proper SaaS stack:

  • Stripe Billing: Included with processing fees
  • Memberstack: $25/month
  • ConvertKit: $29/month
  • Vercel: $20/month
  • Total: $74/month + processing fees

Your time:

  • Initial setup: 4 hours
  • Weekly maintenance: 0.5 hours
  • Monthly debugging: 1 hour
  • Customer support for stack issues: 0 hours (vendor handles it)
  • Monthly time cost: ~3 hours

At $50/hour, that’s $150/month in opportunity cost.

Total real cost: $224/month.
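
If you want to sanity-check those totals, the model is one line: real cost equals subscriptions plus hours times your hourly rate. In Ruby (processing fees left out on both sides, as above):

```ruby
# Real monthly cost = subscription fees plus the opportunity cost of your time.
def real_monthly_cost(subscriptions:, hours:, hourly_rate:)
  subscriptions + hours * hourly_rate
end

real_monthly_cost(subscriptions: 80, hours: 20, hourly_rate: 50) # => 1080
real_monthly_cost(subscriptions: 74, hours: 3,  hourly_rate: 50) # => 224
```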

The “more expensive” SaaS stack actually costs 80% less when you account for time.

And we haven’t even factored in:

  • The revenue lost from downtime
  • The customers lost from poor reliability
  • The scaling issues you’ll hit later
  • The security risks you’re accepting
  • The knowledge debt you’re accumulating

When DIY Makes Sense (And When It Doesn’t)

Look, I’m not saying you should never build anything custom. There are absolutely times when DIY is the right choice.

Build custom when:

  • The functionality is core to your competitive advantage
  • No existing tool solves your exact problem
  • You have the expertise to maintain it long-term
  • You’re building something genuinely novel
  • You have the team capacity to own it forever

Use SaaS when:

  • The functionality is commodity (auth, payments, email, etc.)
  • Reliability and uptime are critical
  • You want to focus on your core product
  • You’re a small team with limited time
  • You need compliance and security guarantees
  • You value your time more than monthly fees

The pattern is simple: build what makes you unique, buy what makes you functional.

The AI-Assisted Middle Ground

Here’s where it gets interesting: AI doesn’t just enable vibe coding. It also enables smarter SaaS integration.

You can use Claude or ChatGPT to:

  • Generate integration code for SaaS APIs faster
  • Debug webhook issues more efficiently
  • Build wrapper libraries around vendor SDKs
  • Create custom workflows on top of stable platforms

This is the sweet spot: using AI to accelerate your work with reliable tools, rather than using AI to replace reliable tools entirely.
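
The “wrapper library” point deserves a sketch: a thin class of your own that stands between your app and a vendor SDK, so AI can generate the glue while the vendor keeps the reliability. The client here is a hypothetical stand-in for whatever SDK you actually use:

```ruby
# A thin wrapper isolating a vendor SDK behind an interface you own.
# Swapping providers later means changing this one class, not your whole app.
class EmailSender
  def initialize(client)
    @client = client # an already-configured vendor SDK instance
  end

  def welcome(user_email)
    @client.deliver(to: user_email, template: "welcome")
  end
end
```

Because the wrapper is yours, it is also trivially testable with a stub in place of the real SDK.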

Think of it like this: AI is an incredible co-pilot. But you still need the plane to have wings.

The Evolution Continues

My argument isn’t that AI tools are bad or that vibe coding is wrong. It’s that we need to be honest about the tradeoffs.

The next generation of successful products won’t be built by people who reject AI, and they won’t be built by people who reject SaaS.

They’ll be built by people who understand when to use each.

People who can vibe-code a prototype in a weekend, then have the discipline to replace it with proper infrastructure before it scales. People who use AI to augment their capabilities, not replace their judgment.

The future isn’t “AI vs. SaaS.” It’s “AI-enhanced SaaS.”

Tools that are easier to integrate because AI helps you. APIs that are easier to understand because AI explains them. Systems that are easier to maintain because AI helps you debug.

But beneath all that AI magic, there’s still reliable infrastructure, accountable teams, and boring old uptime guarantees.

Because at the end of the day, customers don’t care about your tech stack. They care that your product works when they need it.

Build for the Long Game

If you’re building something that matters, something you want customers to depend on, something you want to grow into a real business, you need to think beyond the MVP phase.

You need to think about what happens when you hit 100 users. Then 1,000. Then 10,000.

Will your clever weekend hack still work? Or will you be spending all your time keeping the lights on instead of building new features?

The most successful founders I know aren’t the ones who move fastest. They’re the ones who move sustainably, who build foundations that can support growth without collapsing.

They use AI to move faster. They use SaaS to stay reliable. They understand that both are tools, not religions.

Final Thoughts: Respect the Craft

There’s a romance to the idea of building everything yourself. Of being the 10x developer who needs nothing but an AI assistant and pure willpower.

But romance doesn’t ship products. Discipline does.

The best software is invisible. It just works. And making something “just work”, consistently, reliably, at scale, is harder than anyone admits.

So use AI. Vibe-code your prototypes. Move fast and experiment.

But when it’s time to ship, when it’s time to serve real customers, when it’s time to build something that lasts, respect the craft.

Choose boring, reliable infrastructure. Pay for the SaaS tools that solve solved problems. Invest in quality over cleverness.

Because the goal isn’t to build the most innovative tech stack.

The goal is to build something customers love and trust.

And trust, as it turns out, is built on the boring stuff. The stuff that works when you’re not looking. The stuff that scales without breaking. The stuff someone else maintains at 3 AM so you don’t have to.

That’s what SaaS really sells.

And that’s why it’s not dead, it’s just getting started.


What’s your experience balancing custom-built solutions with SaaS tools? Have you hit the scale wall or the reliability gap? Share your stories in the comments. I’d love to hear what you’ve learned.

If you found this useful, follow me for more posts on building sustainable products in the age of AI, where we embrace new tools without forgetting old wisdom.

Rails Templating Showdown: Slim vs ERB vs Haml vs Phlex – Which One Should You Use?

This guide compares Ruby on Rails templating engines: ERB, Slim, Haml, and Phlex. It highlights each engine’s pros and cons, focusing on aspects like performance, readability, and learning curve. Recommendations are made based on project type, emphasizing the importance of choosing the right engine for optimal efficiency and maintainability.

If you’ve been working with Ruby on Rails for any length of time, you’ve probably encountered the age-old question: which templating engine should I use? With ERB as the default, Slim and Haml as popular alternatives, and Phlex as the new kid on the block, the choice can feel overwhelming.

In this comprehensive guide, I’ll break down each option, compare their strengths and weaknesses, and help you make an informed decision for your Rails projects.

Understanding the Landscape

Before diving into specifics, let’s understand what we’re comparing. Template engines are tools that help you generate HTML dynamically by embedding Ruby code within markup. Each engine has a different philosophy about how this should be done.
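
At its core, every engine on this list does the same job that ERB does in Ruby’s standard library: compile a template into Ruby code and evaluate it against a binding. You can see the whole mechanism in a few lines:

```ruby
require "erb"

# An ERB template compiles to Ruby and is evaluated against a binding.
template = ERB.new("<h1><%= name %></h1>")
name = "Ada"
puts template.result(binding) # => <h1>Ada</h1>
```

The engines differ only in the source syntax they compile from, which is why they can coexist in one application.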

ERB (Embedded Ruby)

What is it? ERB is Rails’ default templating engine. It embeds Ruby code directly into HTML using special tags.

Syntax Example

<div class="user-profile">
  <h1><%= @user.name %></h1>
  <% if @user.admin? %>
    <span class="badge">Admin</span>
  <% end %>
  <ul class="posts">
    <% @user.posts.each do |post| %>
      <li><%= link_to post.title, post_path(post) %></li>
    <% end %>
  </ul>
</div>

Pros

Zero Learning Curve: If you know HTML and Ruby, you already know ERB. There’s no new syntax to learn, making it perfect for beginners and mixed teams.

Universal Support: Every Rails developer knows ERB. Every gem, tutorial, and Stack Overflow answer uses ERB. This ubiquity is valuable.

No Setup Required: It works out of the box with every Rails installation. No gems to add, no configuration needed.

Familiar to Other Ecosystems: The concept of embedding code in angle brackets exists in PHP, ASP, JSP, and many other frameworks. Developers coming from other backgrounds will feel at home.

Cons

Verbose: Writing closing tags for everything gets tedious. Your files become longer than they need to be.

Easy to Create Messy Code: Because ERB doesn’t enforce structure, it’s easy to mix business logic with presentation logic, leading to hard-to-maintain views.

Repetitive: You’ll find yourself typing the same patterns over and over. The lack of shortcuts makes ERB feel inefficient once you’ve experienced alternatives.

When to Use ERB

ERB is ideal when you’re starting a new project with junior developers, working with a team that values convention over optimization, or building simple CRUD applications where template complexity is minimal. It’s also the safe choice for open-source projects where maximum accessibility matters.

Slim

What is it? Slim is a lightweight templating engine whose stated goal is to reduce the syntax to the essential parts without becoming cryptic.

Syntax Example

.user-profile
  h1 = @user.name
  - if @user.admin?
    span.badge Admin
  ul.posts
    - @user.posts.each do |post|
      li = link_to post.title, post_path(post)

Pros

Dramatically Less Code: Slim templates are typically 30-40% shorter than their ERB equivalents. This means faster writing and easier scanning.

Clean and Readable: Once you learn the syntax, Slim templates are remarkably easy to read. The indentation-based structure naturally enforces good organization.

Fast Performance: Slim compiles to Ruby code that’s often faster than ERB, though the difference is negligible in most applications.

Enforces Good Structure: The indentation requirement prevents messy, unstructured code. You can’t create a Slim template that doesn’t follow proper nesting.

Cons

Learning Curve: Team members need to learn new syntax. The first week will involve frequent reference to documentation.

Indentation Sensitivity: Like Python, Slim uses significant whitespace. A misplaced space or tab can break your template, which can be frustrating when debugging.

Less Common: Fewer developers know Slim compared to ERB. Hiring and onboarding may take slightly longer.

Limited Ecosystem Examples: While most gems work fine with Slim, documentation and examples are usually in ERB, requiring mental translation.

When to Use Slim

Slim shines in applications with complex views where you want to maximize readability and minimize boilerplate. It’s perfect for teams that value developer experience and are willing to invest a small amount of time upfront to learn the syntax. If you find yourself frustrated by ERB’s verbosity, Slim is your answer.

Haml

What is it? Haml (HTML Abstraction Markup Language) was one of the first popular alternatives to ERB. It uses indentation to represent HTML structure and eliminates closing tags.

Syntax Example

.user-profile
  %h1= @user.name
  - if @user.admin?
    %span.badge Admin
  %ul.posts
    - @user.posts.each do |post|
      %li= link_to post.title, post_path(post)

Pros

Mature and Stable: Haml has been around since 2006. It’s battle-tested and reliable with excellent documentation.

Cleaner Than ERB: Like Slim, Haml eliminates closing tags and reduces boilerplate significantly.

Good Ecosystem Support: Many gems and libraries explicitly support Haml, and you’ll find plenty of examples and resources online.

Enforces Structure: The indentation requirement keeps your code organized and prevents deeply nested chaos.

Cons

Slower Than Slim: Haml is noticeably slower than Slim in benchmarks, though for most applications this won’t matter.

More Verbose Than Slim: The % prefix for tags makes Haml slightly more verbose than Slim’s minimalist approach.

Indentation Sensitivity: Like Slim, whitespace matters. Mixing tabs and spaces will cause problems.

Feels Dated: While still widely used, Haml hasn’t evolved as quickly as Slim. It lacks some of the refinements that make Slim feel more modern.

When to Use Haml

Choose Haml if you want an alternative to ERB but prefer a more established option with extensive community support. It’s a safe middle ground between ERB’s verbosity and Slim’s minimalism. Haml is particularly good if you’re maintaining a legacy codebase that already uses it.

Phlex

What is it? Phlex represents a radical departure from traditional templating. Instead of mixing Ruby with HTML-like syntax, Phlex uses pure Ruby classes to build views. It’s component-oriented and type-safe.

Syntax Example

class UserProfile < Phlex::HTML
  def initialize(user)
    @user = user
  end

  def template
    div(class: "user-profile") do
      h1 { @user.name }
      span(class: "badge") { "Admin" } if @user.admin?
      ul(class: "posts") do
        @user.posts.each do |post|
          li { a(href: post_path(post)) { post.title } }
        end
      end
    end
  end
end

Pros

Pure Ruby: No context switching between Ruby and template syntax. Your entire view is just Ruby code, which means better IDE support, easier refactoring, and familiar debugging.

Component Architecture: Phlex encourages building reusable components, leading to better code organization and DRY principles.

Type Safety: Because it’s pure Ruby, you can use tools like Sorbet or RBS for type checking your views.

Excellent Performance: Phlex is extremely fast, often outperforming other template engines significantly.

Testable: Components are just Ruby classes, making them easy to unit test without rendering overhead.

No Markup Parsing: Since there’s no template syntax to parse, there’s one less layer of complexity in your stack.

Cons

Paradigm Shift: Phlex requires a completely different way of thinking about views. This isn’t just new syntax—it’s a new architecture.

Verbose for Simple Views: For basic templates, Phlex can feel like overkill. Writing div { h1 { "Hello" } } instead of <div><h1>Hello</h1></div> doesn’t feel like progress for simple cases.

Limited Ecosystem: Phlex is new. There are fewer examples, fewer ready-made components, and a smaller community.

No Designer-Friendly Workflow: Because Phlex is pure Ruby, front-end developers or designers who aren’t comfortable with Ruby will struggle to contribute to views.

Steep Learning Curve: Understanding how to structure Phlex components well takes time and experience.

When to Use Phlex

Phlex is ideal for component-heavy applications where you want maximum reusability and testability. It’s perfect for design systems, UI libraries, or applications with complex, interactive interfaces. Choose Phlex if your team is comfortable with Ruby and values type safety and performance. It’s also excellent for API-driven applications where you’re building JSON responses rather than full HTML pages.

The Comparison Matrix

Let me break down how these engines stack up across key criteria:

Performance

Winner: Phlex

Phlex is the fastest, followed closely by Slim; ERB sits in the middle, and Haml trails. However, for most applications, template rendering isn’t the bottleneck; database queries and business logic are.

Readability

Winner: Slim

Once learned, Slim offers the best balance of conciseness and clarity. ERB is readable but verbose. Haml is good but slightly cluttered with % symbols. Phlex requires Ruby fluency to read comfortably.

Learning Curve

Winner: ERB

ERB has virtually no learning curve. Slim and Haml require a day or two to feel comfortable. Phlex requires rethinking your entire approach to views.

Ecosystem Support

Winner: ERB

ERB is universal. Everything supports it. Slim and Haml have good support but sometimes require translation. Phlex is still building its ecosystem.

Maintainability

Winner: Phlex/Slim

Phlex’s component architecture and Slim’s enforced structure both lead to highly maintainable codebases. ERB’s flexibility can become a maintainability liability. Haml sits in the middle.

Team Onboarding

Winner: ERB

Any Rails developer can contribute to ERB templates immediately. The alternatives require training time.

My Recommendations

After years of using all these engines in production, here’s what I recommend:

For New Projects with Small Teams

Use Slim. You’ll write less code, maintain cleaner views, and the learning investment pays off quickly. The performance gains are nice, but the real benefit is how much easier it is to scan and understand Slim templates.

For Large Teams or Open Source

Stick with ERB. The universal knowledge and zero onboarding friction outweigh the benefits of alternatives. Don’t underestimate the value of every Rails developer being able to contribute immediately.

For Component-Heavy Applications

Choose Phlex. If you’re building a complex UI with lots of reusable components, Phlex’s architecture will save you time in the long run. The learning curve is worth it for applications where component composition is central.

For Existing Projects

Don’t Rewrite. If your project already uses Haml or Slim, keep using it. If it uses ERB and you’re happy with it, don’t change. The cost of conversion rarely justifies the benefits.

For Learning

Start with ERB, then try Slim. Master Rails with its default templating engine first. Once you’re comfortable, experiment with Slim on a side project. After you understand the tradeoffs, you’ll be equipped to make informed decisions.

Mixing Engines

Here’s something many developers don’t realize: you can use multiple templating engines in the same Rails application. You might use ERB for most views but Phlex for a complex component or Slim for your admin interface.

This flexibility means you’re not locked into one choice forever. Start with ERB and migrate specific areas to alternatives as needs arise.
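
Mechanically, mixing works because Rails selects the engine per file from its extension, so coexistence is just a naming convention (paths below are illustrative):

```
app/views/
  users/
    show.html.erb        # rendered with ERB
  admin/
    dashboard.html.slim  # rendered with Slim (via the slim-rails gem)
```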

The Future

The Rails templating landscape is evolving. Phlex represents a new wave of thinking about views as components rather than templates. Meanwhile, tools like ViewComponent bridge the gap between traditional templates and component architecture.

My prediction? We’ll see more hybrid approaches where simple CRUD views use traditional templates while complex UIs leverage component-based systems like Phlex.

Conclusion

There’s no universally correct answer to “which templating engine should I use?” The right choice depends on your team, your project, and your priorities.

  • ERB for maximum compatibility and zero friction
  • Slim for optimal developer experience and clean code
  • Haml for a mature alternative with good ecosystem support
  • Phlex for component-driven architecture and maximum performance

My personal preference? I use Slim for most projects. The productivity boost is real, the syntax becomes second nature quickly, and I appreciate how it naturally encourages better code organization. But I’ve shipped successful applications with all four engines, and I wouldn’t hesitate to use any of them given the right context.

What matters most isn’t which engine you choose, but that you use it consistently and well. A well-structured ERB codebase beats a messy Slim project every time.

What’s your experience with Rails templating engines? Have you tried alternatives to ERB? I’d love to hear your thoughts in the comments below.


Want to dive deeper into Rails development? Subscribe to my newsletter for weekly tips and insights on building better Rails applications.

Why AI Startups Should Choose Rails Over Python

AI startups often fail due to challenges in supporting layers and product development rather than model quality. Rails offers a fast and structured path for founders to build scalable applications, integrating seamlessly with AI services. While Python excels in research, Rails is favored for production, facilitating swift feature implementation and reliable infrastructure.

TL;DR

  • Most AI startups fail because they cannot ship a product, not because the model is not good enough.
  • Rails gives founders the fastest path from idea to revenue.
  • Python is still essential for research, but Rails wins when the goal is to build a business.

The Real Challenge in AI Today

People love talking about models:
benchmarks,
training runs,
tokens,
context windows,
all the shiny parts.

But none of this is why AI startups fail.

Startups fail because the supporting layers around the model are too slow to build:

  • Onboarding systems
  • Billing and subscription logic
  • Admin dashboards
  • User management
  • Customer support tools
  • Background processing
  • Iterating on new features
  • Fixing bugs
  • Maintaining stability

The model is never the bottleneck.
The product is.
This is exactly where Rails becomes your unfair advantage.

Why Rails Gives AI Startups Real Speed

Rails focuses on shipping.
It gives you a complete system on day one.
The framework removes most of the decisions that slow down small teams.
Instead of assembling ten libraries, you just start building.

The result is simple:
a solo founder or a tiny team can move with the speed of a full engineering department.
Everything feels predictable.
Everything fits together.
Everything works the moment you touch it.

Python gives you freedom.
Rails gives you momentum.
Momentum is what gets a startup off the ground.

Rails and AI Work Together Better Than Most People Think

There is a common myth that AI means Python
Only partially true
Python is the best language for training and experimenting
But the moment you are building a feature for real users you need a framework that is designed for production

Rails integrates easily with every useful AI service:

  • OpenAI
  • Anthropic
  • Perplexity
  • Groq
  • Nvidia
  • Mistral
  • Any vector database
  • Any embedding store

Rails makes AI orchestration simple
Sidekiq handles background jobs
Active Job gives structure
Streaming responses work naturally
You can build an AI agent inside a Rails app without hacking your way through a forest of scripts

The truth is that you do not need Python to run AI in production
You only need Python if you plan to become a research lab
Most founders never will
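To make the orchestration idea concrete, here is a minimal sketch stripped of any framework so it runs anywhere: a service object that retrieves context and hands it to an LLM client, with both injected as plain callables so any provider fits. All names here (GroundedAnswer, retriever, llm) are illustrative, not from any specific gem; in a Rails app this would live in a service class invoked from a Sidekiq or Active Job worker.

```ruby
# Minimal grounded-answer service: the retriever and the LLM client are
# injected as callables, so the same orchestration works with OpenAI,
# Anthropic, Groq, or a stub in tests.
class GroundedAnswer
  def initialize(retriever:, llm:)
    @retriever = retriever # callable: question -> array of context strings
    @llm = llm             # callable: prompt -> answer string
  end

  def call(question)
    context = @retriever.call(question)
    prompt = "Answer using only these facts:\n#{context.join("\n")}\n\n" \
             "Question: #{question}"
    @llm.call(prompt)
  end
end
```

In production the two lambdas become real API clients and vector queries; the shape of the code stays the same.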

Rails Forces You to Think in Systems

AI projects built in Python often turn into a stack of disconnected scripts
One script imports the next
Another script cleans up data
Another runs an embedding job
This continues until even the founder has no idea what the system actually does

Rails solves this by design
It introduces structure: Controllers, Services, Models, Jobs, Events
It forces you to think in terms of a real application rather than a set of experiments

This shift is a superpower for founders
AI is moving from research to production
Production demands structure
Rails gives you structure without slowing you down

Why Solo Founders Thrive With Rails

When you are building alone you cannot afford chaos
You need to create a system that your future self can maintain
Rails gives you everything that normally requires a team

You can add authentication in a few minutes
You can build a clean admin interface in a single afternoon
You can create background workflows without debugging weird timeouts
You can send emails without configuring a jungle of libraries
You can go from idea to working feature in the same day

This is what every founder actually needs
Not experiments
Not scripts
A product that feels real
A product you can ship this week
A product you can charge money for

Rails gives you that reality


Real Companies That Prove Rails Is Still a Winning Choice

Rails is not a nostalgia framework
It is the foundation behind some of the biggest products ever created

GitHub started with Rails
Shopify grew massive on Rails
Airbnb used Rails in its early explosive phase
Hulu
Zendesk
Basecamp
Dribbble
All Rails

Modern AI driven companies also use Rails in production
Shopify uses it to power AI commerce features
Intercom uses it to support AI customer support workflows
GitHub still relies on Rails for internal systems even as it builds Copilot
Stripe builds much of its internal tooling in Ruby because it lets small teams ship complex dashboards quickly

These are not lightweight toy projects
These are serious companies that trust Rails because it just works

What You Gain When You Choose Rails

The biggest advantage is development speed
Not theoretical speed
Real speed
The kind that lets you finish an entire feature before dinner

Second
You escape the burden of endless decisions
The framework already gives you the right defaults
You do not waste time choosing from twenty possible libraries for each part of the system

Third
Rails was built for production
This matters more than people admit
You get caching, background jobs, templates, email, tests, routing, security, all included, all consistent, all reliable

Fourth
Rails fits perfectly with modern AI infrastructure: Vector stores, embedding workflows, agent orchestration, streaming responses. It works out of the box with almost no friction

This combination is rare
Rails gives you speed and stability at the same time
Most frameworks give you one or the other
Rails gives you both

Where Rails Is Not the Best Tool

There are honest limits. If you are training models, working with massive research datasets, writing CUDA kernels, or doing deep ML research, Python remains the right choice.

If you come from Python, the Rails conventions can feel magical or strange at first. You might wonder why things happen automatically. But the conventions are there to help you move faster.

Hiring can be more challenging in certain regions
There are fewer Rails developers
but the ones you find are usually very strong
and often much more experienced in building actual products

You might also deal with some bias. A few people still assume Rails is old
These people are usually too young to remember that Rails built half the modern internet

The One Thing Every Founder Must Understand

The future of AI will not be won by better models. Models are quickly becoming a commodity. The real victory will go to the teams that build the best products around the models:

  • Onboarding
  • UX
  • Speed
  • Reliability
  • Iteration
  • Support tools
  • Customer insights
  • Monetization
  • All the invisible details that turn a clever idea into a real business

Rails is the best framework in the world for building these supporting layers fast. This is why it remains one of the most effective choices for early stage AI startups

Use Python for research
Use Rails to build the business
This is the strategy that gives you the highest chance of reaching customers
and more importantly
the highest chance of winning

The AI-Native Rails App: What a 2025 Architecture Looks Like

Introduction

For the first time in decades of building products, I’m seeing a shift that feels bigger than mobile or cloud.
AI-native architecture isn’t “AI added into the app”; it’s the app shaped around AI from day one.

In this new world:

  • Rails is no longer the main intelligence layer
  • Rails becomes the orchestrator
  • The AI systems do the thinking
  • The Rails app enforces structure, rules, and grounding

And honestly? Rails has never felt more relevant than in 2025.

In this post, I’m breaking down exactly what an AI-native Rails architecture looks like today, why it matters, and how to build it with real, founder-level examples from practical product work.

1. AI-Native Rails vs. AI-Powered Rails

Many apps today use AI like this:

User enters text → you send it to OpenAI → you show the result

That’s not AI-native.
That’s “LLM glued onto a CRUD app.”

AI-native means:

  • Your DB supports vector search
  • Your UI expects streaming
  • Your workflows assume LLM latency
  • Your logic expects probabilistic answers
  • Your system orchestrates multi-step reasoning
  • Your workers coordinate long-running tasks
  • Your app is built around contextual knowledge, not just forms

A 2025 AI-native Rails stack looks like this:

  • Rails 7/8
  • Hotwire (Turbo + Stimulus)
  • Sidekiq or Solid Queue
  • Postgres with PgVector
  • OpenAI, Anthropic, or Groq APIs
  • Langchain.rb for tooling and structure
  • ActionCable for token-by-token streaming
  • Comprehensive logging and observability

This is the difference between a toy and a business.

2. Rails as the AI Orchestrator

AI-native architecture can be summarized in one sentence:

Rails handles the constraints, AI handles the uncertainty.

Rails does:

  • validation
  • data retrieval
  • vector search
  • chain orchestration
  • rule enforcement
  • tool routing
  • background workflows
  • streaming to UI
  • cost tracking

The AI does:

  • reasoning
  • summarization
  • problem-solving
  • planning
  • generating drafts
  • interpreting ambiguous input

In an AI-native system:

Rails is the conductor. The AI is the orchestra.

3. Real Example: AI Customer Support for Ecommerce

Most ecommerce AI support systems are fragile:

  • they hallucinate answers
  • they guess policies
  • they misquote data
  • they forget context

An AI-native Rails solution works very differently.

Step 1: User submits a question

A Turbo Frame or Turbo Stream posts to:

POST /support_queries

Rails saves:

  • user
  • question
  • metadata

Step 2: Rails triggers two workers

(1) EmbeddingJob
– Create embeddings via OpenAI
– Save vector into PgVector column

(2) AnswerGenerationJob
– Perform similarity search on:

  1. product catalog
  2. order history
  3. return policies
  4. previous chats
  5. FAQ rules

– Pass retrieved context into LLM
– Validate JSON output
– Store reasoning steps (optional)

Step 3: Stream the answer

ActionCable + Turbo Streams push tokens as they arrive.

broadcast_append_to "support_chat_#{id}"

The user sees the answer appear live, like a human typing.

Why this architecture matters for founders

  • Accuracy skyrockets with grounding
  • Cost drops because vector search reduces tokens
  • Hallucinations fall due to enforced structure
  • You can audit the exact context used
  • UX improves dramatically with streaming
  • Support cost can drop 50–70% in real deployments

This isn’t AI chat inside Rails.

This is AI replacing Tier-1 support, with Rails as the backbone of the system.

4. Example: Founder Tools for Strategy, Decks, and Roadmaps

Imagine building a platform where founders upload:

  • pitch decks
  • PDFs
  • investor emails
  • spreadsheets
  • competitor research
  • user feedback
  • product specs

Old SaaS approach:
You let GPT speculate.

AI-native approach:
You let GPT reason using real company documents.

How it works

Step 1: Upload documents

Rails converts PDFs → text → chunks → embeddings.
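The chunking step in that pipeline can start as simple as a fixed-size splitter with overlap; the sketch below is illustrative (the sizes are assumed tuning knobs), and real pipelines usually split on sentence or paragraph boundaries instead.

```ruby
# Naive fixed-size chunker with character overlap, so context isn't cut
# mid-thought at chunk borders. size and overlap are illustrative defaults.
def chunk_text(text, size: 500, overlap: 50)
  step = size - overlap
  chunks = []
  offset = 0
  while offset < text.length
    chunks << text[offset, size]
    offset += step
  end
  chunks
end
```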

Step 2: Store a knowledge graph

PgVector stores embeddings.
Metadata connects insights.

Step 3: Rails defines structure

Rails enforces:

  • schemas
  • output formats
  • business rules
  • agent constraints
  • allowed tools
  • validation filters

Step 4: Langchain.rb orchestrates the reasoning

But Rails sets the boundaries.
The AI stays inside the rails (pun intended).

Step 5: Turbo Streams show ongoing progress

Founders see:

  • “Extracting insights…”
  • “Analyzing competitors…”
  • “Summarizing risks…”
  • “Drafting roadmap…”

This builds trust and increases perceived value.

5. Technical Breakdown: What You Need to Build

Below is the exact architecture I recommend.

1. Rails + Hotwire Frontend

Turbo Streams = real-time AI experience.

  • Streams for token output
  • Frames for async updates
  • No need for React overhead

2. PgVector for AI Memory

Install extension + migration.

Example schema:

create_table :documents do |t|
  t.text :content
  t.vector :embedding, limit: 1536
  t.timestamps
end

Vectors become queryable like any column.

3. Sidekiq or Solid Queue for AI Orchestration

LLM calls must never run in controllers.

Recommended jobs:

  • EmbeddingJob
  • ChunkingJob
  • RetrievalJob
  • LLMQueryJob
  • GroundedAnswerJob
  • AgentWorkflowJob

4. AI Services Layer

Lightweight Ruby service objects.

Embedding example:

module Embeddings
  class Create
    def call(text)
      response = OpenAI::Client.new.embeddings(
        parameters: {
          model: "text-embedding-3-large",
          input: text
        }
      )
      response.dig("data", 0, "embedding")
    end
  end
end

5. Retrieval Layer

Document.order(Arel.sql("embedding <-> '#{embedding}' ASC")).limit(5)

Grounding prevents hallucinations and cuts costs.
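One caveat on that retrieval query: interpolating the embedding straight into the SQL string is an injection risk if any part of it can be user-influenced. A safer sketch (assuming ActiveRecord; this fragment only runs inside a Rails app) lets the framework quote the vector literal:

```ruby
# Hypothetical safer variant: build the vector literal, then let
# sanitize_sql_array quote it instead of interpolating raw input.
vector_literal = "[#{embedding.join(',')}]"
Document
  .order(Arel.sql(Document.sanitize_sql_array(["embedding <-> ?", vector_literal])))
  .limit(5)
```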

6. Streaming with ActionCable

Token streaming UX looks magical and retains users.

7. Observability Layer (Non-Optional)

Track:

  • prompts
  • model
  • cost
  • context chunks
  • errors
  • retries
  • latency

AI systems break differently than traditional code.
Logging is survival.


6. How To Start Building This (Exact Steps)

Here’s the fast-track setup:

Step 1: Enable PgVector

Install and migrate.

Step 2: Build an Embedding Service

Clean, testable, pure Ruby.

Step 3: Add Worker Pipeline

One worker per step.
No logic inside controllers.

Step 4: Create Retrieval Functions

Structured context retrieval before every LLM call.

Step 5: Build Token Streaming

Turbo Streams + ActionCable.

Step 6: Add Prompt Templates & A/B Testing

Prompt engineering is your new growth lever.

7. Why Rails Wins the AI Era

AI products are:

  • async
  • slow
  • streaming-heavy
  • stateful
  • data-driven
  • orchestration heavy
  • context dependent

Rails was made for this style of work.

Python builds models.
Rails builds businesses.

We are entering an era where:

Rails becomes the best framework in the world for shipping AI-powered products fast.

And I’m betting on it again, like I did 15 years ago, but with even more conviction.

Closing Thoughts

Your product is no longer a set of forms.
In the AI era, your product is:

  • memory
  • context
  • retrieval
  • reasoning
  • workflows
  • streaming interfaces
  • orchestration

Rails is the perfect orchestrator for all of it.

The Two Hardest Problems in Software Development: Naming Things & Cache Invalidation

The post discusses the common struggles developers face with naming conventions and cache invalidation, humorously portraying them as universal challenges irrespective of experience or technology. It emphasizes that while AI and Ruby tools assist in these areas, the inherent complexities require human reasoning. Ultimately, these issues highlight the uniquely human aspects of software development.

A joke. A reality. A shared developer trauma now with AI and Ruby flavor.

Every industry has its running jokes. Lawyers have billable hours. Doctors have unreadable handwriting. Accountants battle ancient spreadsheets.

Developers?
We have two immortal bosses at the end of every level:

1. Naming things
2. Cache invalidation

These aren’t just memes.
They’re universal rites of passage, the kind of problems that don’t care about your stack, your years of experience, or your productivity plans for the day.

And as modern as our tools get, these two battles remain undefeated.


1. Naming Things: A Daily Existential Crisis

You would think building multi-region distributed systems or designing production-grade blockchains is harder than deciding how to name a method.

But no.
Naming is where confidence goes to die.

There’s something profoundly humbling about:

  • Typing data, deleting it,
  • typing result, deleting it,
  • typing payload, staring at it, deleting it again,
  • then settling on something like final_sanitized_output and hoping future-you understands the intention.

Naming = Thinking

A name isn’t just a word.
It’s a miniature problem statement.

A good name answers:

  • What is this?
  • Why does it exist?
  • What is it supposed to do?
  • Is it allowed to change?
  • Should anyone else touch it?

A bad name answers none of that but invites everyone on your team to ping you on Slack at 22:00 asking “hey what does temp2 mean?”

Not being a native English speaker? Welcome to the Hard Mode DLC

For those of us whose brain grew up in Croatian or Slovenian, naming in English is a special kind of fun.

You might know exactly what you want to say in your own language, but English gives you:

  • three misleading synonyms,
  • one obscure word nobody uses,
  • and a fourth option that feels right but actually means “a small marine snail.”

Sometimes you choose a word that sounds good.
Later a native speaker reviews it and politely suggests:
“Did you mean something else?”

Yes.
I meant something else.
I just didn’t know the word.

And every developer from Europe, Asia, or South America collectively understands this pain.


2. Cache Invalidation: “Why is my app showing data from last week?”

Caching seems easy on paper:

Save expensive data → Serve it fast → Refresh it when needed.

Unfortunately, “when needed” is where the nightmares begin.

Cache invalidation is unpredictable because it lives at the intersection of:

  • time
  • state
  • concurrency
  • user behavior
  • background jobs
  • frameworks
  • deployment pipelines
  • the moon cycle
  • your personal sins

You delete the key.
The stale value still appears.
You restart the server.
It refuses to die.

You clear your browser cache.
Nothing changes.

Then you realize:
Ah. It’s Cloudflare.
Or Redis.
Or Rails fragment caching.
Or your CDN.
Or… you know what, it doesn’t matter anymore. You’re in too deep.


3. “But can’t AI fix it?”

Not… really.
Not even close.

Large language models can:

  • produce suggestions,
  • generate name variations,
  • summarize logic,
  • help brainstorm alternative wording.

But they don’t actually understand your domain, your codebase, your long-term architecture, or your internal conventions.

Their naming suggestions are based on statistical patterns in text, not on:

  • business logic
  • your future plans
  • subtle behavior differences
  • what will still make sense in six months
  • what your teammates expect
  • what your product owner actually wants

AI might suggest a “good enough” name that reads nicely,
but it won’t know that half your system expects a value to mean something slightly different, or that “order” conflicts with another concept named “Order” in a separate context.

And with cache invalidation?
AI can generate explanations but it can’t magically deduce your system’s lifetime, caching layers, or deployment quirks.
It cannot predict race conditions or magically detect all the hidden layers where stale data might be hiding like a gremlin.

AI helps you write code faster.
But it does not remove the need for deep understanding, consistent thinking, and human judgment.


4. How Ruby Tries to Save Us From Ourselves

Ruby, and Ruby on Rails in particular, has spent two decades trying to soften the blow of both naming and system complexity.

Not by solving the problem completely, but by making the playground safer.

Ruby’s Naming Conventions = Guardrails for Humans

Ruby tries to push developers toward sanity through:

  • clear method naming idioms (predicate?, bang!, _before_type_cast)
  • consistent pluralization rules
  • convention-driven file names and classes
  • ActiveRecord naming patterns like User → users, Person → people

Rails developers don’t choose how to name directories, controllers, helpers, or models.
Rails chooses for you.

This is not a limitation; it’s freedom.

The fewer decisions you have to make about structure, the more mental energy you save for meaningful names, not framework boilerplate.

Ruby Reduces the “Naming Chaos Budget”

Thanks to convention-over-configuration:

  • folders behave predictably
  • classes match filenames
  • methods follow community patterns
  • model names map directly to database entities
  • you don’t spend half your day wondering where things live

Ruby doesn’t fix naming.
It simply reduces the size of the battlefield.

Ruby Also Softens Caching Pain… a Bit

Rails gives you:

  • fragment caching
  • Russian doll caching
  • cache keys with versioning (cache_key_with_version)
  • automatic key invalidation via ActiveRecord touch
  • expiry helpers (expires_in, expires_at)
  • per-request cache stores
  • caching tied to view rendering

Rails tries to help you avoid stale data by structuring caching around data freshness instead of low-level keys.
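For instance, the fragment-caching helper combines naturally with `touch`; here is a small sketch (the model and partial names are made up for illustration):

```erb
<%# app/views/products/show.html.erb — Russian-doll caching: the cache key
    includes the record's updated_at via cache_key_with_version, so the
    fragment expires automatically whenever the product is touched. %>
<% cache @product do %>
  <%= render partial: "reviews/review", collection: @product.reviews %>
<% end %>
```

Pairing this with `belongs_to :product, touch: true` on the Review model means saving a review bumps the product’s `updated_at` and busts the outer fragment without any manual key management.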

But even then…

The moment you have:

  • multiple services,
  • background jobs,
  • external APIs,
  • or anything distributed…

Ruby smiles kindly and whispers:
“You’re on your own now, my friend.”


Why These Problems Never Disappear

Because they aren’t technical problems.
They are human constraints on top of technical systems.

  • Naming requires clarity of thinking.
  • Caching requires clarity of system behavior.
  • AI can assist, but it cannot replace understanding.
  • Ruby can guide you, but it cannot decide for you.

The tools help.
The frameworks help.
AI helps.

But at the end of the day, the two hardest problems remain hard because they require the one thing no machine or framework can automate:

your own reasoning.


Final Thoughts

Some days you ship incredible features.
Some days you wage war against a variable name.
Some days you fight stale cached data for three hours before realizing the problem was a CDN rule from 2018.

And every developer, everywhere on Earth, understands these moments.

Naming things and cache invalidation aren’t just computer science problems.
They’re reminders of why software development is deeply human: full of ambiguity, creativity, and shared misery.

But honestly?

That’s what keeps it fun.

PgVector for AI Memory in Production Applications

PgVector is a PostgreSQL extension designed to enhance memory in AI applications by storing and querying vector embeddings. This enables large language models (LLMs) to retrieve accurate information, personalize responses, and reduce hallucinations. PgVector’s efficient indexing and simple integration provide a reliable foundation for AI memory, making it essential for developers building AI products.

Introduction

As AI moves from experimentation into real products, one challenge appears over and over again: memory. Large language models (LLMs) are incredibly capable, but they can’t store long-term knowledge about users or applications out of the box. They respond only to what they see in the prompt, and once the prompt ends, the memory disappears.

This is where vector databases and especially PgVector step in.

PgVector is a PostgreSQL extension that adds first-class vector similarity search to a database you probably already use. With its rise in popularity, especially in production AI systems, it has become one of the simplest and most powerful ways to build AI memory.

This post is a deep dive into PgVector, how it works, why it matters, and how to implement it properly for real LLM-powered features.


What Is PgVector?

PgVector is an open-source PostgreSQL extension that adds support for storing and querying vector data types. These vectors are high‑dimensional numerical representations (embeddings) generated by AI models.

Examples:

  • A sentence embedding from OpenAI might be a vector of 1,536 floating‑point numbers.
  • An image embedding from CLIP might be 512 or 768 numbers.
  • A user profile embedding might be custom‑generated from your own model.

PgVector lets you:

  • Store these vectors
  • Index them efficiently
  • Query them using similarity search (cosine, inner product, Euclidean)

This enables your LLM applications to:

  • Retrieve knowledge
  • Add persistent memory
  • Reduce hallucinations
  • Add personalization or context
  • Build recommendation engines

And all of that without adding a new, complex piece of infrastructure, because it works inside PostgreSQL.


How PgVector Works

At its core, PgVector introduces a new column type:

vector(1536)

You decide the dimension based on your embedding model. PgVector then stores the vector and allows efficient search using:

  • Cosine distance (1 – cosine similarity)
  • Inner product
  • Euclidean (L2)
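To make the first metric concrete, here is cosine distance (what PgVector’s `<=>` operator computes) in plain Ruby. In production the database does this for you; the snippet only shows the math.

```ruby
# Cosine distance = 1 - cosine similarity.
# 0.0 -> identical direction, 1.0 -> orthogonal, 2.0 -> opposite.
def cosine_distance(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = ->(v) { Math.sqrt(v.sum { |x| x * x }) }
  1.0 - dot / (norm.call(a) * norm.call(b))
end
```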

Similarity Search

Similarity search means: given an embedding vector, find the stored vectors that are closest to it.

This is crucial for LLM memory.

Instead of asking the model to “remember” everything or hallucinating answers, we retrieve the most relevant facts, messages, documents, or prior interactions before the LLM generates a response.

Indexing

PgVector supports two main index types:

  • IVFFlat (fast, approximate search – great for production)
  • HNSW (graph-based approximate search – better recall and query speed on large datasets)

Example index creation:

CREATE INDEX ON memories USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);


Using PgVector With Embeddings

Step 1: Generate Embeddings

You generate embeddings from any model:

  • OpenAI Embeddings
  • Azure
  • HuggingFace models
  • Cohere
  • Llama.cpp
  • Custom fine‑tuned transformers

Example (OpenAI):

POST https://api.openai.com/v1/embeddings

{
  "model": "text-embedding-3-large",
  "input": "Hello world"
}

This returns a vector like:

[0.0213, -0.0045, 0.9983, …]

Step 2: Store Embeddings in PostgreSQL

A table for memory might look like:

CREATE TABLE memory (
  id SERIAL PRIMARY KEY,
  content TEXT NOT NULL,
  embedding vector(1536),
  metadata JSONB,
  created_at TIMESTAMP DEFAULT NOW()
);

Insert data:

INSERT INTO memory (content, embedding)
VALUES (
  'User likes Japanese and Mexican cuisine',
  '[0.234, -0.998, ...]'
);

Step 3: Query Similar Records

SELECT content, (embedding <=> '[0.23, -0.99, ...]') AS distance
FROM memory
ORDER BY embedding <=> '[0.23, -0.99, ...]'
LIMIT 5;

This returns the top 5 most relevant memory snippets and those will be added to the prompt context.


Storing Values for AI Memory

What You Store Depends on Your Application

You can store:

  • Chat history messages
  • User preferences
  • Past actions
  • Product details
  • Documents
  • Errors and solutions
  • Knowledge base articles
  • User profiles

Recommended Structure

A flexible structure:

{
  "type": "preference",
  "user_id": 42,
  "source": "chat",
  "topic": "food",
  "tags": ["japanese", "mexican"]
}

This gives you the ability to:

  • Filter search by metadata
  • Separate memories per user
  • Restrict context retrieval by type

Temporal Decay (Optional)

You can implement ranking adjustments:

  • Recent memories score higher
  • Irrelevant memories score lower
  • Outdated memories auto‑expire

This creates human‑like memory behavior.
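A minimal way to sketch that ranking adjustment is exponential decay; the half-life value below is an assumption you would tune per application.

```ruby
# Recency-weighted scoring: a memory's similarity score is halved every
# half_life_days, so fresh memories outrank equally-similar old ones.
def decayed_score(similarity, age_days, half_life_days: 30.0)
  similarity * 2.0**(-age_days / half_life_days)
end
```

Auto-expiry then becomes a matter of dropping rows whose decayed score falls below a threshold.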


Reducing Hallucinations With PgVector

LLMs hallucinate when they lack context.

Most hallucinations are caused by missing information, not by model failure.

PgVector solves this by ensuring the model always receives:

  • The top relevant facts
  • Accurate summaries
  • Verified data

Retrieval-Augmented Generation (RAG)

You transform a prompt from:

Without RAG:

“Tell me about Ivan’s garden in Canada.”

With RAG:

“Tell me about Ivan’s garden in Canada. Here are relevant facts from memory: – The garden is 20m². – Located in Canada. – Used for planting vegetables.”

The model no longer needs to guess.

Why This Reduces Hallucination

Because the model:

  • Is not guessing user data
  • Only completes based on retrieved facts
  • Gets guardrails through data-driven knowledge
  • Becomes far more predictable

PgVector acts like a mental database for the AI.


Adding PgVector to a Production App

Here’s the blueprint.

1. Install the extension

CREATE EXTENSION IF NOT EXISTS vector;

2. Create your memory table

Use the structure that fits your domain.

3. Create an index

CREATE INDEX memory_embedding_idx
ON memory USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

4. Create a Memory Service

Your backend service should:

  • Accept content
  • Generate embeddings
  • Store them with metadata

And another service should:

  • Take an embedding
  • Query top-N matches
  • Return the context

5. Use RAG in your LLM pipeline

Every LLM call becomes:

  1. Embed the question
  2. Retrieve relevant memory
  3. Construct prompt
  4. Call the LLM
  5. Store new memories (if needed)
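Step 3 is where grounding actually happens, so it is worth showing concretely; a plain-Ruby sketch (the helper name is made up):

```ruby
# Build the grounded prompt: retrieved memories go in as explicit facts,
# so the model completes from data instead of guessing.
def build_rag_prompt(question, memories)
  facts = memories.map { |m| "- #{m}" }.join("\n")
  "Here are relevant facts from memory:\n#{facts}\n\nQuestion: #{question}"
end
```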

6. Add Guardrails

Production memory systems need:

  • Permission control (per user)
  • Expiration rules
  • Filters (e.g., exclude private data)
  • Maximum memory size

7. Add Analytics

Track:

  • Hit rate (how often memory is used)
  • Relevance quality
  • Retrieval time

Common Pitfalls and How to Avoid Them

❌ Storing whole conversation transcripts

This leads to massive token usage. Instead, store summaries.

❌ Retrieving too many memories

Keep context small. 3–10 items is ideal.

❌ Wrong distance metric

Most embedding models work best with cosine similarity.

❌ Using RAG without metadata filters

You don’t want another user’s memory leaking into the context.
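A sketch of what that filter looks like in SQL, assuming the JSONB metadata column from the table defined earlier:

```sql
-- Filter by metadata first, then rank by similarity, so retrieval can
-- never surface another user's memories.
SELECT content
FROM memory
WHERE metadata->>'user_id' = '42'
ORDER BY embedding <=> '[0.23, -0.99, ...]'
LIMIT 5;
```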

❌ No indexing

Without IVFFlat/HNSW, retrieval becomes extremely slow.


When Should You Use PgVector?

Use it if you:

  • Already use PostgreSQL
  • Want simple deployment
  • Want memory that scales to millions of rows
  • Need reliability and ACID guarantees
  • Want to avoid new infrastructure like Pinecone, Weaviate, or Milvus

Do NOT use it if you:

  • Need billion‑scale vector search
  • Require ultra‑low latency for real‑time gaming or streaming
  • Need dynamic sharding across many nodes

But for 95% of AI apps, PgVector is perfect.


Conclusion

PgVector is the bridge between normal production data and the emerging world of AI memory. For developers building real applications (chatbots, agents, assistants, search engines, personalization engines) it offers the most convenient and stable foundation.

You get:

  • Easy deployment
  • Reliable storage
  • Fast similarity search
  • A complete memory layer for AI

This turns your LLM features from fragile experiments into solid, predictable production systems.

If you’re building AI products in 2025, PgVector isn’t “nice to have”; it’s a core architectural component.

Saving Money With Embeddings in AI Memory Systems: Why Ruby on Rails is Perfect for LangChain

In the exploration of AI memory systems and embeddings, the author highlights the hidden costs in AI development, emphasizing token management. Leveraging Ruby on Rails streamlines the integration of LangChain for efficient memory handling. Adopting strategies like summarization and selective retrieval significantly reduces expenses, while maintaining readability and scalability in system design.

Over the last few months of rebuilding my Rails muscle memory, I’ve been diving deep into AI memory systems and experimenting with embeddings. One of the biggest lessons I’ve learned is that the cost of building AI isn’t just in the model; it’s in how you use it. Tokens, storage, retrieval: these are the hidden levers that determine whether your AI stack remains elegant or becomes a runaway expense.

And here’s the good news: with Ruby on Rails, managing these complexities becomes remarkably simple. Rails has always been about turning complicated things into something intuitive and maintainable, and when you pair it with LangChain, it feels like magic.


Understanding the Cost of Embeddings

Most people think that running large language models is expensive because of the model itself. That’s only partially true. In practice, the real costs come from:

  • Storing too much raw content: Every extra paragraph you embed costs more in tokens, both for the embedding itself and for later retrieval.
  • Embedding long texts instead of summaries: LLMs don’t need the full novel they often just need the distilled version. Summaries are shorter, cheaper, and surprisingly effective.
  • Retrieving too many memories: Pulling 50 memories for a simple question can cost more than the model call itself. Smart retrieval strategies can drastically cut costs.
  • Feeding oversized prompts into the model: Every extra token in your prompt adds up. Cleaner prompts = cheaper calls.

I’ve seen projects where embedding every word of a document seemed “safe,” only to realize months later that the token bills were astronomical. That’s when I started thinking in terms of summary-first embeddings.


How Ruby on Rails Makes It Easy

Rails is my natural playground for building systems that scale reliably without over-engineering. Why does Rails pair so well with AI memory systems and LangChain? Several reasons:

Migrations Are Elegant
With Rails, adding a vector column with PgVector feels like any other migration. You can define your tables, indexes, and limits in one concise block:

class AddMemoriesTable < ActiveRecord::Migration[7.1]
  def change
    enable_extension "vector"

    create_table :memories do |t|
      t.text :content, null: false
      t.vector :embedding, limit: 1536
      t.jsonb :metadata
      t.timestamps
    end
  end
end


There’s no need for complicated schema scripts. Rails handles the boring but essential details for you.

ActiveRecord Makes Embedding Storage a Breeze
Storing embeddings in Rails is almost poetic. With a simple model, you can create a memory with content, an embedding, and metadata in a single call:

Memory.create!(
  content: "User prefers Japanese and Mexican cuisine.",
  embedding: embedding_vector,
  metadata: { type: :preference, user_id: 42 }
)

And yes, you can query those memories by similarity in a single, readable line:

Memory.order(Arel.sql("embedding <=> '[#{query_embedding.join(',')}]'")).limit(5)

Rails keeps your code readable and maintainable while you handle sophisticated vector queries.

LangChain Integration is Natural
LangChain is all about chaining LLM calls, memory storage, and retrieval. In Rails, you already have everything you need: models, services, and job queues. You can plug LangChain into your Rails services to:

  • Summarize content before embedding
  • Retrieve only the most relevant memories
  • Cache embeddings efficiently for repeated use

Rails doesn’t get in the way. It gives you structure without slowing you down.
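To show the shape of that store-and-retrieve loop without a database or the gem itself, here’s a self-contained, in-memory sketch. The `fake_embed` helper is purely a stand-in for a real embedding API call; its character-hashing “embedding” is only illustrative:

```ruby
# In-memory sketch of the loop a Rails service would normally back with
# PgVector: embed, store with metadata, filter, then rank by similarity.
MemoryRecord = Struct.new(:content, :embedding, :metadata)

# Stand-in for a real embedding API: hashes characters into a unit vector.
def fake_embed(text, dims: 8)
  vec = Array.new(dims, 0.0)
  text.downcase.each_char { |c| vec[c.ord % dims] += 1.0 }
  norm = Math.sqrt(vec.sum { |x| x * x })
  norm.zero? ? vec : vec.map { |x| x / norm }
end

def cosine(a, b)
  a.zip(b).sum { |x, y| x * y } # vectors are already normalized
end

store = []
store << MemoryRecord.new("User prefers Japanese and Mexican cuisine.",
                          fake_embed("japanese mexican cuisine"),
                          { type: :preference, user_id: 42 })
store << MemoryRecord.new("User's billing plan is Pro.",
                          fake_embed("billing plan pro"),
                          { type: :account, user_id: 42 })

# Filter by metadata first, then rank by similarity, then cap at top-k,
# mirroring Memory.where(...).order("embedding <=> ...").limit(k).
query = fake_embed("what food does the user like?")
top = store.select { |m| m.metadata[:user_id] == 42 }
           .max_by(2) { |m| cosine(query, m.embedding) }
```

The metadata filter before the similarity ranking is the important part: it keeps the candidate set small before any vector math happens, exactly the ordering you want PgVector to follow in production.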


Saving Money with Smart Embeddings

Here’s the approach I’ve refined over multiple projects:

  1. Summarize Before You Embed
    Instead of embedding full documents, feed the model a summary. A 50-word summary costs fewer tokens but preserves the semantic meaning needed for retrieval.
  2. Limit Memory Retrieval
    You rarely need more than 5–10 memories for a single model call. More often than not, extra memories just bloat your prompt and inflate costs.
  3. Use Metadata Wisely
    Store small, structured metadata alongside your embeddings to filter memories before similarity search. For example, filter by user_id or type instead of pulling all records into the model.
  4. Cache Strategically
    Don’t re-embed unchanged content. Use Rails validations, background jobs, and services to embed only when necessary.
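Step 4 is easy to sketch: key the cache on a digest of the content so unchanged text never triggers a second embedding call. The stand-in embedder below is hypothetical; in a real app the cache would live in the memories table or Rails.cache rather than a plain Hash:

```ruby
require "digest"

# Cache embeddings by content digest so unchanged text is never re-embedded.
class EmbeddingCache
  def initialize(&embedder)
    @embedder  = embedder
    @cache     = {}
    @api_calls = 0
  end

  attr_reader :api_calls

  def fetch(content)
    key = Digest::SHA256.hexdigest(content)
    @cache[key] ||= begin
      @api_calls += 1            # only incremented on a cache miss
      @embedder.call(content)
    end
  end
end

cache = EmbeddingCache.new { |text| Array.new(3) { text.length.to_f } } # stand-in embedder
cache.fetch("hello")
cache.fetch("hello")             # served from cache, no second "API" call
cache.fetch("world!")
puts cache.api_calls             # => 2
```

In Rails, the same idea falls out naturally from a `before_save` check: compare the stored digest to the new one, and enqueue an embedding job only when they differ.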

When you combine these strategies, the savings are significant. In some projects, embedding costs dropped by over 70% without losing retrieval accuracy.


Why I Stick With Rails and PostgreSQL

There are many ways to build AI memory systems. You could go with specialized databases, microservices, or cloud vector stores. But here’s what keeps me on Rails and Postgres:

  • Reliability: Postgres is mature, stable, and production-ready. PgVector adds vector search without changing the foundation.
  • Scalability: Rails scales surprisingly well when you keep queries efficient and leverage background jobs.
  • Developer Happiness: Rails lets me iterate quickly. I can prototype, test, and deploy AI memory features without feeling like I’m juggling ten different systems.
  • Future-Proofing: Rails projects can last years without a complete rewrite. AI infrastructure is still evolving; having a stable base matters.

Closing Thoughts

AI memory doesn’t have to be complicated or expensive. By thinking carefully about embeddings, summaries, retrieval, and token usage, and by leveraging Rails with LangChain, you can build memory systems that are elegant, fast, and cost-effective.

For me, Rails is more than a framework. It’s a philosophy: build systems that scale naturally, make code readable, and keep complexity under control. Add PgVector and LangChain to that mix, and suddenly AI memory feels like something you can build without compromise.

In the world of AI, where complexity grows faster than budgets, that kind of simplicity is priceless.

The SaaS Model Isn’t Dead, it’s Evolving Beyond the Hype of “Vibe Coding”

The article critiques the rise of “vibe coding,” emphasizing the distinction between quick prototypes and genuine MVPs. It argues that while AI can accelerate product development, true success relies on accountability, stability, and structure. Ultimately, SaaS is evolving, prioritizing reliable infrastructure and reinforcement over mere speed and creativity.

“The SaaS model is dead. Long live vibe-coded AI scripts.”

That’s the kind of hot take lighting up LinkedIn: half ironic, half prophetic.

Why pay $99/month for a product when you can stitch together 12 AI prompts, 3 no-code hacks, and a duct-taped Python script you barely understand?

Welcome to vibe coding.

It feels fast. It feels clever.
Until the vibes break and no one knows why.


The Mirage of Instant Software

We live in an era of speed.
AI gives us instant answers, mockups, and even “apps.” The line between prototype and product has never been thinner, and that’s both empowering and dangerous.

What used to take months of product design, testing, and iteration can now be faked in a weekend.
You can prompt ChatGPT to generate a working landing page, use Bubble or Replit for logic, and Zapier to glue it all together.

Boom: “launch” your MVP.

But here’s the truth no one wants to say out loud:
Most of these AI-fueled prototypes aren’t MVPs. They’re demos with good lighting.

A real MVP isn’t about how fast you can ship; it’s about how reliably you can learn from what you ship.

And learning requires stability.
You can’t measure churn or retention when your backend breaks every other day.
You can’t build trust when your app crashes under 20 users.

That’s when the vibes start to fade.


The Boring Truth Behind Great Products

Let’s talk about what SaaS really sells.
It’s not just the product you see; it’s everything beneath it:

  • Uptime: Someone is on-call at 3 AM keeping your app alive.
  • Security: Encryption, audits, GDPR, and SOC 2, the invisible scaffolding of trust.
  • Maintenance: When APIs change or libraries break, someone fixes it.
  • Versioning: “Update Available” didn’t write itself.
  • Support: Human beings who care when you open a ticket.

When you pay for SaaS, you’re not paying for buttons.
You’re paying for accountability: the guarantee that someone else handles the boring stuff while you focus on your business.

And boring, in software, is beautiful.
Because it means stability. Predictability. Peace of mind.


The Myth of the One-Prompt MVP

There’s a growing illusion that AI can replace the entire MVP process.
Just write a long enough prompt, and out comes your startup.

Except… no.

Building an MVP is not about output. It’s about the iteration loop: testing, learning, refining.

A real MVP requires:

  • Instrumentation: Analytics to track usage and retention.
  • UX Design: Understanding user friction.
  • Scalability: Handling 500 users without collapse.
  • Product Roadmap: Knowing what not to build yet.
  • Legal & Compliance: Because privacy questions always come.

AI can accelerate this process, but it can’t replace it.
Because AI doesn’t understand your market context, users, or business model.
It’s a tool, not a cofounder.


From Vibes to Viability

There’s real power in AI-assisted building.
You can move fast, experiment, and prototype ideas cheaply.

But once something works, you’ll need to replace your prompt stack and web of Zapier glue code with solid infrastructure.

That’s when the SaaS mindset returns.
Not because you need to “go old school,” but because you need to go sustainable.

Every successful product eventually faces the same questions:

  • Who maintains this?
  • Who owns the data?
  • Who ensures it still works next month?
  • Who’s responsible when it breaks?

The answer, in true SaaS fashion, must always be: someone accountable.


SaaS Isn’t Dead, it’s Maturing

The world doesn’t need more quick hacks.
It needs more craftsmanship: builders who blend speed with discipline, creativity with structure, and vibes with reliability.

SaaS isn’t dying; it’s evolving.

Tomorrow’s SaaS might not look like subscription dashboards.
It might look like AI agents, private APIs, or personalized data layers.

But behind every “smart” layer will still be boring, dependable infrastructure: databases, authentication, servers, and teams maintaining uptime.

The form changes.
The value never does: reliability, scalability, trust.


Final Thought: Build With Vibes, Ship With Discipline

There’s nothing wrong with vibe coding. It’s an amazing way to experiment and learn.

But if you want to launch something that lasts, something customers depend on, you’ll need more than vibes.
You’ll need product thinking, process, and patience.

That’s what separates a weekend project from a real business.

So build with vibes.
But ship with discipline.

Because that’s where the magic and the money really happen.

If you liked this post, follow me for more thoughts on building real products in the age of AI hype, where craftsmanship beats shortcuts every time.

Artisanal Coding (職人コーディング): A Manifesto for the Next Era of Software Craftsmanship

Artisanal coding emphasizes the importance of craftsmanship in software development amidst the rise of AI and “vibe coding.” It advocates for intentional, quality-driven coding practices that foster deep understanding and connection to the code. By balancing AI assistance with craftsmanship, developers can preserve their skills and create sustainable, high-quality software.

In an age where code seems to write itself and AI promises to make every developer “10x faster,” something essential has quietly started to erode: our craftsmanship. I call the counter-movement to this erosion artisanal coding.

Like artisanal bread or craft coffee, artisanal coding is not about nostalgia or resistance to progress. It’s about intentionality, quality, and soul: things that can’t be automated, templated, or generated in bulk. It’s the human touch in a field that’s rushing to outsource its own intuition.

What Is Artisanal Coding (職人コーディング)?

Artisanal coding is the conscious resistance to that decay.
It’s not anti-AI; it’s anti-carelessness. It’s the belief that the best code is still handmade, understood, and cared for.

Think of an artisan carpenter.
He can use power tools, but he knows when to stop and sand by hand. He knows the wood, feels its resistance, and adjusts. He doesn’t mass-produce; he perfects.

Artisanal coding applies that mindset to software. It’s about:

  • Understanding the problem before touching the code.
  • Writing it line by line, consciously.
  • Refactoring not because a tool says so, but because you feel the imbalance.
  • Learning from your errors instead of patching them away.

It’s slow. It’s deliberate. And that’s the point.

Artisanal coding is the deliberate act of writing software by hand, with care, precision, and understanding. It’s the opposite of what I call vibe coding: the growing trend of throwing AI-generated snippets together, guided by vibes and autocomplete rather than comprehension.

This is not about rejecting tools; it’s about rejecting the loss of mastery. It’s a mindset that values the slow process of creation, the small victories of debugging, and the satisfaction of knowing your code’s structure like a craftsman knows the grain of wood.

Why We Need Artisanal Coding

  1. We’re losing our muscle memory.
    Developers who rely too heavily on AI are forgetting how to solve problems from first principles. Code completion is helpful, but when it replaces thought, the skill atrophies.
  2. Code quality is declining behind pretty demos.
    Vibe coding produces software that “works” today but collapses tomorrow. Without deep understanding, we can’t reason about edge cases, performance, or scalability.
  3. We risk becoming code operators instead of creators.
    The satisfaction of crafting something elegant is replaced by prompt-tweaking and debugging alien code. Artisanal coding restores that connection between creator and creation.
  4. AI cannot feel the friction.
    Friction is good. The process of struggling through a bug teaches lessons that no autocomplete can. That frustration is where true craftsmanship is born.

The Role (and Limitations) of AI in Artisanal Coding

Artisanal coding doesn’t ban AI. It just defines healthy boundaries for its use.

✅ Allowed AI usage:

  • Short code completions: Using AI to fill in a few lines of boilerplate or repetitive syntax.
  • Troubleshooting assistance: Asking AI questions outside the codebase, similar to how you’d ask Stack Overflow or a mentor for advice.

🚫 Not allowed:

  • Generating entire functions or components without understanding them.
  • Using AI to “design” the logic of your app.
  • Copy-pasting large sections of unverified code.

AI can be your assistant, not your replacement. Think of it as a digital apprentice, not a co-author.


The Future Depends on How We Code Now

As we rush toward AI-assisted everything, we risk raising a generation of developers who can’t code without help. Artisanal coding is a statement of independence: a call to slow down, think deeply, and keep your hands on the keyboard with intent.

Just as artisans revived craftsmanship in industries overtaken by automation, we can do the same in tech. The software we write today shapes the world we live in tomorrow. It deserves the same care as any other craft.

Artisanal coding is not a movement of the past; it’s a movement for the future.
Because even in the age of AI, quality still matters. Understanding still matters. Humans still matter.

If vibe coding is the fast food of software, artisanal coding is the slow-cooked meal: nourishing, deliberate, and made with care.
It takes more time, yes. But it’s worth every second.

Let’s bring back pride to our craft.
Let’s code like artisans again.

In many ways, artisanal coding echoes the Japanese philosophies of Shokunin [職人] (the pursuit of mastery through mindful repetition), Wabi-sabi [侘寂] (the acceptance of imperfection as beauty), and Kaizen [改善] (the quiet dedication to constant improvement). A true craftsperson doesn’t rush; they refine. They don’t chase perfection; they respect the process. Coding, like Japanese pottery or calligraphy, becomes an act of presence: a meditative dialogue between the mind and the material. In a world driven by automation and speed, this spirit reminds us that the deepest satisfaction still comes from doing one thing well, by hand, with heart.

Final Thoughts

This post marks a turning point for me and for this blog.
I’ve spent decades building software, teams, and systems. I’ve seen tools come and go, frameworks rise and fade. But never before have we faced a transformation this deep: one that challenges not just how we code, but why we code.

Artisanal coding is my response.
From this point forward, everything I write here, every essay, every reflection, will revolve around this principle: building software with intention, understanding, and care.

This isn’t just about programming.
It’s about reclaiming craftsmanship in a world addicted to shortcuts.
It’s about creating something lasting in an era of instant everything.
It’s about remembering that the hands still matter.

“職人コーディング – Writing software with heart, precision, and purpose.”

Brainrot and the Slow Death of Code

The rise of AI tools in software development is leading to a decline in genuine coding skills, as developers increasingly rely on automation. This reliance dampens critical thinking and creativity, replacing depth with superficial efficiency. Ultimately, the industry risks producing inferior code devoid of understanding, undermining the essence of craftsmanship in programming.

It’s an uncomfortable thing to say out loud, but we’re witnessing a slow decay of human coding ability: a collective brainrot disguised as progress.

AI tools are rewriting how we build software. Every week, new developers boast about shipping apps in a weekend using AI assistants, generating entire APIs, or spinning up SaaS templates without understanding what’s going on beneath the surface. At first glance, this looks like evolution: a leap forward for productivity. But beneath that veneer of efficiency, something essential is being lost.

Something deeply human.

The Vanishing Craft

Coding has always been more than just typing commands into a terminal. It’s a way of thinking. It’s logic, structure, and creativity fused into a single process: the art of turning chaos into clarity.

But when that process is replaced by autocomplete and code generation, the thinking disappears. The hands still move, but the mind doesn’t wrestle with the problem anymore. The apprentice phase, the long, painful, necessary stage of learning how to structure systems, debug, refactor, and reason, gets skipped.

And that’s where the rot begins.

AI gives us perfect scaffolding but no understanding of the architecture. Developers start to “trust” the model more than themselves. Code review becomes an act of blind faith, and debugging turns into a guessing game of prompts.

The craft is vanishing.

We Are Losing Muscle Memory

Just like a musician who stops practicing loses touch with their instrument, coders are losing their “muscle memory.”

When you stop writing code line by line, stop thinking about data flow, stop worrying about algorithms and complexity, your instincts dull. The small patterns that once made you fast, efficient, and insightful fade away.

Soon, you can’t feel when something’s wrong with a function or a model. You can’t spot the small design flaw that will turn into technical debt six months later. You can’t intuit why the system slows down, or why memory leaks appear.

AI-generated code doesn’t teach you these instincts; it just hides the consequences long enough for them to explode.

Inferior Code, Hidden in Abundance

We’re producing more code than ever before, but most of it is worse.

AI makes quantity trivial. Anyone can spin up ten microservices, fifty endpoints, and thousands of lines of boilerplate in an hour. But that abundance hides a dangerous truth: we are filling the digital world with code that nobody understands.

Future engineers will inherit layers of opaque, AI-generated software: systems without authors, without craftsmanship, without intention. It’s digital noise masquerading as innovation.

This isn’t progress. It’s entropy.

The Myth of “Productivity”

The industry loves to equate productivity with success. But in software, speed isn’t everything. Some of the best systems ever built took time, reflection, and human stubbornness.

We’re now in a paradox where developers produce more but learn less. Where every shortcut taken today adds future friction. The so-called “productivity gains” are borrowed time: a loan with heavy interest, paid in debugging, maintenance, and fragility.

When code becomes disposable, knowledge follows. And when knowledge fades, innovation turns into imitation.

The Future Is Not Hopeless If We Choose Discipline

The solution isn’t to reject AI; it’s to reestablish the boundaries between tool and craftsman.

AI should be your assistant, not your brain. It should amplify your understanding, not replace it. The act of writing, reasoning, and debugging still matters. You still need to understand the stack, the algorithm, the data flow.

If you don’t, the machine will own your craft, and eventually, your value.

Software built by people who no longer understand code will always be inferior to software built by those who do. The future of code depends on preserving that human layer of mastery: the part that questions, improves, and cares.

Closing Thought

What’s happening isn’t the death of coding; it’s the death of depth.

We’re watching a generation of builders raised on autocomplete lose touch with the essence of creation. The danger isn’t that AI will replace programmers. The danger is that programmers will forget how to think like programmers.

Brainrot isn’t about laziness; it’s about surrender. And if we keep surrendering our mental muscles to the machine, we’ll end up with a future full of code that works but no one knows why.