How I Keep Building Fast with AI—Even on Huge Codebases

Introduction

For the past two years, I have been developing professionally using AI for code generation. Over the last year and a half, I have built and maintained codebases of various sizes, some small, some very large, where up to 95% of the code was generated by AI. There is a common belief that as a project grows, development speed inevitably slows, and that AI eventually stops accelerating development as effectively as it does at the start. My experience shows otherwise: with the right approach and strategies, you can sustain the same development speed in a large, AI-friendly codebase as you would when starting a new project. Achieving this level of AI involvement is still rare, especially for large projects, and it has given me unique insight into the challenges and opportunities of AI-assisted development. Drawing on that firsthand experience, I will share practical strategies and best practices to help you get the most out of AI code generation, whether you are just starting out or refining an existing workflow. The sections that follow are designed to help you maximize both productivity and code quality with AI tools.

For the purpose of this blog, I decided to "vibe code" an entire application without applying any of the methodologies outlined in this article or exercising much human oversight over the code. The result was impressively fast: I had a working MVP in less than a day, all while multitasking (cleaning the car and doing the dishes), since I wasn't reviewing code or specifying a particular code structure. However, when it came time to tweak or extend features, I quickly hit a wall. Despite numerous attempts to modify the code using the same AI, it simply couldn't manage the changes, because the lack of oversight had allowed spaghetti code to accumulate unchecked.

How can you avoid this pitfall and keep your repositories AI-accelerated for the long term? The following sections outline strategies for building sustainable, maintainable codebases that enable continuous, AI-driven development.

Starting with the Basics

Embracing Flexibility in Code Implementation

When using AI code generation tools, it's important to allow some concessions regarding the structure and style of the code. Allowing the AI to determine how it writes the internals of a function—such as the main logic and implementation details—can result in faster iteration and greater coding efficiency, as less time is spent on manually writing and optimizing code.

AI doesn’t always generate code exactly as intended. The best results are often a middle ground. Allowing AI to use patterns and styles it naturally favors—its own style—improves efficiency in processing and future updates. Just as humans find it easier to work with code written in a familiar style, AI performs better when code aligns with the patterns it inherently generates. Think of it like AI image generation: the best approach is to provide a general idea and let the AI work within its capabilities rather than forcing an exact solution. You rarely get exactly what you envision, but rather something close that aligns with how the AI interprets your request. Code generation follows the same principle.

Keep file sizes small

Keeping file sizes small enhances the AI's ability to process and update them efficiently: it reduces the risk of exceeding token limits, improves maintainability, and enables faster development. Keeping files concise from the initial output onward also speeds up your workflow, because you can parallelize changes across multiple files, stay within the output limits of expensive models, and reduce cost. This approach lets you make sweeping changes across dozens of files without exceeding model constraints or incurring unnecessary expense.

For best results, aim to keep your files around 300 lines or fewer. Once files reach about 500 lines, the AI's speed and performance noticeably decline—responses slow down, and the risk of errors or unwanted code abbreviations increases when making changes. A common example of an unwanted abbreviation is when the AI inserts placeholder lines such as the dreaded ...rest of the code here..., which can leave important logic incomplete and necessitate further manual intervention.

Provide references

Supplying code or architecture references can help guide the AI's assumptions and encourage consistency. Importantly, these references do not need to be directly related to the specific implementation you are working on. For example, if you've implemented a service before and want to create a new, unrelated service, including the previous service as a reference can serve as a style guide. Even if the referenced file performs a completely different function, the AI will naturally adopt its coding style and structure, leading to more consistent and maintainable code across your projects.

Limit complexity per prompt

Different AI models are designed to handle varying levels of complexity in their prompts and tasks. Some advanced models can effectively manage highly complex or interconnected requests, while simpler models may perform better with smaller, more focused prompts. If the AI is neglecting or forgetting instructions, this may indicate that the request is too complex or interconnected. In such cases, try scoping down your requests or breaking them into smaller parts to ensure better performance and accuracy. Focus on the interconnections and dependencies within the task rather than just its size.

Example: Complex vs. Large-but-Manageable Prompts

For example, consider the difference between a prompt that is too complex and one that is large but not overly complex:

Too Complex:

"Refactor the entire user authentication system to use JWT tokens instead of sessions, add multi-factor authentication, update all user-related endpoints to support OAuth, and integrate logging and error reporting for all authentication failures, making sure it works with both the existing web and mobile clients. Also, update the documentation and tests accordingly."

This type of prompt asks the AI to tackle multiple major changes across different domains (authentication, multi-factor, OAuth, logging, cross-platform support, documentation, and tests) in one go. The tasks are deeply interconnected, increasing the risk of missed requirements, inconsistent updates, or errors.

Large but Not Too Complex:

"Update the user authentication system to use JWT tokens instead of sessions. Update the login and logout endpoints accordingly, add a new controller, new client-side API files, change the login/signup UI code to use the new client side API files, and provide new tests for the updated authentication flow."

Here, the prompt still involves multiple steps, but all actions are closely related to a single change—switching to JWT-based authentication. The AI can focus on this single context, making the request manageable despite its size. Furthermore, the work the AI actually has to do is naturally sequential: as the AI outputs the first part of the solution, such as writing a controller for a new endpoint, it naturally leads into creating a test for that endpoint or generating the client-side API files to access it. By chaining this work together in a logical sequence, you enable the AI to use its own code as a reference for the next step, allowing you to create, in some cases, dozens of files with a single prompt!

Takeaway: The more you use AI for code generation, the more intuitive it becomes to recognize its limitations. When in doubt, break up requests that span multiple domains or have many dependencies, and group together related changes that serve a single purpose. Over time, you'll develop a sense for when the AI will handle a request smoothly and when it might struggle or miss the mark—helping you guide the AI to deliver more accurate and coherent results.

Giving AI Too Little To Do Can Be Bad Too

Conversely, if you give advanced models too little to do—such as very simple or trivial tasks—they may become 'under-challenged.' It's almost like the AI gets bored and starts to fiddle with things that are not part of your prompt, leading to unnecessary changes or over-complicating the solution. To avoid this, try combining several related trivial tasks into a single prompt so the AI remains focused and productive, rather than looking for ways to “improve” things you didn’t ask for.

Another solution to this is to use a less capable AI model. For example, choosing Sonnet 3.5 instead of 3.7 is a common choice when requesting smaller changes.

Step-by-step prompting

Requesting a step-by-step plan as a separate prompt, without asking the AI to generate any code at this stage, is one of the most important strategies for effective code generation. By ensuring that the AI does not return any code during the planning phase, you encourage it to focus entirely on reasoning through the problem and outlining a clear sequence of actions. This approach reduces cognitive load per request and ensures that thoughtful analysis is completed before any code is written. As a result, the AI can concentrate on producing higher quality code in a subsequent request, following the carefully considered plan.

When you first request reasoning or a step-by-step plan, and only later follow up with a request for code generation, the AI’s attention heads are effectively reset for the new request. This allows the model to concentrate fully on implementing the code according to the established plan, without simultaneously needing to work through the underlying problem. By following this two-step process, the AI can focus on code implementation itself, often enabling the generation of larger and more complex sections of code with greater accuracy and coherence.

Verbose & Explanatory Prompting vs. Exploratory Prompting Techniques

People often tend to be very concise when typing their intent, which can sometimes lead to a lack of clarity. However, when speaking aloud, we naturally become more verbose, often explaining the same thing in different ways—even if some explanations may be redundant or slightly contradictory. This added verbosity can actually enhance the AI's understanding of your intent. For this reason, I recommend using voice input when possible, as the richness and variety in spoken explanations can help the AI interpret your requests more accurately.

There are times when less instruction is actually better. For example, when you're not entirely sure what you want yet and would like the AI to brainstorm or propose options, open-ended prompting can be a powerful tool for exploring solutions with the AI. This approach goes beyond simply having the AI list alternatives—you can let the AI generate a completely new UI or implementation for you to explore. By reviewing the AI’s creations, you may discover elements or directions you hadn’t considered, allowing you to pick out and refine the parts you like. Instead of specifying every detail, you give the AI more creative freedom, which can result in unexpected or innovative ideas emerging.

Iterative Prompting

Iterative prompting is the process of refining your AI prompts through multiple cycles, each time resetting the conversation or code context. Rather than carrying on a single, continuous chat, you revise your instructions based on what the AI produces, gradually zeroing in on your ideal solution. This strategy allows you to leverage both open-ended creativity and precise guidance, drawing out the best results by alternating between broad exploration and targeted refinement.

Here are two contrasting examples of this prompting style, both focused on displaying user information in a React UI with mouse-over functionality:

Highly Directed Prompt:

"Create a React functional component called UserProfileCard that displays a user's avatar, name, and email address in a card that appears when you mouse over a user's name. Style the card with a light gray background and rounded corners, and include basic prop type validation for the user's data. Do not include any state or side effects beyond the mouse-over interaction."

This prompt provides the AI with clear, specific instructions for implementing a mouse-over UI feature, leaving little room for creative interpretation.

Open-Ended Prompt:

"I'd like to display more details about a user when someone mouses over their name in a React UI. I need some creative ways to present the user's info, like their name, avatar, and contact details, on mouse-over."

In this open-ended example, the AI is invited to explore the problem space and offer multiple solutions, potentially surfacing new approaches you hadn’t considered. This style is especially useful during the early phases of design or when you want to leverage the AI’s breadth of knowledge and creativity.

As you notice which elements you prefer from each version, you can gradually incorporate those specifics into your prompt, refining it until you arrive at a version that fully meets your needs. This process is known as Iterative Prompting—where you repeatedly refine the original prompt instead of engaging in a long, continuous conversation. By resetting the code and conversation history each time, you maximize the AI's creativity and avoid the common problem of the model becoming "pigeonholed" or overly constrained by previous interactions.

Code Quality and Hybrid Codebases

AI-generated code is generally good, but rarely great—especially when it comes to refactoring, where there is a real risk of large steps backwards in code quality. When creating new code, AI tends to produce decent results, but manual oversight is still essential. Always review the code diffs and be prepared to guide the AI with specific requests, particularly around code architecture. For example, ask it to generate new components using good modular practices and composition, and to place distinct logic in separate files rather than making everything interconnected in a single file. This approach not only keeps your codebase more maintainable and testable, but also prevents the gradual decline into spaghetti code. Using architecture and patterns that align with how AI models structure code also helps to maintain an AI-friendly codebase.

When contributing manually, it's especially important to follow these conventions so that the overall structure remains consistent and accessible for future AI-assisted development. In fact, it's wise to resist the urge to get too creative or unconventional with your codecraft. While clever abstractions and novel patterns can be satisfying, straying too far from established, straightforward practices makes it harder for both AI and future developers to understand and extend your code. Prioritizing readability, modularity, and consistency ensures that your contributions integrate smoothly with AI-generated code and maintain the long-term health of the codebase.

Modular Design Principles and Architectural Patterns

An Effective Architecture for AI Code Generation

AI code generation benefits greatly from a composition mindset. Instead of relying on components that are internally complex and highly interconnected, composition encourages building systems from small, modular parts that each have a clear responsibility. This approach not only helps AI tremendously, but it also makes the code much easier to test. It is very important that each of the components you create are independently testable. In fact, when you ask the AI to generate a service or component, you should also ask it to generate a test for it at the same time. In my experience, including tests in your prompts does not hinder the AI’s ability to produce quality code; if anything, it may actually improve code quality. Asking for tests alongside the code encourages better design and ensures that your components are robust and functional.

One of the clearest ways to illustrate the difference between tightly coupled and composable (modular) architectures is to examine how responsibilities are distributed in your code. Tightly coupled implementations bundle multiple responsibilities together in a single class or file, making the code harder to test, maintain, and update. In contrast, a composable approach separates each responsibility into its own dedicated class or file, resulting in more modular, testable, and AI-friendly code.

Below is an example in TypeScript showing both approaches for handling user signups. The tightly coupled implementation places all logic into a single file/class, while the composable version separates each concern into its own file and class.

Tightly Coupled Implementation

// UserSignupHandler.ts

export class UserSignupHandler {
  constructor(
    private db: any,
    private emailService: any
  ) {}

  handleSignup(userData: { email?: string }) {
    // Validate input
    if (!userData.email) {
      throw new Error("Missing email");
    }
    // Save to database
    this.db.saveUser(userData);
    // Send welcome email
    this.emailService.send({
      to: userData.email,
      subject: "Welcome!",
      body: "Thanks for signing up!"
    });
    return "Signup complete";
  }
}

All responsibilities are combined in one class/method, making it hard to test, reuse, or modify parts independently.

Composable (Modular) Implementation

Here, each responsibility is separated into its own file and class:

// UserDataValidator.ts

export class UserDataValidator {
  validate(userData: { email?: string }) {
    if (!userData.email) {
      throw new Error("Missing email");
    }
  }
}
// UserRepository.ts

export class UserRepository {
  constructor(
    private db: any
  ) {}

  save(userData: { email: string }) {
    this.db.saveUser(userData);
  }
}
// WelcomeEmailSender.ts

export class WelcomeEmailSender {
  constructor(
    private emailService: any
  ) {}

  send(userData: { email: string }) {
    this.emailService.send({
      to: userData.email,
      subject: "Welcome!",
      body: "Thanks for signing up!"
    });
  }
}
// UserSignupCoordinator.ts

import {
  UserDataValidator
} from './UserDataValidator';
import {
  UserRepository
} from './UserRepository';
import {
  WelcomeEmailSender
} from './WelcomeEmailSender';

export class UserSignupCoordinator {
  constructor(
    private validator: UserDataValidator,
    private repository: UserRepository,
    private emailSender: WelcomeEmailSender
  ) {}

  handleSignup(userData: { email?: string }) {
    this.validator.validate(userData);
    this.repository.save(
      userData as { email: string }
    );
    this.emailSender.send(
      userData as { email: string }
    );
    return "Signup complete";
  }
}

Each file/class has a single responsibility, making the code easier to test, maintain, and extend. This modular approach is also much more AI-friendly, as changes or additions can be made in isolation without affecting unrelated parts of the system.
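Because each class has a single responsibility, each one can be unit-tested in isolation with simple hand-rolled fakes. Here is a minimal sketch of that idea using plain assertions rather than a test framework; the fake `emailService` shape is illustrative, not an API from a real library:

```typescript
// Unit-testing the composable pieces in isolation with hand-rolled fakes.
// UserDataValidator and WelcomeEmailSender match the classes from the
// composable example above.

class UserDataValidator {
  validate(userData: { email?: string }) {
    if (!userData.email) {
      throw new Error("Missing email");
    }
  }
}

class WelcomeEmailSender {
  constructor(private emailService: { send(msg: object): void }) {}

  send(userData: { email: string }) {
    this.emailService.send({
      to: userData.email,
      subject: "Welcome!",
      body: "Thanks for signing up!",
    });
  }
}

// Test 1: the validator rejects a missing email.
let threw = false;
try {
  new UserDataValidator().validate({});
} catch {
  threw = true;
}
console.assert(threw, "validator should throw on missing email");

// Test 2: the sender hands the right payload to its (fake) email service.
const sent: any[] = [];
new WelcomeEmailSender({ send: (msg) => sent.push(msg) }).send({
  email: "a@b.com",
});
console.assert(sent.length === 1 && sent[0].to === "a@b.com");
```

Notice that neither test needs a database, a mail server, or the coordinator: each fake only has to satisfy the narrow dependency its class declares. That is exactly the property that makes composable code cheap for the AI (and for you) to verify.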

Summary Table

Aspect | Tightly Coupled | Composable (Modular)
Testability | Low | High
Reusability | Low | High
Maintainability | Hard | Easy
AI Update Friendly | Poor (AI has to parse/understand more context) | Great (AI can focus on small parts)

Composition: Suggesting Single Responsibility Principle (SRP) for Modular AI Code Generation

The Single Responsibility Principle (SRP) is a widely recommended software design principle for building systems from small, focused components, where each class, module, or function has one clear reason to change. Adopting it leads to a natural separation of concerns, reducing cognitive load for both humans and AI and making code easier to understand, maintain, and extend.

An easy way to think about SRP is to imagine that each part of your code should only have one main job. If you can describe what a class, module, or function does in a single, clear sentence, it probably follows the Single Responsibility Principle. If you find yourself using words like "and" or "also" when explaining its purpose—or if you can imagine multiple unrelated reasons you might need to modify it—it might be doing too much and should be split up.

Alternatively, you can think of SRP as ensuring that each part of your code has a single reason to change. In other words, each class, module, or function should only be updated for one specific reason—such as a change in a particular business rule, data structure, or external dependency it is responsible for.

For example: Imagine you have a UserManager class that both manages user authentication and sends welcome emails. If you need to update how authentication works, you would modify the class—but if you also need to change the email content, you'd be editing the same class for an unrelated reason. This indicates the class has more than one responsibility and should be split into separate classes, such as AuthenticationManager and WelcomeEmailSender.
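The split described above might be sketched like this; the method bodies are placeholders to keep the example self-contained, not real authentication or email logic:

```typescript
// Before: a single UserManager would have two unrelated reasons to change
// (authentication rules vs. welcome-email content).

// After the SRP split:
class AuthenticationManager {
  // Changes only when authentication rules change.
  authenticate(email: string, password: string): boolean {
    return email.includes("@") && password.length >= 8; // placeholder rule
  }
}

class WelcomeEmailSender {
  // Changes only when the welcome message changes.
  compose(email: string): string {
    return `Welcome, ${email}!`; // placeholder content
  }
}

const auth = new AuthenticationManager();
const sender = new WelcomeEmailSender();
console.log(auth.authenticate("a@b.com", "hunter2!")); // true
console.log(sender.compose("a@b.com")); // "Welcome, a@b.com!"
```

Now a change to password policy touches only AuthenticationManager, and a marketing rewrite of the welcome message touches only WelcomeEmailSender.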

Adopting modular design principles, such as the Single Responsibility Principle (SRP), is crucial for maintaining a codebase that can sustain rapid, AI-driven development over time. This approach keeps your code focused, reduces errors, and makes ongoing maintenance much easier. Most importantly, it ensures your codebase remains viable for long-term AI code generation as the project grows in size and complexity.

It's also important to recognize that these modular design practices benefit not just AI, but human developers as well. Code that is straightforward for AI models to understand and maintain tends to be more approachable for people, too. By encouraging clear and maintainable code, you help support both effective AI assistance and smoother collaboration within your team.

Key benefits include:

  • Long-term viability for AI code generation, even as your codebase scales.
  • Easier debugging and testing, as each module handles one responsibility.
  • Improved readability, aiding both human and AI collaboration.
  • Better scalability, with modules developed or extended independently.
  • Fewer code conflicts when merging changes.
  • Greater component reuse across projects.

Layering and Verticals: Archetypes

When designing an application, it is important to organize the code into distinct layers. This layered approach helps maintain clarity and manageability, and it promotes single responsibility within components in a layer as systems become more complex. Aligning this organization with the AI's preferred structure is also an important consideration.

For the purposes of this blog, we refer to these preferred structures as archetypes. Discovering them requires experimenting with the AI to see which structures and layers it favors. Note that the appropriate archetype generally depends on the application's stack: different technology stacks have their own conventions and best practices for structuring code, which influence the AI's preferred organization. Once identified, choosing the right archetype for a given stack enhances maintainability, scalability, and clarity, keeping complex systems manageable as they grow.

To further illustrate this, consider a sample web application with key verticals such as User, Order, Product, and Payment management. The table below demonstrates how these verticals are represented across different layers within the application:

Layer | User Vertical | Order Vertical | Product Vertical | Payment Vertical
Controller | UserController | OrderController | ProductController | PaymentController
Service | UserService | OrderService | ProductService | PaymentService
Repository | UserRepository | OrderRepository | ProductRepository | PaymentRepository
DTO | User | Order | Product | Payment
API (Frontend) | /api/users | /api/orders | /api/products | /api/payments
Page | UserPage | OrderPage | ProductPage | PaymentPage
Component | UserProfile, UserList | OrderSummary, OrderDetails | ProductCatalog, ProductDetail | PaymentForm, PaymentHistory
Router | /users, /profile | /orders, /checkout | /products, /catalog | /payments, /history

This grid format helps visualize how different verticals are implemented consistently across each layer, promoting a modular and organized architecture. In this example, each layer has distinct responsibilities:

  • DTOs (Data Transfer Objects): Define the structure of data exchanged between layers, ensuring type safety and consistency.
  • Controllers: Handle incoming requests, orchestrate the appropriate services, and return responses to the client.
  • Services: Contain the core business logic, processes, rules, and computations of the application.
  • Repositories: Manage data storage, retrieval, and integrity, interacting directly with the underlying data sources.

By clearly defining these responsibilities, the architecture ensures that changes in one layer have minimal impact on others, allowing for easier maintenance and scalability. The consistent implementation of verticals such as authentication, logging, or error handling across all layers further strengthens the system's robustness and coherence.
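To make the layer responsibilities concrete, here is a minimal sketch of one vertical (Product) across the DTO, Repository, Service, and Controller layers. The in-memory Map stands in for a real data source, and all names and validation rules are illustrative:

```typescript
// One vertical (Product) expressed across the standard layers.

interface Product { id: string; name: string } // DTO: the data shape

class ProductRepository { // Repository: data access only
  private store = new Map<string, Product>();
  save(p: Product) { this.store.set(p.id, p); }
  findById(id: string): Product | undefined { return this.store.get(id); }
}

class ProductService { // Service: business rules only
  constructor(private repo: ProductRepository) {}
  create(id: string, name: string): Product {
    if (!name.trim()) throw new Error("Product name is required");
    const product = { id, name };
    this.repo.save(product);
    return product;
  }
}

class ProductController { // Controller: request handling only
  constructor(private service: ProductService) {}
  post(body: { id: string; name: string }) {
    return { status: 201, data: this.service.create(body.id, body.name) };
  }
}

const controller = new ProductController(
  new ProductService(new ProductRepository())
);
console.log(controller.post({ id: "p1", name: "Widget" }).status); // 201
```

Because each layer only talks to the one below it, a change to storage touches the repository, a change to validation touches the service, and a change to the HTTP surface touches the controller, which is exactly the isolation that keeps AI edits contained.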

Practical Execution: Models, Prompting, and Output

Best Models for Generating New Code

The best models for generating new code are Sonnet 3.7 and Sonnet 3.5. Sonnet 3.7 is best suited for large, sweeping, open-ended requests, as it performs well when given complex tasks with broad scope. If you tend to make these kinds of requests, 3.7 is likely your best choice.

Sonnet 3.5 is ideal for more moderately sized or focused requests. If your workflow involves tasks that are less expansive or more incremental, 3.5 is generally the preferred option. However, if you give either model too little to do, they may start to make unnecessary changes or over-engineer solutions.

It’s no longer the case that GPT 4.1 is only suitable for small, simple requests—this latest version is a significant improvement over its predecessor, GPT-4o. In fact, GPT 4.1 can sometimes outperform Sonnet 3.5 and even 3.7, depending on the nature of the task. The key thing to understand is that GPT 4.1 is a very different model with its own unique strengths and quirks. When you find that 3.5 or 3.7 aren’t quite getting you the results you need, switching to GPT 4.1 can help you break out of a rut or approach the problem from a fresh angle. In practice, GPT 4.1 and 3.5 are highly interchangeable for most code generation tasks, so don’t hesitate to try both and see which one is more effective for your current workflow or prompt.

O1 Excels at Debugging Due to Its Outside-the-Box Thinking

Debugging is complex but often requires small, precise solutions. In my experience, o1 and o1-Mini excel at thinking outside the box, often outperforming Sonnet 3.7 when it comes to creative problem-solving. In some cases, these models' approaches even surpass Codebuddy’s planning and code orchestration, making them exceptional tools for tackling tricky debugging challenges.

Use o1 or other "thinking" models exclusively for debugging—they are not well-suited for general code generation. This is likely because these models have an "inner monologue" style of reasoning, which can cause them to stray from your original prompt and inject their own assumptions into the code. For instance, a thinking model might add a new endpoint to a service even if you didn’t request it, simply because it assumes it should be there—when, in reality, the service doesn’t require one.

Rewriting Code from Scratch: A More Efficient Approach Than Refactoring

AI is exceptionally fast at generating new code and consistently produces higher quality results when writing from scratch than when refactoring, especially when substantial changes are required. Significant refactoring is often more difficult than simply preserving existing interfaces and letting the AI rewrite the implementation. This is because major refactoring demands a deep understanding of both the old logic and the new requirements, increasing cognitive load and the risk of inconsistencies or overlooked dependencies. By starting fresh, you let the AI focus on the new requirements, resulting in code that is cleaner, more cohesive, and easier to maintain. In practice, it is often better to remove the original code, keep the interfaces, and let the AI regenerate the implementation according to your updated needs.
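One way to picture "keep the interfaces, regenerate the implementation": callers depend only on a stable contract, so the body behind it can be deleted and rewritten wholesale. The names here are illustrative, not from a real codebase:

```typescript
// The stable contract: callers depend only on this interface.
interface PriceCalculator {
  total(prices: number[]): number;
}

// The old implementation is deleted outright rather than refactored.

// A freshly generated implementation honoring the same interface.
class SimplePriceCalculator implements PriceCalculator {
  total(prices: number[]): number {
    return prices.reduce((sum, p) => sum + p, 0);
  }
}

// Callers are untouched by the rewrite, because the contract never changed.
function checkout(calc: PriceCalculator, prices: number[]): number {
  return calc.total(prices);
}

console.log(checkout(new SimplePriceCalculator(), [5, 10, 2.5])); // 17.5
```

Any existing tests written against the interface keep working, which gives you a cheap safety net for verifying the regenerated implementation.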

When you use AI to modify existing code, quality often degrades with each successive modification. This gradual decline occurs because AI models tend to preserve the "status quo" of the current codebase, making incremental adjustments to accommodate new requirements rather than rethinking the structure holistically. As a result, the code becomes increasingly convoluted, with patches and workarounds piling up over time, a process sometimes called "spaghettification." Each new change risks tangling the logic further, introducing inconsistencies and reducing maintainability. For this reason, reconsider your approach to major refactors: instead of asking the AI to fit new requirements into the old framework, it is often more effective to let the AI rewrite larger portions of the implementation from scratch, ensuring the result is coherent, maintainable, and aligned with your updated goals.

Migrating Large Files

Working with legacy codebases often means dealing with extremely large files—sometimes exceeding 1,000 lines—that contain a mix of interfaces, functions, and classes all bundled together. Attempting to refactor or update such files in a single pass is rarely effective, as it's simply too much for the AI to process coherently in one go. Instead, it's best to break these large files into a series of smaller, more focused files wherever possible. If fully separating them isn't feasible due to project constraints, guide the AI to work on small sections at a time, maximizing its effectiveness and reducing errors.

When migrating a single large file that contains many interfaces, functions, or methods, it's most effective to approach the refactoring in incremental steps. For example, you can ask the AI to extract and migrate the first five methods into a new class using the old class as a reference, then repeat this process as needed. This stepwise approach enables the AI to focus more deeply on small parts of the migration at a time, reducing the risk of missed details or inconsistencies. Whether you're breaking up one massive file or gradually moving logic out to new files, tackling manageable sections is the surest way to maintain code quality throughout the migration.

Validation, Debugging, and Error Handling

Unit / Functional Tests a Must

AI-generated tests are a valuable tool for quickly verifying the correctness of your code. In most cases, creating tests for newly generated code is both fast and straightforward, as these tests can often be produced at the same time as the original code generation. This approach not only accelerates the validation process, but also improves code quality by maintaining a strong focus on the intended logic and implementation.

With strong test coverage, less manual review is needed—especially when speed matters more than perfect code quality. For rapid prototyping, non-critical projects, or fast iteration, you don't need to obsess over every AI-generated detail. While it's wise to at least glance at the changes, robust tests often allow you to prioritize velocity over meticulous review. Ultimately, adjust your scrutiny based on your project's needs and context.

A word of caution: you should only skip reviewing AI-generated code in truly exceptional cases. In general, it's essential to understand what the AI is producing, since you will inevitably encounter situations where you need to correct its output or address issues with subpar code design.

When AI Responses Don’t Match Expectations

Unexpected responses from AI can indicate misunderstandings in your prompt or incorrect assumptions about the codebase. Often, the AI will attempt to make your request work, but if it conflicts with the actual structure, it may subtly ignore or adjust aspects that don’t fit. This behavior is generally beneficial—it helps prevent the AI from introducing incorrect logic into your codebase.

If you notice the AI repeatedly ignoring parts of your prompt, especially specific instructions or requirements, even when your request isn't overly complex or demanding, take it as a signal to reconsider whether your request truly matches how the codebase works. In these cases, the AI tends to adapt your prompt rather than fundamentally restructure the codebase. When prompts and codebase structure are misaligned, the AI will prioritize preserving the existing system over strictly following new instructions, so always verify that your requests and the codebase's actual design are in sync to avoid subtle miscommunications.