Make Good AI Code

Tips for getting the best code outputs from large language models.

1. Ask for one thing at a time

After working with these models for some time, you’ll come to discover that the context window is the biggest limitation. Large Language models, like humans, have a limited memory capacity. To maximize the best output, ask for one thing at a time. You can’t just prompt it:

make me an uber clone, then add a dog walking feature to it and airbnb

It will proceed to vomit hot drivel onto your chest and screen.

Ask for one thing at a time, one change at a time. Following the previous uber example, you would need to start small by asking it for a basic scaffolding of an app, then proceed to e.g., ask it to add a map to the app, then make the map interactive, etc.

2. Identify what not to have the LLM do

There are some things that the AI is really good at — like scaffolding tests, generating React code, etc.

But there are some things that it’s just not good at. It’s still not very good at CSS, for example. At the time of writing this, when I ask LLMs to generate CSS I get a lot of garbage, it can generally get the job done but the code is not very readable, lots of random CSS properties, and sometimes it will generate invalid CSS.

Similarly, there are some things that you’re just better off doing yourself. For example, setting up a vite app. The CLI from Vite is extremely well designed and easy to use, I’d rather just use the CLI myself. It will be faster than asking Claude Code to do it for me, I can more reliably get the exact setup I want… and also way cheaper than burning a bunch of tokens to have the bumble through the set up steps.

3. Just Ask Again

We haven’t seen really any fundamental breakthroughs in AI since they released ChatGPT. It’s all the same underlying principles in the way they train the models and run inference.

The key innovation with e.g., Claude Code, Deepseek r-1, Deep Resarch, and Cursor in 2025: we can basically get these models to do what we want by brute-forcing the problem. That’s how these new models work, they just repeatedly throw more LLM inference at the problem and string together outputs from multiple LLM inference calls, then manuganate the results, and repeat the process until they get the desired output.

LLMs are stochastic in nature, and context rot is real. If you get a result from the LLM that you don’t like, just ask again. The model will generate a different output.

4. Be as explicit as possible, aka know what you want, and give examples

Prompt Engineering is an actual thing, a good prompt can be the difference between the LLM falling flat on its face or producing an absolute banger of an output.

Try to guide the model as much as you can, providing bounds for where and what you want it to generate. I find I can reallly get good outputs if I give a prompt in this general format:

Implement a function that does <desired behavior>

Write the function in this file here: <path/to/file.ts#l22>

Follow a similar pattern to these existing functions:

<example_file_1>
<example_file_2>
<example_file_3>

Reuse the utilities if possible.

DO NOT run prettier on the output, and do not write any unit tests for this yet.

Do not start a dev server - there is already one running on localhost:3030

5. CHECK THE OUTPUT

It’s easy, very very easy to just blindly accept the code output by the LLM. Do not fall to the temptation.

Think of the LLM as a magic genie, disguised as a golden retriever, on acid. It will do it’s best to please you, but susceptible to hallucinations, and may end up doing backflips just to fulfill your wish and “complete the task”. A classic example of this is when the LLM starts editing test files to make them pass.

When the LLM is wrong…it is insideously wrong.

Check the output carefully before moving on to the next step.

Last modified: July 30, 2025