Rise of the Machines

Using GPT to accelerate Software Development

T-800 from Terminator 2 saying “Come with me if you want to live.”

Source

The Nature of the Beast

Ah, the rise of the machines! We’re not talking about a Terminator-style takeover but rather the incredible potential of GPTs (Generative Pre-trained Transformers) in software development. Think of a GPT as autocomplete on steroids: it’s all about finding the sweet spot in its vast web of word associations to achieve the best results. Though a GPT isn’t creative, it can help us solve already-solved problems, summarize information, and organize our thoughts. But be warned: without proper constraints, a GPT fills in the blanks in its context and spits out the next most statistically probable word. We humans call this “bullshitting.” Let’s dive into how we can optimize GPTs for software development without getting lost in the matrix.

Improving GPT Reliability

GPT models occasionally produce responses that do not directly address the question you posed. Fortunately, some straightforward methods can enhance the dependability of a GPT’s output. Here are some broad strategies to boost the reliability of GPT models:

  • Craft an explicit “system level” prompt: Consider setting the character you want ChatGPT to play. Doing so helps you maintain context and keeps the model on track.
  • Split complex tasks into simpler subtasks: Breaking down problems into smaller steps can significantly improve GPT’s problem-solving capabilities.
  • Prompt the model to explain before answering: This encourages the model to reason out its answers and helps you understand its thought process.
  • Prime the pump: Show the model examples of what you want it to achieve. This technique is called “few-shot” learning and helps the model follow a specific logic chain.
  • Prevent hallucination: If GPT produces gibberish, ask it to answer as truthfully as possible.
  • Ask for justifications and synthesize: Request multiple answers with their justifications, then synthesize them to find the most appropriate solution.
  • Generate many outputs and let the model pick the best one: This enables you to choose from various options and increases the likelihood of finding the ideal solution.

A Reliable Robot Helper

System Level Prompts

System-level prompts are one way to shape the kind of output you expect a GPT to give back to you. Such prompts set the “character” you want ChatGPT to “play”. If the model starts to “hallucinate” (i.e., break character), you may need to re-enter this prompt. Future OpenAI interfaces will have a separate UI and “memory bank” for this prompt. For a comprehensive list of prompts, visit Awesome ChatGPT Prompts. Here is an example of a full-stack system-level prompt:

 “I want you to act as a software developer. I will provide some specific information about app requirements, and it will be your job to come up with an architecture and code for developing a secure app with Golang and Angular. My first request is ‘I want a system that allows users to register and save their vehicle information according to their roles, and there will be admin, user, and company roles. I want the system to use JWT for security’.”
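In API terms, a system-level prompt is simply the first message in the conversation, tagged with the `system` role. Here is a minimal sketch of that structure in Python using the prompt above; only the message payload is built, since actually sending it would require an API client and key:

```python
# Sketch: a system-level prompt is the first message in the conversation,
# sent with the "system" role (OpenAI chat-message format). We only
# construct the payload here -- sending it requires a client and API key.

system_prompt = (
    "I want you to act as a software developer. I will provide some specific "
    "information about app requirements, and it will be your job to come up "
    "with an architecture and code for developing a secure app with Golang "
    "and Angular."
)

user_request = (
    "I want a system that allows users to register and save their vehicle "
    "information according to their roles, and there will be admin, user, "
    "and company roles. I want the system to use JWT for security."
)

messages = [
    {"role": "system", "content": system_prompt},  # the "character" to play
    {"role": "user", "content": user_request},     # the actual task
]

# If the model starts to break character, re-send the system message
# at the top of a fresh conversation.
```

If the interface you use has no dedicated system-prompt field, pasting the same text as your first message serves the same purpose.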

Split complex tasks into more straightforward tasks

Breaking complex tasks into simpler subtasks can significantly improve GPT’s problem-solving abilities. For example, “let’s think step by step” alone can raise GPT-3’s solve rate on math/logic questions from 18% to 79%! (see: https://github.com/openai/openai-cookbook/blob/main/techniques_to_improve_reliability.md) You can also break down the problem into steps yourself, such as when asking for a bubble sort algorithm, by listing each action required. 
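As a sketch of what that decomposition looks like, here is a bubble sort written so that each action you might list in such a prompt maps to one commented step (the step numbering is illustrative):

```python
def bubble_sort(items):
    """Bubble sort, written as the explicit steps you might list in a prompt."""
    items = list(items)                  # Step 1: copy the input; don't mutate it
    n = len(items)
    for end in range(n - 1, 0, -1):      # Step 2: make repeated passes over the list
        swapped = False
        for i in range(end):             # Step 3: compare each adjacent pair
            if items[i] > items[i + 1]:  # Step 4: swap pairs that are out of order
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        if not swapped:                  # Step 5: stop early once a pass makes no swaps
            break
    return items
```

Spelling out steps 1–5 in the prompt, rather than just asking for “a bubble sort,” tends to produce code whose structure you can actually verify against your own decomposition.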

Prime the Pump

Encourage the model to reason out its answers by providing a few examples (‘few-shot learning’). This helps the model follow specific logic chains, increasing the likelihood of accurate results. For various logic chain styles, check out:

Diagram contrasting Standard Prompting vs Chain of Thought Prompting

Source
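A few-shot prompt is simply a conversation pre-loaded with worked examples before the real question. A minimal sketch in the chat-message format (the anagram examples are hypothetical stand-ins for whatever logic chain you want the model to imitate):

```python
# Sketch of "few-shot" priming: show the model worked examples before the
# real question so it imitates the reasoning style. The examples are
# illustrative; the format is plain user/assistant message pairs.

few_shot_examples = [
    {"role": "user", "content": "Is 'listen' an anagram of 'silent'?"},
    {"role": "assistant", "content": "Sort both words: 'eilnst' and 'eilnst'. "
                                     "They match, so yes."},
    {"role": "user", "content": "Is 'apple' an anagram of 'paper'?"},
    {"role": "assistant", "content": "Sort both words: 'aelpp' and 'aeppr'. "
                                     "They differ, so no."},
]

question = {"role": "user", "content": "Is 'stressed' an anagram of 'desserts'?"}

messages = few_shot_examples + [question]  # the primed conversation to send
```

Because each priming answer shows its work (“sort both words…”), the model is nudged to produce the same chain of reasoning for the new question rather than a bare yes/no.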

Preventing Hallucination

When we say a GPT is “hallucinating,” we mean the model is returning results that are unrelated to our query, or even outright wrong or harmful. It isn’t foolproof, but if you’re getting a ton of gibberish, here’s a quick technique to try. First, prompt the GPT to ask you for any additional information it needs to produce the best result. It can be as simple as “You will let me know if you need any additional inputs to perform this task.” Answer its questions as truthfully as possible, and if you’re unsure of an answer, say “Sorry, I don’t know.” Several rounds of this kind of questioning can often give the model enough context to start producing more relevant output.

A Hallucinating Robot

Errors and Debugging

Code generated by GPTs often has errors you must debug. The models are synthesizing many code examples from different years, leveraging different library versions, and combining idioms that may be incompatible. That’s okay, because a GPT can help you solve these bugs as easily as it creates them. If you’ve given a good system-level prompt, the model may just need the literal error: copy-paste it and pop it in. More context can help, though:

  • Share the relevant code snippet or line number.
  • Explain your expected outcome after resolving the error.
  • Note any specific improvements you’re looking for, such as performance or code readability.

Example of specifying a line number:

Asking ChatGPT-4 to fix an error on a specific line.

Example of inputting a compilation error directly into the prompt:

Giving ChatGPT-4 a compile error directly.

Views

One of the most common errors I’ve observed is View code drifting away from ViewModel code. ChatGPT can compare snippets and help you centralize representations to share across the two.

Asking ChatGPT-4 to compare View and ViewModel code to find discrepancies and DRY up shared code.

ChatGPT vs. Copilot

At this point you may be wondering: “How does ChatGPT compare to GitHub’s Copilot released last year? Which should I use?” Copilot is capable of doing much of what ChatGPT does because they are based on the same model. Deciding between the two is largely a matter of what UI you prefer. I found it easier to use ChatGPT to generate the code in the first place, and a better place to interrogate the model about errors or get an explanation about what the model is doing. Copilot was nicer for modifying and extending existing code in the manner of a modern IDE.

  • Both ChatGPT and Copilot are based on modified GPT-3 models, making them technically similar, and both were trained on large amounts of public GitHub code.
  • For creating one-off utilities and functions in unfamiliar languages, both are equally effective. To use ChatGPT, provide a prompt with the task, and for Copilot, create a new file with comments describing the desired behavior.
  • Once Copilot becomes familiar with your codebase, it’s invaluable, adapting to your style and suggestions, improving productivity, and easing refactoring.
  • For inspiration or exploratory work, ChatGPT is more flexible: you can discuss what you dislike, request changes, and even get opinions on the proposed solutions, while Copilot’s suggestions tend to be repetitive with minor variations.

Copilot

Copilot excels at performing source-to-source translations similar to ChatGPT, such as ingesting a schema through a comment embedding and using it to generate code. Additionally, it can provide explanations or step-by-step instructions to tackle specific problems.

As Copilot is essentially a GPT-3 model, you can utilize inline prompts to engage in a more interactive Q&A style. Since it can only generate limited fragments at a time, you can use a space or newline to let it continue completing the response. For instance, if you have a question about the code, start with a comment prefixed by ‘Q:’ to ask the question. On the next line, add ‘A:’ and allow the AI to auto-fill the answer. Afterward, use a space or newline to prompt the AI to continue elaborating on the explanation. (I tip my hat to my colleague Nish Tahir for his coaching on leveraging Copilot in this way.)
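In a source file, those two comment-driven styles look roughly like the sketch below. The ‘A:’ text stands in for what Copilot might auto-fill, and the function is a hypothetical completion; both are illustrative, not Copilot’s literal output:

```python
# Two ways to prompt Copilot inline, sketched in Python comments.
# (The "A:" text is the kind of answer the model fills in -- illustrative only.)

# 1. Describe the desired behavior in a comment and let Copilot complete it:
# Return the n-th Fibonacci number, iteratively, with fib(0) == 0.
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# 2. Ask a question in a "Q:" comment and let Copilot auto-fill the "A:" line:
# Q: Why use iteration instead of recursion here?
# A: The iterative version runs in O(n) time and O(1) space, while naive
#    recursion recomputes subproblems exponentially.
```

After the ‘A:’ line, a trailing space or newline is the nudge that gets Copilot to keep extending the answer.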

Test Case Generation

Not only can ChatGPT generate app code, but it is also great for test case generation. For test cases, start by providing a brief introduction to your application, its components, and the requirements. Favor a “few shot” or “priming” approach here, giving the model as many examples of successful tests as possible. You can then ask ChatGPT for test case suggestions, test data ideas, or other testing-related insights. After that:

  • Describe the function for which you’d like to generate test cases, including its purpose, input parameters, and expected output.
  • Share the code snippet of the function.
  • Add specific test case requirements or edge cases you want to focus on.

A prompt chain where I give ChatGPT some examples of previously written tests and then ask it to generate new ones.
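The shape of that prompt chain can be sketched as code: a function under test plus a couple of priming tests, followed by the edge-case tests you would ask the model to add. Everything here (the `clamp` helper and its tests) is a hypothetical stand-in, written with Python’s built-in `unittest`:

```python
import unittest


# Hypothetical function under test -- a stand-in for the code you'd paste
# into the prompt along with its purpose, inputs, and expected output.
def clamp(value, low, high):
    """Restrict value to the inclusive range [low, high]."""
    return max(low, min(value, high))


class TestClamp(unittest.TestCase):
    # The first two tests play the role of the "priming" examples you show
    # the model; the edge-case tests are what you'd ask it to generate.
    def test_within_range(self):
        self.assertEqual(clamp(5, 0, 10), 5)

    def test_below_range(self):
        self.assertEqual(clamp(-3, 0, 10), 0)

    # Edge cases a good generation pass should cover:
    def test_above_range(self):
        self.assertEqual(clamp(42, 0, 10), 10)

    def test_on_boundary(self):
        self.assertEqual(clamp(10, 0, 10), 10)
```

Run with `python -m unittest`. The priming tests fix the style (naming, one assertion per case), so generated tests tend to match it.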

Code Refactoring

It won’t take long to notice how literal ChatGPT is, especially when you are optimizing features or classes in isolation. Producing concise, DRY code is another great use of ChatGPT, but it is a task best performed in a separate loop after your app works.

In your prompt, provide the following:

  • The code snippet or section that you would like to refactor.
  • The main goals for refactoring this code. For example, are you looking to improve performance, readability, maintainability, or adhere to best practices?
  • Any specific context or background information that may help in understanding the purpose of the code, such as the overall project, dependencies, or performance constraints.

Asking ChatGPT to refactor my code.
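As a hypothetical example of the kind of refactor you might request, here are two near-duplicate helpers and the single parameterized version a DRY-focused prompt should produce:

```python
# Before: two near-duplicate functions -- the kind of repetition you'd paste
# into a refactoring prompt.
def format_user_error(message):
    return "[ERROR] user: " + message.strip().capitalize()


def format_network_error(message):
    return "[ERROR] network: " + message.strip().capitalize()


# After: a single parameterized helper -- the DRY result you'd ask for,
# with the error category promoted to an argument.
def format_error(category, message):
    return f"[ERROR] {category}: {message.strip().capitalize()}"
```

Stating the goal explicitly (“collapse these into one function without changing any output”) gives the model a testable target instead of a vague request to “clean this up.”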

Making an Arkanoid Clone in SwiftUI

A couple of weeks ago I decided to put ChatGPT-4 through its paces and build an iOS app from scratch using SwiftUI. I have never written a line of Swift in my life, and I don’t use an iPhone. To my surprise and delight, I was able to work with ChatGPT to spin up a serviceable Arkanoid clone in a few hours. Here is the initial system-level prompt I used:

Act as an expert iOS developer and teacher. I will ask you to perform a development task, and you will explain how to accomplish it step by step.  You will also instruct me on how to use Xcode properly to implement your solution and how to run it. Assume I have zero experience in this domain.

Your development task is to write an Arkanoid clone in SwiftUI. I want a start screen, and I should be able to control the paddle by touch.

Code available on GitHub.

It’s important to note that a GPT is not a silver bullet that will guide you towards best practices or a best-in-class solution. The code generated by a GPT is rough around the edges and requires expertise to polish and expand upon. A GPT simply autocompletes whatever the average of the internet is, with a preference towards what has the most documentation. This means that while it can be incredibly useful, it should not be relied on as the sole source of information.

Conclusion

GPT models are an incredibly powerful tool for software developers when used effectively. By following the techniques outlined above, you can harness GPT’s capabilities to accelerate your software development process and produce more accurate and reliable results. Dive in and experience the transformative potential of GPT for yourself!

So, come on in; the water’s fine!

Terminator giving a thumbs up as he melts into a pool of molten metal.

Source