Cypress – Use Custom Chai Language Chainers

Have you ever found yourself in test automation projects burdened with repetitive assertions on elements, constantly needing to validate their styling or behavior upon every render? With Cypress, setting up a custom command to turn a few repetitive lines into one simple command is easy. For example, on the React TodoMVC, all uncompleted Todo items can be assumed to have a standard set of CSS styling. If we decided that we needed to check the styling of every Todo item in our automated test suite, we’d have to write out the full assertion (like below) each time we wanted to assert that the CSS styling of an uncompleted Todo was correct.

cy.get('[data-testid="todo-item-label"]')
  .each(($el) => {
    cy.wrap($el)
      .should('have.css', 'padding', '15px 15px 15px 60px')
      .and('have.css', 'text-decoration', 'none solid rgb(72, 72, 72)');
    // Abbreviated assertion, but not unreasonable to have more `.and()` lines
  });

Additionally, if our styling ever changed (say, we wanted to change the text-decoration color to rgb(25, 179, 159) or no longer wanted to check the padding), we would have to update each place we used those assertions.

To save some headaches, we can instead create a Cypress custom command to handle our assertions. This reduces the amount of code written per assertion as well as the number of lines of code to change if the assertions need updating. Here is an example of using a custom command in Cypress.

// Custom Command
Cypress.Commands.add('shouldBeATodoItem', { prevSubject: true }, (subject) => {
  cy.wrap(subject)
    .should('have.css', 'padding', '15px 15px 15px 60px')
    .and('have.css', 'text-decoration', 'none solid rgb(72, 72, 72)');
});

// Using the Custom Command
cy.get('[data-testid="todo-item-label"]')
  .each(($el) => {
      cy.wrap($el)
        .shouldBeATodoItem();
  });

This custom command makes the assertion much more compact, but the command (.shouldBeATodoItem()) doesn’t look very Cypress assertion-y to me. Cypress leverages Chai for most of its built-in assertions, and I’d prefer to use that same format for my custom assertions. Luckily, Chai and Cypress make it fairly easy to create custom Chai language chainers and integrate them with .should().

In our support file(s) (cypress/support/{e2e|component}.{js|jsx|ts|tsx}), we can reference the chai global (Cypress ships with Chai). Doing so allows us to create our custom language chainers and have Chai (and Cypress) pick them up automatically.

chai.use((_chai, utils) => {
// Custom Chainer code here!
});

I’ve found that the easiest way to create a custom Chai language chainer is to use the utils.addProperty() method.

chai.use((_chai, utils) => {
  utils.addProperty(_chai.Assertion.prototype, 'todoItem', function () {
    this.assert(
      this._obj.css('padding') === '15px 15px 15px 60px' &&
        this._obj.css('text-decoration') === 'none solid rgb(72, 72, 72)',
      'expected #{this} to be a Todo Item'
    );
  });
});

Breaking the above down:

  • utils.addProperty()
    • Chai utility used to add a property-style assertion
  • _chai.Assertion.prototype
    • Specifies that the property lives on the chai.Assertion prototype, so it is available in assertion chains
  • todoItem
    • Name of the language chainer
  • function()
    • Important to use a function declaration and not an arrow function, since our code relies on Chai binding this to the assertion
  • this.assert()
    • Chai’s assertion function, used here with the two-parameter signature: the first parameter is the condition being asserted (a Boolean), and the second is the failure message shown when the positive assertion (to.be) fails
  • this._obj.css()
    • this._obj is the subject of the assertion. In our case, this will be a jQuery object yielded from Cypress, so we can use jQuery’s .css() function to read CSS values.
  • #{this}
    • The #{} syntax is used to interpolate values into Chai’s messages. This is what produces that nice printout in Cypress, where it says expected <label> to be visible

In our test, instead of our custom command, we can use our custom Chai language chainer! 

// Custom Command
cy.get('[data-testid="todo-item-label"]')
  .each(($el) => {
      cy.wrap($el)
        .shouldBeATodoItem();
  });

// Custom Chai chainer
cy.get('[data-testid="todo-item-label"]')
  .each(($el) => { 
      cy.wrap($el)
        .should('be.a.todoItem');
  });

Unfortunately, tests don’t always pass. It’s easiest to troubleshoot failing tests when the errors from the failing tests are specific to the issue, and our current implementation of the language chainer does not give us a clear picture of why our test would fail.

If an element doesn’t meet our assertion for a Todo Item, we don’t know why. To get that information, the simplest approach is to make a series of soft assertions. The jQuery object in this._obj is static (at least per iteration of the assertion) and can be read synchronously, so we can store our soft assertions as booleans.

// Adding Soft Assertions
utils.addProperty(_chai.Assertion.prototype, 'todoItem', function () {
  const isPaddingCorrect = this._obj.css('padding') === '15px 15px 15px 60px';
  const isTextDecorationCorrect = this._obj.css('text-decoration') === 'none solid rgb(72, 72, 72)';
  this.assert(
    isPaddingCorrect && isTextDecorationCorrect,
    'expected #{this} to be a Todo Item'
  );
});

But that hasn’t changed our error messages yet; to do that, we’ll need to change the string we pass as the second parameter. We can manually construct our error message by checking whether each soft assertion is false and, if so, appending that failure to our custom error message.

utils.addProperty(_chai.Assertion.prototype, 'todoItem', function () {
  const isPaddingCorrect = this._obj.css('padding') === '15px 15px 15px 61px';
  const isTextDecorationCorrect = this._obj.css('text-decoration') === 'none solid rgb(72, 72, 72)';

  let errorString = 'expected #{this} to be a Todo Item';
  if (!isPaddingCorrect) { errorString += `\n\t expected padding to be 15px 15px 15px 61px, but found ${this._obj.css('padding')}`; }
  if (!isTextDecorationCorrect) { errorString += `\n\t expected text-decoration to be 'none solid rgb(72, 72, 72)', but found ${this._obj.css('text-decoration')}`; }

  this.assert(
    isPaddingCorrect && isTextDecorationCorrect,
    errorString
  );
});

In this example, we’ve changed the expected padding value to be 15px 15px 15px 61px, and we can see the error message displayed:

The changes accomplish our goal of being able to use a custom Chai language chainer and get an informative error message about what failed the assertion. But we’re repeating ourselves: we check the same properties in two places and hardcode the expected values twice. We can improve our code by reusing a single set of expected values both to run our soft assertions and to write the error messages.

// Step 1: Create an expected data object
const expected = {
  padding: '15px 15px 15px 60px',
  'text-decoration': 'none solid rgb(72, 72, 72)'
};

// Step 2: Create a combined soft assertion value, using Array.every()
const hasCorrectProperties = Object.entries(expected).every(([key, value]) => this._obj.css(key) === value);

// Step 3: Use Array.map() to generate our error string
this.assert(
  hasCorrectProperties,
  'expected #{this} to be a Todo Item' + Object.entries(expected)
    .map(([key, value]) =>
      this._obj.css(key) !== value
        ? `\nexpected #{this} to have ${key}: \n\t${value}, \nbut found:\n\t${this._obj.css(key)}`
        : ''
    )
    .join(''),
  'expected #{this} to not be a Todo Item'
);

The important thing to remember is that the keys of the expected data object must match the CSS property names exactly, since we use each key to look up the corresponding CSS property. (If we were to use the more JavaScript-like textDecoration instead of 'text-decoration', the lookup would fail.)

A definite improvement! But if we wanted to create several language chainers easily, we’d need to copy over the same few lines of code each time. We can abstract this to a few helper functions and simplify our setup within the utils.addProperty().

/**
 * @param {JQuery<HTMLElement>} ctx - context, the element passed into the assertion
 * @param {object} expected - expected data object; key is the CSS property, value is the expected value
 * @param {string} elementName - name of the element being tested
 * @returns {[boolean, string]} the combined soft assertion result and the positive error string
 */
const assertChainer = (ctx, expected, elementName) => {
    const hasCorrectProperties = Object.entries(expected).every(([key, value]) => ctx.css(key) === value);
    let positiveErrorString = `expected #{this} to be a ${elementName}\n`;
    Object.entries(expected).forEach(([key, value]) => { 
        if (ctx.css(key) !== value) { 
            positiveErrorString += `\nexpected #{this} to have ${key}: \n\t${value}, \nbut found:\n\t${ctx.css(key)}\n`;
        }
    });
    return [hasCorrectProperties, positiveErrorString];
}

// Use
chai.use((_chai, utils) => {
    utils.addProperty(_chai.Assertion.prototype, 'todoItem', function() {
        const expected = {
            padding: '15px 15px 15px 60px',
            'text-decoration': 'none solid rgb(72, 72, 72)'
        }
        this.assert(
            ...assertChainer(this._obj, expected, 'Todo Item')
        );
    })

    utils.addProperty(_chai.Assertion.prototype, 'completedTodoItem', function() {
        const expected = {
            padding: '15px 15px 15px 60px',
            'text-decoration': 'line-through solid rgb(148, 148, 148)', 
        }

        this.assert(
            ...assertChainer(this._obj, expected, 'Completed Todo Item')
        );
    });
})

(Curious about those three dots preceding assertChainer above? That’s the spread operator, which turns assertChainer’s returned array into separate arguments for this.assert().)
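As a quick illustration (the result variable below is just for explanation and isn’t part of the helper itself), spreading the returned pair is equivalent to passing its two elements as separate arguments:

// assertChainer returns a two-element array: [hasCorrectProperties, positiveErrorString]
const result = assertChainer(this._obj, expected, 'Todo Item');

this.assert(...result);            // equivalent to:
this.assert(result[0], result[1]); // assertion boolean, then failure message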

Caveat:

I did not use the third parameter in these examples when constructing the custom Chai language chainers. This prevents the language chainer from accepting a negative assertion. When attempting to assert via should('not.be.a.todoItem'), the following assertion error is thrown, and the assertion is not attempted.

If you would like to add support for a negative error message, simply provide the error message as the third parameter in your assert() function.
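For example, a minimal sketch building on the assertChainer helper above might look like this (the negative message string here is an assumption for illustration, not part of the original code):

utils.addProperty(_chai.Assertion.prototype, 'todoItem', function () {
    const expected = {
        padding: '15px 15px 15px 60px',
        'text-decoration': 'none solid rgb(72, 72, 72)'
    };
    const [hasCorrectProperties, positiveErrorString] = assertChainer(this._obj, expected, 'Todo Item');
    this.assert(
        hasCorrectProperties,
        positiveErrorString,
        'expected #{this} to not be a Todo Item' // third parameter: message used when the assertion is negated
    );
});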

Link to repo

Decoding Diffusion: A Two-Week Spike Exploring AI Image Generation

Okay, here it comes…the big moment! Everything I’d crammed in my brain from the past several weeks was about to have its spotlight moment. My partner requested that I create a logo for his photography business. I mean, sure, I could sketch something with crayons or snag a snazzy premade logo from Canva, but why do that when Midjourney and Stable Diffusion could do it for me? It’s just a logo – how tricky can it be?

I had a specific picture of his that I wanted to turn into a simple line drawing, but I was also open to other suggestions.

Original Image

I tried Midjourney first. I uploaded an image and used the /describe command to generate descriptions of my image in “Midjourney syntax,” then used the descriptions to generate some images.

Images Generated from Midjourney’s /describe command

Side note: I find it interesting that nothing about the original image indicates “hip hop,” but Midjourney classifies it as having a “hip hop aesthetic” or “hip hop flair.”

The results weren’t terrible but not really what I wanted. The next step was to use the /blend command to merge my original image with one of the generated images to get it back closer to the original. After several iterations, I finally started altering the prompt to introduce the concept of line drawing illustrations, but Midjourney’s output wasn’t quite what I wanted. 

Blending a generated image with the original image and then introducing line drawing and silhouette in the prompt

I jumped over to Leonardo.ai, which is built on Stable Diffusion. I uploaded my original image and picked a community-trained model close to the style I wanted for my logo. This time, I tried image-to-image generation, specifying line drawing illustrations in the prompt. I went through several iterations here too, similar to Midjourney. The results were the same – still not quite what I wanted.

Images generated by Leonardo.ai

Why was this so hard? Was it because I had a very specific image I wanted to generate? Or maybe I wasn’t structuring my prompts correctly. How do these image-generation tools actually work anyway?

A High-Level Overview of Image Generation Models

In essence, an image generator is a type of artificial intelligence that creates images from a text prompt or an image input. These models are trained on large datasets of image-text pairs, allowing them to learn the relationships between text descriptions and visual concepts.

Although AI art dates back to the 1960s, recent advancements in machine learning have sparked a surge of image generation platforms, notably in the past 2 years. 

Stable Diffusion was first announced in August 2022, and it was made available to the public shortly afterward. The model was trained on billions of image-text pairs from the LAION dataset, scraped from a wide variety of sources across the internet.

Midjourney was founded in August 2021 by David Holz. Holz started working on Midjourney as a self-funded research project, first testing the raw image-generation technology in September 2021. The Midjourney image-generating software went into limited beta on March 21, 2022, and open beta on July 12, 2022.

There were several other text-to-image models released around this time too. DALL-E was released in January 2021, and its successor, DALL-E 2, was released in April 2022.

There are two main kinds of image generation models.

Generative Adversarial Networks, or GANs, are made up of two neural networks: a generator and a discriminator. I like to imagine this as two bots playing a game. The generator’s job is to create images. At first, it’s really bad at it, but it wants to get better. The discriminator is trained to spot real and fake images. When the generator presents its work, the discriminator judges it, and if the work looks fake, the generator is penalized or loses a point. This feedback loop continues until the generator gets better at creating images that look real.

One significant downside of GANs is their instability during training. They are prone to problems such as mode collapse, where the generator fails to capture the full range of patterns in the training data, resulting in limited diversity in the generated images. Hence, most image-generation platforms use diffusion models instead.

Image from https://theailearner.com/2019/09/18/an-introduction-to-generative-adversarial-networks-gans/

Diffusion models are what power popular image generators like Stable Diffusion and Midjourney. For this blog, we are going to focus mostly on diffusion models and how they generate images.

Diffusion models are trained by adding “noise” to an image and then removing that noise to recreate the original image. Visual noise is a pattern of random dots, similar to television static or a very grainy picture. Let’s imagine a television screen that isn’t getting a clear picture. It has a lot of grain or static, and we need to adjust the antenna to get a better signal, which will then produce a clearer picture. The process of adjusting the antenna to get a clear picture is similar to how a diffusion model removes noise. In its training, we introduce a clear picture and then obscure it heavily with noise. The model removes the noise in steps, recreating the original picture.

Image from https://cvpr2022-tutorial-diffusion-models.github.io/

Text to Image

Before we can generate images, our prompt (“a beautiful beach sunset”) goes through a text encoder, where it is broken down and transformed into a numeric representation. The text encoder is a language model, specifically CLIP (Contrastive Language-Image Pre-training). These models are trained on images and their text descriptions. Each word is broken down into its own token and given a unique ID, and the model then tries to understand the relationships between the tokens. I like to think of this just like how humans refer to a dictionary to learn the meaning of a word, while also needing to learn the meaning of the word within the context of the sentence it is used in.

How GPT-3 breaks down words into tokens and organizes them in an array or Tensor.

Image from OpenAI official documentation https://platform.openai.com/tokenizer

Next, we feed our tokenized prompt to the Image Generator. This is where the “diffusion” happens.

Using our tokenized prompt as a guide, it begins removing noise, carving out an image step by step. Eventually, it generates a clear image.

Image from http://jalammar.github.io/illustrated-stable-diffusion/

Image to Image

When we include an image as part of our prompt, the image first passes through a feature extractor. The original image is then infused with noise, and the diffusion model de-noises it, guided by the extracted features. The output is a set of variations on the original image.

I’m A Front End Dev! What Am I Doing Playing with Machine Learning and Image Generation?

Let’s rewind to a few weeks ago when a group of us here at WillowTree were given the task of exploring Image Generation tools. How might designers and illustrators use these tools to quickly generate illustration libraries for clients? We had two weeks to run our experiments. Our team consisted of two designers, Karolina Whitmore and Jenny Xie, two engineers, Tyler Baumler and myself, one data scientist, Cristiane de Paula Oliveira, and Jenn Upton, our project manager. 

We knew that we needed to start with Stable Diffusion for several reasons. First, you can install it locally and keep your images on your own machine. This keeps them off public channels, where anyone who stumbles upon your generated images could use them. Second, you can train your own model with your own images and illustrations, thereby minimizing the ethical issues that come with infringing on copyrighted work. Third, it’s open source and doesn’t require a membership or subscription.

We also wanted to test out other platforms, like Midjourney, Leonardo.ai, Runway, and Adobe Firefly. We were looking for ease of use and how quickly we could get the image generator to consistently generate images aligned with a specific illustration style.

The first hurdle was installing and running Stable Diffusion locally. This required some knowledge of the command line and git. The installation process wasn’t too bad. For me, it only required updating Python, cloning the repo, and typing a magical incantation in the terminal. What greeted me in http://127.0.0.1:7860/ was a somewhat intimidating interface.

Stable Diffusion Web UI, Automatic1111

There were so many settings to toggle and terminology that was new and confusing. Nevertheless, we started tinkering and experimenting with generating anything. I’m not sure how everyone else did, but I immediately realized I needed to learn how to write effective prompts for Stable Diffusion. 

Our initial run at training a model also yielded terrible results, most likely because we only had four images in our datasets. It wasn’t until after reading some forums and watching some videos that we decided to use an illustration pack that Jenn had obtained for us. This illustration pack consisted of 50 images, with a consistent style and color palette. 

One of our first experiments with model training

Images from our Blue Ink illustration pack

Before I continue, I have to confess my naivety about hardware requirements. I have an M1 Mac and I put too much trust in it. Training a model or an embedding through Automatic1111 was a very heavy lift for my machine. It took a whole day and a half for it to train, and even then, I’m not even sure if I trained it correctly. Thankfully, we can train our models through Google Colab, a hosted Jupyter Notebook service where you can write and execute Python through the browser. Training a model there took about an hour at most. 

Now the fun part! Generating images using our model! Our results were… interesting.

The results seemed better but still presented with some distortions. It was clear that whatever we ended up generating through Stable Diffusion would still require manual editing before it was usable for any client project.

Midjourney

Karolina was more familiar with Midjourney and ran similar experiments. Her results were better.

Karolina’s Midjourney Couch vs. My Stable Diffusion Couch

Those of us who jumped over to Midjourney to give it a try appreciated its easier setup. You can’t train the model, but you can upload a sample image and iterate over it countless times until you get your desired output. In most cases, oddities and distortions were still present in the generated images, which led us to the conclusion that we would still need to do final edits on another platform.

Converting our images to vector images

Karolina took our images a step further and converted them into vectors using vectorizer.ai. As an engineer, I didn’t know much about editing vectors aside from tweaking a few parameters in an SVG file. I have very little experience using Adobe Illustrator, but I do know it can be a pain having to edit several anchor points. Vectorizer.ai produced SVGs with minimal rework. 

Midjourney couch (right) and Stable Diffusion (left) couch in the style of Blue Ink in Adobe Illustrator

Takeaways:

At the end of our two weeks, it seemed that we had barely scratched the surface of AI image generators. This is an emerging field, and it is rapidly changing. There is a multitude of platforms out there to try, some requiring a subscription and some free. Here are the things our team collectively agreed on:

  • Local installation of Stable Diffusion has the most flexibility when it comes to model training and generating images. However, it has a hardware requirement and a very steep learning curve for using its interface. 
  • There is a substantial time investment required to learn how to train models, checkpoints, LORAs, embeddings, and ControlNets. 
  • Midjourney is a bit easier. It does require a subscription and a specific prompt structure which you’ll get the hang of as you continue to use it. The images generated can be reproduced, displayed, and modified by Midjourney itself to improve its service. 
  • Copyright laws and ownership of generated images are currently a little hazy. There are no laws protecting your images from being used by anyone.
  • Most images will need further adjustments. Vectorizer.ai worked well!

There’s still so much to learn!

Our two-week spike flew by, and there is still so much to learn. Many of us who have been dabbling in AI image generation have picked up little things here and there and developed our own techniques and processes. We all carry a small piece of the puzzle.

So, was I successful in generating a logo using AI? Well, it depends on how you define success. In the end, I hand-drew the logo on an iPad using Procreate, a digital art program, drawing inspiration from images generated by Midjourney and Leonardo.ai. So I suppose yes, it was a success! I used AI for ideation, and perhaps that’s currently the most common use case for artists, illustrators, and designers. As more people continue to use these platforms and the models continue to learn from their users, we’ll likely see fluctuations in their effectiveness and results.

The finished product was inspired by Midjourney and Leonardo.ai’s generated images

References:

“The Illustrated Stable Diffusion” by Jay Alammar, Nov 2022 

“Ethical Ways to Create and Use AI Art: Do They Exist? [Part I]” by Gypsy Thornton, Mar 2023

“Ethical Ways to Create and Use AI Art: Do They Exist? [Part II]” by Gypsy Thornton, Mar 2023

“Denoising Diffusion-based Generative Modeling: Foundations and Applications”, CVPR 2022 Tutorial

“An Introduction to Generative Adversarial Networks” by Kang & Atul, Sep 2019

An Introduction to Roku Test Automation: Smoothing the Way for Brightscript Development

Roku’s digital media player has revolutionized the way we consume entertainment. As a developer, Roku’s unique development landscape, which uses Brightscript—a proprietary, Roku-specific language—presents specific opportunities and challenges alike.

Rooibos: Reinventing Roku-based Unit Testing

The Rooibos testing framework is a vital unit testing tool designed specifically for Brightscript, the proprietary development language used by Roku. Brightscript, being unique to Roku, has its own set of challenges, and Rooibos helps developers address these efficiently to ensure the robustness of their applications.

One of the key benefits of using Rooibos is its integration and effectiveness. It’s usually installed as a Node.js package and integrates well with task runners like Gulp. Such seamless integration fosters an effective continuous integration/continuous deployment (CI/CD) pipeline. Rooibos supports different types of unit tests, traditional “unit” tests and “focused” unit tests, to provide more precise control over the testing process.

Unlike many modern testing frameworks, Rooibos does not require transpiling source files to JavaScript; it runs directly on Brightscript. This compatibility not only saves time but also provides a testing environment that accurately captures the unique challenges presented by the Roku platform.

Rooibos also provides code coverage analysis. With the right tools, developers can generate coverage reports with Rooibos, pinpointing which parts of the codebase have been tested and which may need further attention.

Furthermore, Rooibos’ test syntax is similar to other popular testing frameworks such as Mocha or Jasmine. This can expedite the learning curve for developers already familiar with those frameworks. Rooibos also supports essential core features such as before/after hooks, and inline or standalone setup/teardown methods, making it a flexible and comprehensive solution.

(Source: https://github.com/georgejecook/rooibos)

Navigating E2E Testing with Roku’s Automated Channel Testing

Roku’s Automated Channel Testing framework serves a crucial role in facilitating End-to-End (E2E) testing for developers creating applications on the Roku platform. This invaluable tool verifies that all parts of the application interact seamlessly with each other, validating the interconnected components from start to finish to ensure a flawless user experience.

One of the key benefits of Roku’s Automated Channel Testing is its user-friendly approach. It provides developers with the opportunity to write their E2E tests using familiar and widely adopted languages like JavaScript. This familiarity drastically reduces the learning curve and enables developers to create comprehensive E2E tests efficiently.

Furthermore, this E2E testing framework fully supports Roku’s comprehensive portfolio of APIs – both Graphic and Video Node APIs. This breadth of support means developers can create tests simulating the complete user journey, from launching the application and navigating through the user interface to streaming video content.

This ability to conduct E2E tests goes a long way in mitigating risks before an app is launched or updated. The early detection of issues saves time, effort, and resources that could be consumed if a problem arises post-deployment. Hence, stakeholders have the assurance of a robust application that delivers a flawless user experience from the get-go.

Mastering Brightscript with VSCode Tools

VSCode, a widely popular integrated development environment (IDE), offers a slew of tools tailored for Brightscript development. Important features like syntax highlighting make the code more readable, while integrated debugging with breakpoints allows developers to pause and inspect their code, a vital part of adhering to software testing best practices.

(Source: https://marketplace.visualstudio.com/items?itemName=celsoaf.brightscript)

Keeping it Compact with Size Testing Scripts

Size does matter when it comes to Roku packages—exceeding the 4MB limit could lead to app rejection. Incorporating a size testing script into your automation suite ensures that you’re creating efficient, streamlined code that aligns with Roku’s requirements. Any scripting language you’re familiar with can be used to perform this test. I chose a simple Node.js script, since Node is already part of the toolchain described above.
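As an illustration, a minimal Node.js sketch of such a check might look like the following (the package path is a placeholder, not a path from a real project):

// check-package-size.js — fail the build if the packaged channel exceeds the size limit
const fs = require('fs');

const PACKAGE_PATH = './dist/channel.zip'; // hypothetical location of the built package
const MAX_BYTES = 4 * 1024 * 1024;         // the 4MB limit referenced above

const { size } = fs.statSync(PACKAGE_PATH);
const sizeInMb = (size / (1024 * 1024)).toFixed(2);

if (size > MAX_BYTES) {
  console.error(`Package is ${sizeInMb}MB, which exceeds the 4MB limit.`);
  process.exit(1); // non-zero exit code fails the CI step
}
console.log(`Package size OK: ${sizeInMb}MB`);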

Ensuring Backward Compatibility

A common best practice in software testing involves checking backward compatibility. In the Roku realm, this means that your applications should run smoothly not only on the latest devices but also on Roku’s oldest supported models. Older models often have less memory, impacting the performance of heavy applications, or use different graphics APIs, which could affect your application’s UI/UX.

(Source: https://developer.roku.com/docs/specs/hardware.md)

In practice, Roku Test Automation involves adopting and perfecting the use of an array of tools – from community-driven unit test frameworks like Rooibos to comprehensive IDEs like VSCode. By understanding and effectively utilizing these tools in tandem with software testing best practices, Roku developers can ensure efficient, robust, and user-friendly applications ready for seamless deployment.

Epic showdown: Two teams built the same product with and without AI

Summary

Engineers at WillowTree have been using AI tools like GitHub Copilot and Chat-GPT since they first arrived on the scene. As these tools become more advanced and prevalent, we wanted to explore how exactly they add value to our teams.

To learn more about the effects of AI tools on engineering velocity and product results, WillowTree organized a case study in which two teams built the same product over the course of six days. One team was encouraged to use AI tools such as GitHub Copilot and Chat-GPT in their development process, while the other team didn’t use AI at all.

At the end of the case study, the AI team had completed 11 user stories, while the no-AI team completed 9, indicating that AI tools can help to improve productivity. We found a possible trade-off between velocity and product quality, as the AI team had a few more minor defects in their final product. However, it’s important to note that various confounding variables could have affected the results, discussed in more detail below. 

In terms of AI use cases, engineers reported that GitHub Copilot was the most useful tool for daily tasks, significantly speeding up velocity by autocompleting tests and tedious, boilerplate code. Chat-GPT was also helpful, particularly when used to learn about an established technology. However, Chat-GPT hallucinated answers and lacked up-to-date knowledge. For more accurate responses to complex questions, engineers preferred using Phind, an AI-powered developer search engine.

As AI technologies become a more and more prominent part of the developer experience and everyday life, it’s important to try to gain a deeper understanding of how they can affect our workstyle and product outcomes. We are all hearing the hype about tools like GitHub Copilot and Chat-GPT, and WillowTree teams are already using them on most of our projects. But how exactly do they add value to a team, and to what degree? In an effort to measure this, WillowTree recently recruited a few unallocated team members for an “experiment” that would help us learn more about the current state of AI tools, how to best leverage them, and their effects on developer experience and velocity. In this article, I’ve put together some of the most important learnings and takeaways from the experience. Although this case study was a first iteration and our results aren’t quite ready to be carved in stone, I believe we gathered some pretty valuable and interesting insights.

Disclaimer: I was a part of the no-robots team, and in true no-robots fashion, this article is entirely human-generated. (Though I hypothesize that having Chat-GPT write it would have been significantly faster.)

The case study setup

A few weeks ago, two small teams of WillowTree engineers embarked on a mission to answer the question: What is the impact of using AI tools for engineering? Each team was composed of two developers and a test engineer. Led by the same technical requirements manager, they worked on rebuilding an existing weather app in React Native, with a focus on iOS. The teams each had their own Jira Kanban board and backlog, with identical tickets arranged in the same order. The big difference: one team was encouraged to leverage AI tools during their development and testing process, while the other was not allowed to use any AI at all. The robots team was called Team Skynet¹, and the no-robots team was Team Butlerian² (they will be referred to as such from this point on).

Team Skynet (image source: Popular Mechanics)

Team Butlerian (image source: Dune Wiki)

The teams had six days to work through the backlog and get as much done as they could. Each team member only had 5 hours per day to work on the project. The timing constraint was intended to make sure the teams worked an equal number of hours while accounting for other responsibilities they had while unallocated. To keep variables under control as much as possible, the teams also agreed on the same Expo³ app setup, and on the same testing framework and test coverage goal. As far as individual package choices were concerned, the teams made their own decisions independently. Repos were set up by each team the day before development began.

With this setup, both teams would be as equal as possible to start and avoid (dis)advantages on either side. Still, it’s worth noting that this case study was not meant to be a rigorous scientific experiment. Rather, it was a way to start investigating the use of AI in engineering at WillowTree and gather some soft findings. In the future, additional studies can incorporate feedback gathered from this round and help validate our results.

And… go!

Teams began development on the same day and had daily standups with the requirements manager to share progress and ask questions. There were also team-specific and shared Slack channels where we could work out any uncertainties that came up. And there were quite a few of those. As development progressed, we had to work through questions about Figma assets, API endpoints, test framework compatibility, time tracking, and ticket order. While we had no norming or grooming sessions as we would on a real project, the teams collaborated to agree on paths forward. As we completed tickets, we noted the amount of time spent on each one. Both teams also wrote about their experience in “dev journals.”

To give you an idea of the work we completed as part of the weather app rebuild, I’ll list some of the main features:

  • Bottom-tabs navigator
  • Home screen for a user with no weather locations saved
  • Location screen where the user could search for and add locations
  • Locations list with data retrieved from local storage
  • The ability to turn on “current location”
  • Loading and error states
  • Most importantly, dark mode 😎

Rebuilding the weather app was a pretty fun challenge and required a good variety of technologies for which we could test the powers of AI (animations, navigation, local storage, theming, SVGs, APIs, testing, and other good stuff). But before we knew it, it was the end of day six and time to draw some conclusions. Where did both teams stand?

Just the facts

Team Butlerian had 9 completed stories, with 1 ticket remaining in Test and 1 in Progress.

Team Skynet had 11 completed stories, with 1 ticket in Progress. 

Initial velocity summary

                     Team Butlerian    Team Skynet
Completed Stories    9                 11
Velocity *           1.50              1.83

*Velocity is calculated as the number of completed stories divided by the number of days (6)

If we do some math…

(1.83 – 1.5) / 1.5 * 100 = 22

A 22% increase in velocity with the use of AI tools. So… the AI team won, right? 

Well, yes and no. The answer is a lot more nuanced. 

Analyzing the results

In the couple of days after development was over, everyone who participated in the case study spent some time reflecting on their experience, both in writing and in discussion meetings.


After some consideration, we knew we couldn’t simply divide the numbers and arrive at a percentage increase in velocity from AI. There were a lot of variables at play and the qualitative data that was gathered from team members needed to be considered. All this feedback led us to want to investigate code quality. Although AI seemed to increase velocity, did it have any effects on the quality of the codebase? We approached this comparison from a few different angles:

  1. What was the code coverage for unit tests, and were they all passing?
  2. What was the code coverage for UI tests, and were they all passing?
  3. How reusable and maintainable was the code? (This one could be a bit subjective.)
  4. How many UI bugs were found in the final product, and what was their severity?

Each team had slightly different test coverage and ways of keeping the code maintainable and reusable, but overall the codebases were similar in these respects. However, we found that Team Skynet had a few more minor bugs from a user experience standpoint. There could be various factors at play here, such as rushing towards the end because of the time constraint, willingness to experiment with new approaches with the help of AI, and simply the possibility of more bugs being introduced with the completion of extra stories by Team Skynet. Although the bugs are not necessarily attributable to AI, developers may want to consider the potential tradeoff between increased velocity and the need for additional troubleshooting to maintain quality. 

After assessing the bugs and estimating the time it would take to fix them, the adjusted velocities were 1.406 for Team Butlerian and 1.506 for Team Skynet.

Comparison summary

                      Team Butlerian    Team Skynet
Unit Test Coverage    83%               93%
UI Test Coverage      85%               77%
UI Bugs               🪲🪲              🪲🪲🪲
Adjusted Velocity     1.406             1.506

Where did AI come in handy? And where not so much?

Another important takeaway we hoped to get from this case study was determining where AI is most useful, and which tools might be fit for different types of tasks. The dev diaries of Team Skynet provided the best insights here. Throughout the project, Team Skynet documented their experiences using AI for different tasks and summarized their findings based on the task’s risk and value. These are the tools they used:

  • GitHub Copilot – a code completion AI tool integrated with development environments such as VS Code
  • Chat-GPT – a conversational large language model (LLM) created by OpenAI
  • Phind – an AI search engine for developers that shows where answers came from (ex. Stack Overflow posts)

GitHub Copilot

According to the team, GitHub Copilot was the most useful. It had low risk and provided high value. Although not all suggestions were accurate, engineers still found that it sped up their development process significantly. They were impressed with Copilot’s ability to suggest multiple lines of code, sometimes before they even knew they would need it. Although Copilot wouldn’t be able to code a whole app on its own, they found it immensely helpful for getting through boilerplate and monotonous tasks.

For testing, Team Skynet’s test engineer reported GitHub Copilot being able to take over test-writing after just a few examples. Using just the name for a test, Copilot was able to autocomplete the full test code pretty much on its own. While in many cases no corrections were needed, there were also times when Copilot was just a little bit off, just enough to fool the team and cause confusion later on.

Chat-GPT

The team classified Chat-GPT as having medium-high risk, and medium-low value. They were often led astray by the tool, which tended to “hallucinate” answers when it didn’t have the knowledge needed to answer a question. They noticed that Chat-GPT would also overcomplicate answers sometimes, for example telling developers they need to install multiple dependency packages, when they really only needed a few. One team member mentioned wanting to double-check the LLM’s answers against their own Google searches afterward. 

The LLM’s knowledge cutoff date of September 2021 was also a significant barrier when the team wanted to learn about new technologies. For example, developers wanted to integrate Expo Router⁴ for their navigation setup, but Chat-GPT had no knowledge of the tool and was unable to help. Overall, the developers felt that Chat-GPT would not be their go-to source of information for technologies that the model wasn’t trained on. They mentioned still going to docs as their primary source.

Despite the downsides, Chat-GPT proved very helpful for introducing engineers to well-established tools they were unfamiliar with. Team Skynet’s test engineer felt more empowered to help troubleshoot and solve issues that developers came across despite having no prior knowledge of React Native. Having Chat-GPT quickly provide background knowledge and potential solutions helped him to feel more comfortable participating in discussions. Without the tool, he believed he’d have spent much more time researching solutions. However, he also experienced trouble with Chat-GPT when it didn’t have enough information on the relatively new testing framework Deox, and it hallucinated fake solutions.

Phind

Compared to Chat-GPT, the team felt that Phind was more transparent and reliable. They classified it as medium risk and medium value, better than Chat-GPT for providing answers to their questions. Because of this, they ended up using it more often throughout the project when researching answers to complex problems.

AI tools risk/value chart

Tool              Risk           Value
Github Copilot    Low            High
Phind             Medium         Medium
Chat-GPT          Medium-High    Medium-Low

Some confounding variables

Having shared the results, it’s important to mention some variables that may have influenced them. Due to the nature of the case study, it wasn’t possible to control for all external factors, although some of them could potentially be mitigated in future iterations.

  • Team-member experience – Although teams were fairly matched, each person had different levels of experience with React Native and other tools used in rebuilding the weather app. Seniority levels also differed, especially between the test engineers.
  • Package choices – Team Skynet mentioned feeling empowered to try new packages as part of their development process because of these AI tools. This, as well as personal preferences, led the two teams to use different packages for implementing features like navigation, theming, and local storage. We therefore came across different challenges in implementing these features.
  • Teamwork style – Team Skynet reported mobbing throughout most of the project, while Team Butlerian engineers worked independently, coming together to troubleshoot or clear up uncertainties. This could have also affected the amount of time each team spent per ticket, and the overall amount of work completed.
  • Level of familiarity with AI tools – An engineer on Team Skynet mentioned that it would have helped to be more familiar with prompt engineering in order to really make the most out of these tools. Engineers had no prior training in this area.
  • Inter-team interactions – Teams shared the same standup, and were therefore able to see each other’s progress. This, combined with the time tracking constraint, may have affected the speed and quality of team members’ work.

Possible improvements for case study 2.0 

During our retro, the teams also discussed ideas that could potentially make a future iteration of the case study more closely resemble a client project.

  • Allocating additional time to flesh out acceptance criteria for each ticket. This could have minimized the amount of uncertainty that came up as development work progressed.
  • Having a repo set up beforehand, which teams could clone and start working from. Setting up a repo could do a couple of things: 1) encourage teams to use the same setup and packages (making results more comparable), and 2) add more context from which tools like GitHub Copilot pull information for providing suggestions.
  • Extending the timeline for the project, if possible. This project felt very short with only six days of development. Over a longer period of time, more solid conclusions could be drawn.
  • Including additional disciplines. During the project, teams had numerous questions about Figma assets and the expected user experience once we dug into the details of more complex tickets. A designer would have been able to really help here, where our AI tools couldn’t. There is also the potential for seeing how AI can help other disciplines, including designers and product architects.

TLDR: Overall takeaways

So that was a lot. What should you really take away from this? What did we learn from the AI study this time around?

The team that used AI tools (Github Copilot, Chat-GPT, and Phind) did see an increase in velocity. They found that Github Copilot was the most useful tool for their daily tasks, significantly speeding up velocity by autocompleting tests and tedious, boilerplate code. They also found Chat-GPT helpful, particularly when using it to learn about an established technology. However, Chat-GPT hallucinated answers and lacked up-to-date knowledge. For more accurate responses to complex questions, the team preferred the developer search engine Phind. And even more so, they referred to official docs as the ultimate source of truth. 

One quote from the Team Skynet dev journal stood out to me as a good summary of our learnings: “AI is helpful for getting started on something, but not helpful for finishing. To fully deliver a high-quality end product/result, you need someone with expertise” (Summit Patel). AI can speed up the development process, but it still takes a developer to find bugs in generated code, architect complex solutions, and stay on top of new developments. AI may become able to do these things in the future, but for now, using it still poses tradeoffs. These tools should be used with an open mind, but a dose of skepticism. Nevertheless, it’s an exciting horizon, and with the rapidly evolving state of AI, we look forward to following up with a second run at this case study in the future. Thanks for reading!

Footnotes

  1. In the Terminator movie, Skynet is a general super-AI that causes the downfall of humanity.
  2. Butlerian – in reference to Dune: The Butlerian Jihad, a novel by Brian Herbert and Kevin J. Anderson, in which the Butlerian Jihad is a crusade against “thinking machines” to free humans.
  3. Expo – a platform that provides tools and services to make React Native development quicker and simpler.
  4. Expo Router – a file-based router for React Native applications.

Refactoring Our Team’s Approach to Android Automated Testing

The Goal: Code Nirvana

Imagine a world where you can release your app without lifting a finger to test it. Your code flows from your mind to your fingertips on the keyboard, then to your version control system, which triggers a set of tests, ensuring your vision is achieved without breaking any existing functionality. Some moments later, after the compilers have finished their magic, prompted by your continuous integration pipeline, a shiny, new package appears: your team’s latest creation. Something akin to “code nirvana” — the state of enlightenment, as a team of engineers, in which newly added features produced no bugs, suffering, or regressions. Sounds nice, doesn’t it?

I have been working on a team that longs to thrive in this enlightened state of code nirvana. We’re not there yet — our tests are long, bulky, and sometimes we’re not even sure what they’re supposed to test. Our pipelines — especially the emulators we try to run on them — are flaky. We use unreliable third-party systems in our tests, which leads to failures that leave us uncertain whether we broke something or something else was broken and we were merely innocent bystanders. So how do we arrive at code nirvana from this land of despair?

Seeing the forest from the trees

For us, the first step was to take a step back and examine the current state of our tests. We noticed a few things:

A flaky emulator in our CI pipeline

This is a common issue for Android app development teams. The Android emulators run well on our hardware, but when we try to run them in the cloud, they are slow or unusable — when we can even get them to start at all. The reason is that the agents we use to run the tests don’t support nested virtualization: the agents are themselves virtual machines, and they can’t host another virtual machine inside them. Android emulators are virtual machines, after all.

Reliance on third-party sandbox environments

This one may be less common, but it was a real problem for us. When our tests were originally written, they were intended to be a full end-to-end test suite, triggering actual calls to the APIs we relied on, and ensuring that the integration worked properly. The problem became the reliability of those third-party systems. We were relying on real data, which was subject to change on a whim, or systems that were the sandbox environment of our partners, sometimes meaning they would be unavailable for days at a time. 

Broken tests

This one is likely another common issue and reminds me of a great article by Martin Fowler on the concept of “test cancer.” Fowler describes the problem like this: “Sometimes the tests are excluded from the build scripts, and haven’t been run in months. Sometimes the ‘tests’ are run, but a good proportion of them are commented out. Either way, our precious tests are afflicted with a nasty cancer that is time-consuming and frustrating to eradicate.” Our code was fraught with test cancer — most of our automated tests still technically ran, but their results were ignored entirely.

Unclear purpose of tests

One factor that led to our blissful ignorance of test results was that, often, we weren’t sure what they were supposed to be testing. It’s easy to ignore the result of a test if that test doesn’t provide value to your team or your client. So the tests were there, and they were running, but they weren’t telling us what we really needed to know — could our users use our app how they needed to?

Achieving enlightenment

After taking stock, it was clear that we needed to make some changes. So, like any good engineering team, we made a plan. We evaluated our problems and sought solutions that would make our tests work for us to achieve code nirvana, instead of keeping us in the land of despair.

The first problem was that we had a low confidence level in our automated tests, so manual testing was necessary for a release. In the state of code nirvana, manual testing is at a minimum. Achieving a higher level of confidence would also mean that if a test was broken, we knew we had a problem, and something needed to be fixed. Additionally, we wanted our tests to be lightning-fast, so we could immediately know whether there was an issue. Lastly, we wanted the process of adding new tests to be simple. We didn’t want to worry about whether a feature was already tested because we’d know, and we wanted to be able to add tests for untested features with a low level of effort and complexity, meaning our tests would be scalable.

To get there, we decided a few changes were in order. 

Remove tests that didn’t prove valuable

This part was relatively easy, as it mostly involved deleting a bunch of code (one of the greatest feelings in the world, IMHO). We assigned one of our test engineers to the honorable task of evaluating the tests in our automation suite against what tests we ran manually every time we did a release. The cross-section of that comparison allowed us to find the tests that actually showed us something important about our application. The other tests were dead weight, and we were happy to cut them loose. 

Lift and shift user interface validation to unit tests

When we removed our low-value tests, we found that many were trying to validate our app’s user interface. They checked that headings had the correct wording, inputs were labeled correctly, and the like. In the interest of speed and clarity, we decided to lift and shift these tests to our unit testing suite. This is one of the more involved tasks our new approach demands, but we think it will be useful. Unit tests are easier to maintain than automated tests, and it is much more useful to test different screens in isolation, where we can pass state into the views and ensure that the view is rendered correctly according to that state. It also leaves the automated tests to the task they are best suited for — to validate complete flows instead of individual views.

Run our tests using real devices in the cloud

Running our tests on real devices was one of the most important pieces for our new approach. Without a reliable way to run our tests in a pipeline, our tests would never be as useful to us as we wanted (dare I say “needed”?) them to be. Moving our automated tests to a service that provided a surefire way to run them was a necessity.

Provide fake implementations of third-party services 

The fundamental question that led us to this decision was this: what did we want our automated tests to test? At their conception, it was decided that they should be true end-to-end tests, checking our application against our third-party services, and ensuring the integration was holistically sound. Ultimately, however, we determined this approach wasn’t working for us. It was too easy to lay blame for failures on third parties, meanwhile leaving us uncertain if our work was up to par. Removing the dependency between our tests passing and our partner’s services being available, and in the state our tests expected, meant we could be certain that our work was up to the mark. 

Conclusion

We still have a way to go, and a fair amount of work to do, before we can say we’ve achieved true enlightenment, but we think we’ve formulated a solid plan, and we’re taking steps in the right direction. Ultimately, what will get us there is focusing on a small set of high-value automated tests that run reliably in our pipelines. And while the workload demanded by some aspects of our new approach will require us to get our hands dirty to make our tests healthy again, we think the outcome will be worth it. Not only will we be able to release with a smaller manual testing lift, but we’ll have confidence that the features in a release are healthy every time we have a new build. With our newfound confidence, we hope to soon leave behind the dreaded land of despair, and look forward to seeing what we can achieve next.

What even is Staff Engineering? Part 2 in A Series

Building Apps to Last

In part 1 of this series, we discussed the broad role of a Staff Software Engineer. Today, we’re going to start looking at the specifics. What does a Staff engineer’s “everyday” look like? What are their most powerful tools? The first of the more specific points I want to write about is one I’m straight robbing from my long-time Staff+ mentor and fellow WillowTree employee, Kevin Conner.

When I first started as a Staff Engineer, you can imagine that I had many questions about what my shiny new job looked like. My first position in the new role would be to take Kevin’s place as tech lead on a 2-year-old project.* Having been on this project with Kevin for most of its life, I already had plenty of firsthand experience watching them do incredible work. To give me an even stronger start on the project, Kevin graciously met with me one-on-one several times to give me tips and pointers on how to be a good Lead Engineer. One of their first and strongest pieces of advice? “Build the app as though it’s still going to be around in 10 years.”

Now for those who don’t know, 10 years is an eternity in the tech space. I’ve actually said the same about just 3 years in the tech space. Don’t believe me? 3 years prior to me writing this sentence (so aaaallll the way back in May 2020), drive-up pickups at stores and restaurants barely existed. Hybrid/remote working was still considered a benefit enjoyed by just the fringe of society, and was likely going away forever as soon as quarantine ended. The term “AI” was mostly a joke that euphemistically meant “the best those poor, dumb computers can do.”

“But wait!” you yell, determined to disprove me (or at least determined to give me a chance to make my point). “The ‘rona sped some of those things up! This wasn’t a typical 3 years of tech progress!” That’s a fair point. But how about this one: Twitter was a respectable social media platform just 7 months ago. I’ll go ahead and mark that one down as making my point for me (thanks, Elon). The TLDR is that building an app to last 10 years is absurd.

But that was Kevin’s point. The app we were building for our client would almost certainly be long-retired by the time 10 years had passed, but it wouldn’t be because the codebase itself forced the issue. This approach was especially pertinent for the client in question because we were in the middle of rebuilding their entire web presence from scratch. So how does one take on a task as formidable as the nigh-impossible? In this case, the first and most important answer to that question is modularization. And here’s where we start getting our first look into some of the more important tools in a Staff Software Engineer’s belt.

Modularization

So, I handed Gary this article and asked it for editing advice, and it very condescendingly told me to include definitions of each of these topics before diving into my talk about them. So I made it give me those definitions.

Gary’s definition of modularization: “Modularization in the context of software engineering refers to the process of subdividing a computer program into separate sub-programs or modules. Each module represents a separate function or feature of the program, contributing to the overall functionality of the application.”

One of a Staff engineer’s primary roles will be to ensure that their codebase is properly modularized. So what does “properly” mean? Do you remember when [insert latest tech fad here] was first introduced, and we all lost our collective minds and put it into every project ever? Then 6 months later, we all blinked like we’d just woken up from a really weird dream and went “but…why??” But then we went ahead and did it all over again the next time a new technology appeared? Of course you remember; it just happened last month. The lesson here is: don’t do that with modularization in your codebase. Modularizing every little thing for the sake of it is worse than not doing it at all.

One good strategy is to ensure that each piece of your given tech stack can be cut out of your app as quickly and cleanly as possible, and replaced with another, comparable piece. For example – and this is especially pertinent for web devs, who live and work in a world where libraries seem to breed of their own volition – you should be able to rip out your given analytics solution and replace it with another by just changing a couple of files.* A refactor of your entire codebase should never be necessary when changing out one framework or technology for another. And for those of you getting ready to give me an “Actually…,” I do mean never. Even if you’re making an enormous change that will affect a large portion of your codebase, e.g., swapping React out for Vue.js, there is no (good) reason your entire codebase should change. Things like helper functions that perform common logic, analytics solutions, and logging solutions should all remain untouched during that transition. So, at the risk of making this word mean absolutely nothing by using it yet again, properly modularizing your codebase will ensure that you can remove deprecated or otherwise outdated libraries with as little effort as possible. And the day when some piece of your codebase is deprecated will come, and it’ll likely come soon.*
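
To make that concrete, here is a minimal, hypothetical sketch of the idea: the entire app imports a trackEvent function from one module, and only that module knows which analytics vendor is in use. The analytics.js file name, the trackEvent function, and the commented-out vendor import are all stand-ins for whatever your project actually uses, not code from any real project.

// analytics.js: the only file in the codebase that knows which vendor we use.
// Everything else imports trackEvent, so swapping vendors means editing this
// one file (and maybe some config), not the whole app.

// Hypothetical vendor import; replace with your real analytics SDK.
// import { vendorTrack } from 'some-analytics-sdk';

export function trackEvent(name, properties = {}) {
  // vendorTrack(name, properties);
  console.log(`[analytics] ${name}`, properties); // placeholder implementation
}

// Elsewhere in the app:
// import { trackEvent } from './analytics';
// trackEvent('todo_completed', { id: todo.id });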

Another excellent daily tool in the Staff Engineer’s belt is abstraction. Bear with me here, because we’re about to get… abstract.

Sorry not sorry. I’m a dad, get used to it.

Abstraction

Gary’s definition of abstraction: “In software engineering, abstraction is a principle that enables handling complexity by hiding unnecessary details and exposing only the essential features of a concept or an object. This makes it possible for developers to work with complex systems by focusing on a higher level of functionality without being overwhelmed by the underlying details.”

One of the age-old adages for us Software Engineers is to “keep things DRY,” with DRY standing for “Don’t Repeat Yourself.” When you first start as a bright-eyed and bushy-tailed Junior Engineer, this adage is strictly literal: do not, under any circumstances, type the same line of code more than once. I once legitimately got annoyed that I kept having to call the same function in a bunch of different places and wondered, for just a second, if there was a way I could shorten the one-line function call. (Thankfully, that second passed quickly and I then proceeded to sit down and wonder if I was okay. I’m clearly not, but that’s beside the point here.)

As a Staff engineer, this gets a lot less literal and a lot more abstract. Are there boilerplate chunks of code or patterns that you or your engineers are having to type out by hand (or, more likely, having to copy-pasta) every time? Write a helper function that takes the work out of that. One good example of this is from the first project I tech-led (mentioned at the top of the article). We had a pattern in place for unit tests that would allow us to quickly and easily mock React’s useState functions. The problem I noticed is that we’d have to write a ternary at best, and some complicated if-then logic at worst, every time we used a state variable in a component, to make it easily mockable for unit testing. Eventually, this turned into stumbling around the codebase until we found just the right use case we needed in one of our dozens of other components, and copy-pasting the syntax into the new file.

This made parsing component files unnecessarily complex, and writing them a painfully long process. It was a perfect spot for some abstraction, so I wrote a function that handled all of the possible use cases and made it simple — via a combination of variable names, TypeScript types, and comments — to figure out how to get the use case you needed out of my helper function. At this point, state variables went from a complicated mess of logic to a function call not much more complex than calling useState directly!
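
The actual helper from that project isn’t reproduced here, but to illustrate the shape of the idea, here’s a minimal, hypothetical sketch: a thin wrapper around useState that components call instead of useState directly, so unit tests only ever need to mock one module. The useAppState name is made up for this example.

// useAppState.js: a hypothetical thin wrapper around React's useState.
// Components call useAppState instead of useState directly, so unit tests
// only ever need to mock this one module rather than React itself.
import { useState } from 'react';

export function useAppState(initialValue) {
  return useState(initialValue);
}

// In a Jest-style test, every state variable in the component under test
// can now be controlled from one place:
// jest.mock('./useAppState', () => ({
//   useAppState: (initial) => [initial, jest.fn()],
// }));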

Doing this kind of abstraction correctly makes your codebase easier for developers to use, easier to read, and (sometimes literally) infinitely cheaper to change, since the logic you’re abstracting away lives in one single function. Without it, your codebase will quickly get “too messy,” and everyone who works on it will catch the “let’s rewrite this” bug, which usually damages a codebase’s ability to stick around for 10+ years.

Addressing Tech Debt

Gary’s definition of tech debt: “Technical debt, often referred to as ‘tech debt,’ is a concept in software development that reflects the implied cost of additional rework caused by choosing a quick or easy solution now instead of using a better, but more demanding, approach that would take longer. Technical debt can be intentional or unintentional.”

The last tool in a Staff’s belt that we’ll discuss here today is tech debt. Or I guess the tool would be addressing tech debt. Setting up a codebase for long-term success is hugely beneficial, but you will never set everything up perfectly on the first try. This just isn’t how the world works. Requirements change, your app underperforms in production, users get mad that feature X isn’t available, and the focus needs to shift. It happens. And when it happens, tech debt is accrued.

So let’s talk about tech debt. For starters, it’s scary. No one wants to be in debt. But how much time should you dedicate to addressing tech debt? I’ve heard people throw out numbers (”30% of your time each sprint should be used to address tech debt”), but once again, this feels entirely unrealistic to me. We live in the real world, where deadlines, money, prod issues, etc. all factor into how much time you can reasonably spend on addressing tech debt. I’ve found that the best strategy for me is playing things by ear. Product development tends to come in waves.* Use the dips in between those waves to address some of that tech debt.

The other part of this equation is that not all debt is equal. One piece of tech debt might be a couple of dollars that you put on your credit card a couple of months ago, while another might be more akin to all of your student loan debt. Those are not the same, and you should not expect to tackle the first with the same ferocity or urgency as the second. Some things can’t or shouldn’t wait, and others will probably be okay sticking around with a “//TODO: Remove” above them for all eternity. That’s just software engineering in the real world.

So there’s part two for you. If I’ve done my job correctly, you now have a solid grasp on some of the more important tools you’ll need as a Staff Software Engineer. In my next post, we’ll discuss another one of those tools and continue our deep dive into a Staff Software Engineer’s day-to-day existence. Until next time, you stay classy.

Intercepting and Parallel Routes in Next.js App Router: Explained

Introduction

Next.js is a popular React.js framework that provides powerful routing capabilities. With the release of Next.js 13.4, Next.js App Router has been officially marked as stable.

The Next.js App Router is a new routing system that is built on React Server Components. The App Router provides features like nested routes and layouts, simplified data fetching, streaming, Suspense, and built-in SEO support. Overall, App Router is a great improvement over the previous routing system and makes it easier to build complex and performant applications. For a more in-depth understanding of the App Router, I encourage you to review the Next.js documentation.

In this article, we will explore two App Router features: Intercepting Routes and Parallel Routes. I’ll guide you through an example to demonstrate how to effectively implement these features.

Intercepting Routes allow you to capture a route and show it in the context of another route. Parallel Routes allow you to simultaneously render two or more pages in the same view. Using these two features together can be useful for a variety of purposes, such as showing a quick preview of a page, opening a shopping cart in a side modal, or logging in a user.

Intercepting Routes

To use Next.js Intercepting Routes, you use a parenthesized (..) naming convention in your folder names to match route segments. This is similar to the ../ relative path convention, and the Next.js documentation explains it very well. You can use the following (a small, hypothetical example follows the list):

  • (.) to match segments on the same level
  • (..) to match segments one level above
  • (..)(..) to match segments two levels above
  • (...) to match segments from the root app directory
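
For instance, here is a small, hypothetical layout (separate from the login demo later in this article) where a (..) interceptor shows the /photo route from within /feed:

app
├── feed
│   └── (..)photo
│       └── page.js      // intercepts /photo when navigating from /feed
└── photo
    └── page.js          // the full /photo page on direct load or refresh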

Parallel Routes

Parallel Routes can be used by creating named slots in your application. Slots are defined by folders with the @ prefix (for example, @modal in the example below). These slots are passed as props to the layout at the same level. Note that slots are not route segments and do not affect the URL structure.

When using Parallel Routes, it is important to include a default.js file. This file tells Next.js what to render in the slot when there is no matching route.

Example

This example demonstrates how to build a login experience using the features mentioned above. When the user clicks the login link, a login modal will appear and the address bar will update to /login. When the user hits the back button, the modal will close, and clicking the forward button will reopen the modal. The user will be taken to the login page if the page is refreshed.

In this example, we will focus on the functionality of intercepting routes and parallel routes. We will not go into the specifics of setting up a Next.js project, styling your application, or dissecting the components used. For a more comprehensive understanding, I encourage you to examine the complete source code or explore the demo site.

Folder structure

├── app
│   ├── login
│   │   ├── page.js
│   ├── @modal
│   │   ├── (.)login
│   │   │   ├── page.js
│   │   ├── default.js
│   ├── layout.js
│   ├── page.js
├── components
│   ├── GlobalNav.js
│   ├── LoginForm.js
│   ├── Modal.js
├── package.json
├── README.MD
└── ... all other config files

Folder structure detail:

  • Login page (app/login/page.js)
  • Modal app slot (app/@modal) with a default layout (app/@modal/default.js)
  • Login page route interceptor (app/@modal/(.)login/page.js)
  • Base layout (app/layout.js)
  • Home page (app/page.js)
  • Components: global nav, login form, and modal.

Setting up base layout

Setting up the base layout is important because it is where we output the modal slot, which allows the modal parallel route to be rendered on the page. The @modal slot is passed to RootLayout as the modal prop, and we output it as {modal}.

app/layout.js

// app/layout.js

import { GlobalNav } from '@/components/GlobalNav';
import './globals.css';

export const metadata = {
  title: 'Next.js - Intercepting Routes',
  description: 'Intercepting Routes Login Demo',
};

export default function RootLayout({ children, modal }) {
  return (
    <html lang="en">
      <body>
        <GlobalNav />
        {children}
        {modal}
      </body>
    </html>
  );
}

Login Page

The login page is a simple app router page that renders the LoginForm component. This is the same component that will be used later in the login modal.

app/login/page.js

// app/login/page.js

import { LoginForm } from '@/components/LoginForm';

export default function Login() {
  return (
    <main className="max-w-sm mx-auto px-4">
      <h1 className="text-4xl font-bold mb-8 text-center">Login</h1>
      <LoginForm />
    </main>
  );
}

Default slot

We want the default output for the modal slot to be nothing. To achieve this, we create a default.js file that returns null. If a default.js file is not present and no route matches, a 404 page will be displayed.

app/@modal/default.js

// app/@modal/default.js

export default function Default() {
  return null;
}

Login route interceptor

Next, we will talk about the login interceptor. The interceptor folder starts with a single dot, (.)login, matching a segment on the same level. This works because route interceptors are matched against route segments, not the folder structure, and slot folders like @modal do not affect the URL structure. We could have also named the folder (...)login, since the three dots refer to the root app directory. I am not sure if one is better than the other, but I went with the single dot.

app/@modal/(.)login/page.js

// app/@modal/(.)login/page.js

import { LoginForm } from '@/components/LoginForm';
import Modal from '@/components/Modal';

export default function Login() {
  return (
    <Modal>
      <h1 className="text-4xl font-bold mb-4 text-black">Login</h1>
      <LoginForm inModal />
    </Modal>
  );
}
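
The Modal component itself is outside the scope of this article, but for context, here is a minimal, hypothetical sketch of what it might look like (the real component in the repo may differ). The key detail is dismissing via router.back(), which is what ties the modal to browser history so the back and forward buttons close and reopen it.

// components/Modal.js (hypothetical minimal sketch, not the repo's version)
'use client';

import { useRouter } from 'next/navigation';

export default function Modal({ children }) {
  const router = useRouter();

  return (
    <div className="fixed inset-0 flex items-center justify-center bg-black/50">
      <div className="bg-white rounded p-8">
        {children}
        {/* Dismissing via router.back() keeps the modal in sync with history */}
        <button onClick={() => router.back()}>Close</button>
      </div>
    </div>
  );
}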

Conclusion

In conclusion, with Next.js 13.4 and subsequent versions, you can use the App Router and its array of great features, including two we discussed in this article: Intercepting Routes and Parallel Routes. These features, among others, are great tools that can significantly enhance your application’s navigation control and user experience.


Seamless Processing of PDF Files with Python and Node.js Applications

One of the most difficult yet vital aspects of being a software engineer, or of working in technology at all, is the need for never-ending learning in order to stay an expert in your field. Every day, a new article, framework/library documentation, or research paper is published, adding to the growing pile of unread knowledge. For me, and certainly for others, having a tool that could act as a “cliff notes” creator would relieve a huge mental burden.

Thanks to recent breakthroughs in Large Language Models like GPT-3.5/4, LLaMA, and PaLM, we can now do this. By leveraging the strength and flexibility of Python and Node.js, I created two scripts: the PDF Summary Generator and the PDF Uploader with Timing and Response Saving. Together, they work as a command line interface (CLI) tool that lets anyone familiar with a terminal automate the process of uploading and summarizing PDF files. These programs also allow users to manage large numbers of PDFs with varying page lengths.

First, the Python-based PDF Summary Generator uses OpenAI’s GPT-3.5 language model to automatically extract summaries, notes, and additional content from PDF documents. It provides a RESTful API endpoint, accepting PDF files as input and returning a JSON object with the generated content. This application relies on the `pdfplumber` library for text extraction from PDFs and uses the `openai` library to interact with the GPT-3.5 API. Flask, a popular web framework, forms the backbone of this application, making it easily deployable and user-friendly.

Second, the Node.js-based PDF Uploader with Timing and Response Saving is a CLI script that uploads all PDF files from a specified directory to the PDF Summary Generator’s endpoint, measures the processing time, and saves the response data as separate files within a directory titled after the original PDF’s file name. This script automates uploading, summarizing, and saving the results for multiple PDF files in the specified directory.
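
The uploader itself isn’t reproduced here, but a minimal sketch of its core loop might look like the following. This is an illustration under stated assumptions, not the actual script: it assumes Node 18+ so that fetch, FormData, and Blob are available as globals, and the output file names are made up for the example.

// upload-pdfs.js: minimal sketch of the uploader's core loop (not the actual script).
// Assumes Node 18+ for the global fetch, FormData, and Blob APIs.
const fs = require('fs');
const path = require('path');

const pdfDirectory = 'PDFs'; // directory scanned for PDF files

async function uploadAll() {
  const files = fs.readdirSync(pdfDirectory).filter((f) => f.toLowerCase().endsWith('.pdf'));

  for (const file of files) {
    const form = new FormData();
    const bytes = fs.readFileSync(path.join(pdfDirectory, file));
    form.append('pdfs', new Blob([bytes], { type: 'application/pdf' }), file);

    const started = Date.now();
    const res = await fetch('http://localhost:8000/pdfsummary', { method: 'POST', body: form });
    const body = await res.text();

    // Save the response in a folder named after the original PDF.
    const outDir = path.basename(file, '.pdf');
    fs.mkdirSync(outDir, { recursive: true });
    fs.writeFileSync(path.join(outDir, 'response.txt'), body);

    console.log(`${file}: ${res.status} in ${Date.now() - started} ms`);
  }
}

uploadAll().catch(console.error);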

Before utilizing these scripts, users need to have the necessary software installed. For the Python-based PDF Summary Generator, Python 3.6 or higher is required, alongside an OpenAI API key and several Python libraries such as `flask`, `pdfplumber`, `openai`, `glob2`, and `textwrap`. The Node.js script, on the other hand, requires Node.js (version 14 or higher).

The installation process for both scripts involves cloning the respective repositories and installing the required dependencies. For the Python script:

git clone https://github.com/antoine1anthony/flask-pdf-summary-api.git

cd flask-pdf-summary-api

pip install -r requirements.txt

The PDF Summary Generator needs an additional step of creating a file named `openaiapikey.txt` in the project directory and pasting the OpenAI API key into it. Customization of GPT-3.5 prompts and chatbot responses is optional.

Using the applications is straightforward. For the PDF Summary Generator, users need to run the command `python main.py`, which starts the Flask development server, making the application accessible at http://localhost:8000/pdfsummary.

For example, to use the API endpoint, a POST request with the PDF files as multipart file attachments should be sent:

curl -X POST -F "pdfs=@example.pdf" http://localhost:8000/pdfsummary

As for the Node.js script, users should run:

npm install   # can also use: yarn install
node <script_filename>


Replace `<script_filename>` with the name of the script file. The script looks for PDF files in a directory named ‘PDFs’ by default. However, users can change the directory by modifying the value of the ‘pdfDirectory’ variable in the script:

// Directory containing the PDF files

const pdfDirectory = 'my_pdfs';

Both scripts have been designed to provide clear output and error messages. The Node.js script measures the total time taken for the run, displays a spinner while the PDF files are being uploaded, and saves the response data as separate `.txt` files. The PDF Summary Generator returns a JSON object containing the generated summaries, notes, and additional content.

Contributions to the development of these scripts are always welcome. Developers can fork the repository, make changes, and submit a pull request. If you stumble upon any issues or have suggestions for improvements, please open an issue on the GitHub repository.

Both the PDF Summary Generator and the PDF Uploader with Timing and Response Saving are released under the MIT License, providing a fair amount of flexibility for further development and usage.

While these scripts aim to provide an automated and efficient solution for managing and processing PDF files, users are reminded that they use the generated content at their own risk. The quality of the output may vary, and it is advisable to review the output before using it for any critical purposes. 

In conclusion, these Python and Node.js applications offer a comprehensive solution for processing PDF files, extracting valuable information from them, and managing the responses effectively. From extracting text to generating notes and summaries, and from uploading multiple PDF files to timing the response, these scripts have got it all covered!

What is Our Culture, and What Do We Want it to be?

According to Nicole Forsgren, Jez Humble, and Gene Kim’s Accelerate, a generative culture and team experimentation predict software delivery and organizational performance. Generative culture relies less on hierarchy and power and focuses more on performance and trust. What do generative culture and team experimentation have in common? They both rely on a community of highly-skilled and engaged engineers.

Previously on “Practice Advisors – What do we do here?” we examined the following questions:

  • What sort of things are our engineering teams struggling with currently? 
  • What are our current stumbling blocks? 
  • Do our engineers have enough time to learn, grow, and be the best engineers they can be?

To answer these questions, we started in the most grassroots way possible: talking to people. We sat in high-traffic areas around the office, opened ourselves up on Slack, joined engineers’ discussion forums, and probed at things during office hours.

As we talked to people, we started noticing some trends, both good and bad, and the scenarios that engineers found themselves in stopped being unique pretty quickly. Junior engineers found our mentorship system confusing and hard to get involved with, unallocated folks didn’t feel enough sense of direction, and client limitations stifled independent growth regardless of level. Additionally, without proper cross-talk, initiatives spun up to solve these problems often only amplified them, as with our multiple mentorship programs.

What about the positive trends? We recognized that engineers who consider themselves successful at WillowTree have been on projects aligned with their personal growth, or have been able to find other outlets for that growth. They have also likely found mentors along that same alignment that have encouraged their growth both within and outside of their project work.

With this context, we now had a loose mental model of some of the different types of situations that our engineers fall into, but we wanted something more formal, something we could look at to connect and guide our initiatives with our engineers. After all, our engineers are our users. Enter Product Strategy, featuring Hank Thornhill, to help us out!

“Getting Product Strategy to help us is like a senior hopping in to fix a beginner’s issue, they might’ve spent 2 hours spinning their wheels, but the senior’s experience can help get to the solution in 5 minutes. We just have to realize we’re the beginners now.”

-Andrew Carter

With Hank’s help, we were able to devise two axes that we felt illustrated both where we are now as an engineering organization and where we want to be. One axis is engagement, spanning from Passive to Active. The other is knowledge, ranging from Learning (someone who is just starting to learn) up to Expert (someone with a high level of expertise). Combining these axes results in the two-by-two square you see below.

From this chart, we developed some personas to help illustrate what concrete points along these axes may look like. We recognize that these personas don’t represent every individual at WillowTree, but feel that they fill that illustrative role adequately. Some of the traits of these “individuals” are prescribed by the client, such as working on a hyper-embedded team, and fall to the broader organization to help shift. Others are more easily pushed simply by providing more obvious opportunities, such as mentorship.

Chart displaying different axes of engagement

We want to move people toward Active-Expert, represented by the upper-right quadrant of the chart. To do that, we need to engage people and improve their skills. Fortunately, these things feed into each other: someone who is more engaged and excited will be more easily upskilled if given the opportunity. This also creates a positive feedback loop wherein active experts are better positioned to serve as mentors, further bringing people up into that quadrant. Organizing our thoughts here has also helped us identify initiatives that can support these conversions. For example, more opportunities for group learning can move junior engineers toward Expert as they participate, and senior folks toward Active engagement as they mentor.

We want to build engineers’ excitement around their craft

With this framework for our thoughts, we now feel prepared to approach the engineering groups more formally than just ad-hoc conversations. This will also allow us to remove some of the bias we’ve introduced into our data by relying on people who already felt comfortable talking to us. We’re going to run a series of “retros” across all of our offices during our regularly scheduled GROW time. There we can capture what people’s goals are, what’s supporting them, what’s holding them back, and what they’re concerned about for the future. Tune in next time for the exciting results!

CI/CD is Every Engineer’s Job. Yes, Even Yours!

Picture this: the project that you’ve spent a long time working on, put in the hard hours for, and poured your soul into just released a new build. Sweet! But now a developer on the project has pushed a brand new change, and it’s your responsibility to determine if this change is good enough to be deployed to production. Time to pull that branch, open up your favorite terminal, build the application, and run some unit tests… Did they all pass? Time to run some integration tests. Did they all pass? Time to run the end-to-end tests. Oh geez, someone is bothering you on Slack too now; it’s hard to concentrate. Did those tests pass? Should I start preparing this for production deployment? Wow, this is taking a long time! Wouldn’t it be great if there was some way you could get all this done without having to do it manually? Well, I have some great news for you – you could use CI/CD for all of this!

What is CI/CD?

CI/CD (or a pipeline, as it may colloquially be known) is an automated process, or series of processes, that speed up the release of software applications to the end user. Essentially, it is automating all the manual parts of the release process that normally a developer or test engineer (TE) would do. The CI part of CI/CD stands for “Continuous Integration”, which boils down to having an automated process for code changes (often from multiple contributors). The CI will regularly build, test, and merge these changes into a shared repository like GitHub for the whole team. The CD part can stand for one of two things – “Continuous Delivery” or “Continuous Deployment”. There is a slight difference in the meaning of those two phrases. Continuous Delivery refers to a process on top of CI that will automatically have your code changes ready to deploy to an environment such as Test, Dev, UAT, etc. This means that in theory, you could decide to release whenever’s best for the project schedule. Continuous Deployment takes this process one step further by having automated releases to production. In Continuous Deployment, only a failed test or check of some kind will stop the code changes from being deployed to production once the code change is approved & merged.

How can CI/CD benefit a software project, both for Development and QA?

Everyone can benefit from CI/CD, and that’s one of the main reasons that it’s good for everyone to have some experience with it.

For developers, having CI/CD set up can improve the quality of code and the speed at which code reaches product owners and end users. With the build and deployment processes automated, the most recent changes are automatically available to other members of the project team to look at, reducing the time it takes for issues to be found. This means that TEs and product owners can take a look at the deployed version of the codebase sooner than if a developer had to deploy it manually, since the pipeline runs each step automagically without a person having to sit there and watch each step until it is done. This also frees up the developer’s time to work on other tasks while the pipeline builds and deploys the changes. In addition, this can reduce the impact of code/merge conflicts, since the pipeline should expose any of these issues before allowing code to reach production. Unit test and integration test failures in the pipeline will alert the developer early on that there are changes that need to be made to their code before deploying to production.

For the TE team, running automated tests as part of the pipeline reduces the time and effort needed to test the most recent code changes. This free time lets TEs focus on other testing scenarios such as exploratory, performance, and regression testing. It also means that the TE team will have time to write new automated tests that can be added to the pipeline. Additionally, running changes regularly through the pipeline means that TE can isolate failures to certain change sets, making it easier to diagnose and fix issues found with end-to-end automated testing.

There are also benefits to having a CI/CD pipeline that applies to everyone on the project, not just development or QA. One distinct advantage of CI/CD doing all this work for you is that it reduces human error that could otherwise be introduced in the process. At any time during the process of building the app, kicking off tests, and deploying to another environment, a person could make an error – copy/paste, typing the wrong thing, ignoring an error message – that a machine would not. A human can do things differently every time, whereas a computer will do the same thing every time. Another benefit that I touched on earlier is freeing up time that would otherwise be used to do the steps in the CI/CD pipeline manually. Developers would spend more time deploying and running unit tests, and TE would spend more time running end-to-end tests if the pipeline wasn’t there. On top of this, unless they are a miracle worker of some sort, there’s latency in between each step if a person does it, whereas a computer moves between the steps in the process instantaneously. The last benefit I’ll mention is that CI/CD can bring some good collaboration between development, QA, and other members of the project team. TEs and developers get to work together to understand each others’ processes more, which helps make sure that the pipeline is doing everything that is needed. Other members of the project leadership team may also have requirements for the pipeline that they can collaborate with the engineering team on.

How can I get started?

There are several CI/CD vendors that you can check out before you start writing. GitHub Actions and Azure DevOps, for example, both have good documentation that you can read up on first. I’ll even give you some sample steps for GitHub Actions.

Once you’re ready to get hands-on with it, you can go to your project in your IDE, create a .github/workflows directory, and add a pipelines.yml (or whatever name you choose) file to get started with GitHub Actions. You can create several .yml files in the directory, named whatever you want, to run the various actions that you want to do. I recommend using a naming convention that makes sense, e.g., nightly.yml [for running a nightly build], pr.yml [for running on opened Pull Requests], deploy.yml [for running on deploys], etc. There’s plenty of configuration that goes into your .yml file, but once you have it set up, test it to make sure that it works. The event that triggers said pipeline varies by how you set it up (I recommend triggering it on push while testing so it runs whenever you push code). Once you have it running, you can visit the GitHub page for your project and check the Actions tab to see the runs of your pipeline. (Or you can just follow the Getting Started guide from GitHub themselves 😉.)
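
To make that concrete, here is a minimal, hypothetical workflow file to start from. Your real pipeline will have project-specific build, test, and deploy steps; the checkout and setup-node actions here assume a Node project, and the job and file names are just examples.

# .github/workflows/pipelines.yml: a minimal, hypothetical starting point
name: CI

on:
  push:          # runs whenever code is pushed, handy while testing the pipeline
  pull_request:  # also runs on opened/updated Pull Requests

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci          # install dependencies
      - run: npm test        # run the unit test suite
      - run: npm run build   # make sure the app still builds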

Now it’s your turn to use what you’ve learned!

Having a solid CI/CD pipeline setup is crucial to delivering a well-made, on-time project with as few defects as possible. Both QA and Development benefit from having a good pipeline on a project, so both sides need to learn about CI/CD and work together to keep the pipeline in tip-top shape. Using CI/CD allows engineers to work on other/more important tasks, ensures human error is kept to a minimum, and gets the code changes out to the stakeholders quicker. Every project should have a CI/CD setup!

Bonus

I asked ChatGPT to write me a few sentences on the importance of CI/CD in a project setting with multiple contributors in the style of a pirate and it had this to say…

“Ahoy matey! CI/CD be the wind in yer sails and the rum in yer cup when it comes to a successful project! Without it, yer ship will be dead in the water, floundering like a fish out of water. With CI/CD, yer code be tested, built, and deployed faster than ye can say “shiver me timbers!” So hoist the Jolly Roger and set a course for smooth sailing with CI/CD!”