The engineering community here at WillowTree, especially within the iOS practice, has been super excited about SwiftUI. It’s incredible how much power and flexibility it brings to our app development workflows. This blog post was created in conjunction with Will Ellis, who gave a wonderful talk showcasing the power of SwiftUI Button Styles. You can find a copy of the code here.
Here at WillowTree, developers often work hand in hand with the design team before implementing anything in an app. For example, when we create a component (buttons, icons, sliders, etc.), designers provide a design system that illustrates how they should look and be reused across an app. In addition, these designs can show how the components change based on their state. In this blog post, we’ll be looking to implement buttons based on the designs below. Luckily, SwiftUI has great features that allow us to support different states for our buttons and make it easy to bring our designs to reality.
Option 1: Applying View Modifiers to Style a Button
SwiftUI already has a built-in Button component. Slap some view modifiers on that for styling, and boom, a button!
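The original snippet from the talk isn't reproduced here, but a minimal sketch of the idea (with placeholder colors and sizing rather than an actual design system) might look like this:

import SwiftUI

struct ContentView: View {
    var body: some View {
        // A plain Button with styling applied inline via view modifiers
        Button("Sign In") {
            print("Sign In tapped")
        }
        .font(.headline)
        .foregroundColor(.white)
        .padding()
        .frame(maxWidth: .infinity)
        .background(Color.blue)
        .clipShape(Capsule())
    }
}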
With just a few view modifiers, it is easy to get started. However, as it is currently written, the styling is not encapsulated or reusable.
// Applying view modifiers to style a button
// Pros:
// ✅ Easy to get started
//
// Cons:
// ❌ Styling is not encapsulated or reusable, not DRY
Option 2: Wrapping Button with our own View
Another approach we can try is to create a custom button. To do this, we can give the custom button all the necessary view modifiers, and use initializers to pass in the action and label parameters.
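As a rough sketch of what that wrapping might look like (the PrimaryButton name and styling here are placeholders, not the exact code from the talk):

import SwiftUI

struct PrimaryButton<Label: View>: View {
    private let action: () -> Void
    private let label: Label

    // We end up re-declaring Button's initializer boilerplate ourselves
    init(action: @escaping () -> Void, @ViewBuilder label: () -> Label) {
        self.action = action
        self.label = label()
    }

    var body: some View {
        Button(action: action) {
            label
                .font(.headline)
                .foregroundColor(.white)
                .padding()
                .frame(maxWidth: .infinity)
                .background(Color.blue)
                .clipShape(Capsule())
        }
    }
}

A call site then looks like PrimaryButton(action: { /* sign in */ }) { Text("Sign In") }, but notice that we had to recreate Button's initializer and generic Label plumbing ourselves, which is exactly the boilerplate called out in the cons below.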
This approach gets the job done, but it doesn’t allow us to access the button’s state and requires too much boilerplate code. We can do better than this.
// Wrapping Button with our own view
// Pros:
// ✅ Encapsulated, reusable, and DRY styling
//
// Cons:
// ❌ Tight coupling of styling and content
// ❌ No access to button pressed state
// ❌ Have to duplicate Button boilerplate (e.g., initializers, generic Label type)
Option 3: Using Built-in ButtonStyles
What if we wanted to use the built-in Button but also wanted to use Apple’s styling? This is where the .buttonStyle view modifier is useful. It has a variety of styling options from Apple that are pretty neat. It also gives us the option to set a role like .destructive that changes how the button functions.
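For example, on iOS 15 and later, a quick sketch using two of Apple's built-in styles plus a destructive role:

import SwiftUI

struct SettingsButtons: View {
    var body: some View {
        VStack {
            // Apple's prominent bordered style, with a destructive role
            Button("Delete Account", role: .destructive) {
                print("Delete tapped")
            }
            .buttonStyle(.borderedProminent)

            // A more subdued built-in style
            Button("Cancel") {
                print("Cancel tapped")
            }
            .buttonStyle(.bordered)
        }
    }
}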
The main downside is that it uses Apple’s styling rather than our own designs, which limits how much we can customize the component.
// Apple provides us with built-in ButtonStyles
// Pros:
// ✅ Work cross-platform
// ✅ Easy to use right out of the box
//
// Cons:
// ❌ Not what’s in our design
Option 4: Creating a Custom ButtonStyle
Apple has also given us the option of creating our own button style type. ButtonStyle is just a protocol, and all we need to do to conform our own type to it is to implement the protocol’s makeBody function. This is where we would also adjust all the view modifiers for the styling we want. The really neat thing about this option is that we now have access to the .isPressed functionality which allows us to modify how the button reacts to being pressed. For example, we can change the background color while the button is pressed.
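Here's a minimal sketch of such a style; the colors are placeholders, not the CapsuleButtonStyle from the talk's repo:

import SwiftUI

struct PrimaryButtonStyle: ButtonStyle {
    func makeBody(configuration: Configuration) -> some View {
        configuration.label
            .font(.headline)
            .foregroundColor(.white)
            .padding()
            .frame(maxWidth: .infinity)
            // React to the pressed state exposed by the configuration
            .background(configuration.isPressed ? Color.blue.opacity(0.6) : Color.blue)
            .clipShape(Capsule())
    }
}

The call site stays a plain Button("Sign In") { ... } with .buttonStyle(PrimaryButtonStyle()) applied, so styling and content remain decoupled.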
// Pros:
// ✅ Encapsulated, reusable, and DRY styling
// ✅ Access to button pressed state
// ✅ Styling and content are decoupled
// ✅ Can use existing button initializers
//
// Cons:
// ❌ Easy. Almost too easy 🤨
We can then go further with a customized ButtonStyle by setting a color scheme based on our designs, using the environment value .isEnabled, creating a State variable for isHovered, etc. All these options allow us to manipulate how the button looks and feels within the app while maintaining the design we want, in an encapsulated, reusable way.
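For instance, reading isEnabled takes only a small addition. Since environment values are read from views, one way to do it (a sketch, not necessarily how the repo's style is written) is to have makeBody return a small helper view that holds the @Environment property:

import SwiftUI

struct ThemedButtonStyle: ButtonStyle {
    func makeBody(configuration: Configuration) -> some View {
        ThemedLabel(configuration: configuration)
    }

    // A nested view lets the style read environment values like isEnabled
    private struct ThemedLabel: View {
        @Environment(\.isEnabled) private var isEnabled
        let configuration: ButtonStyle.Configuration

        var body: some View {
            configuration.label
                .font(.headline)
                .foregroundColor(.white)
                .padding()
                .frame(maxWidth: .infinity)
                .background(background)
                .clipShape(Capsule())
        }

        private var background: Color {
            guard isEnabled else { return .gray }
            return configuration.isPressed ? Color.blue.opacity(0.6) : .blue
        }
    }
}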
This option also makes it easy to set the desired color scheme just by extending ButtonStyle and initializing the different color scheme combinations. We can even extend our ButtonStyle to add leading and trailing icons using generics. Be sure to check out CapsuleButtonStyle in the repo to see how these variations are implemented.
Conclusion
Who knew buttons could be so fun? SwiftUI provides multiple options for customizing and styling buttons to match any design system. By exploring the different approaches, we gain a deeper understanding of the pros and cons of each option, so that we can make informed decisions about which approach is best for our particular use case. The iOS community here at WillowTree has enjoyed exploring all the cool things SwiftUI has to offer. Thanks for reading!
Will is a programmer who has been making iOS, web, embedded, and desktop software for many years. He loves to combine his skills in software engineering and his enthusiasm for HCI to make compelling, user-focused software.
Making technology understandable for any audience is why I first got into software testing and UX design. After having spent three decades learning how to explain technical things to people across a wide range of demographics, I’ve noticed this is an underdeveloped skill in our industry. It should be taught and emphasized so much more because when it is practiced and honed, it takes all of your other skills to the next level.
If you own a smartphone, e-reader, tablet, gaming console, desktop, flat screen, streaming service, home assistant, smart appliance, or garage door opener, you have no choice but to converse about technology, ideally without getting frustrated. To offer some specific analogies and tips in this article, I’ll narrow the technical context down to roles and tasks most commonly found in a software development company. This includes everyone from project managers and designers to developers and testers.
That said, even with a warm audience of people and topics within your own company or industry, technical explanations are tough.
There are a lot of technical conversations in the day-to-day of software development, and most of them fall under one of these scenarios:
Peer-to-peer Knowledge transfer
Client demos
Executive presentations
Impromptu situations
No matter the situation, you will use two tools: Willingness and Preparation.
Willingness
You may lack the self-confidence to explain technology, but the cool thing is that explaining technical things to anyone (friends, family, tech support) gradually improves your confidence. All you have to do is be willing to try and to accept constructive feedback.
Volunteer for low-pressure opportunities to practice your preparation and explanation skills. When I’ve lacked confidence for an upcoming presentation I’ve run it by a peer or mentor to get feedback. It’s not only encouraging but the feedback can root out any holes or confusion before you go live.
Whether it’s a one-to-one conversation, a presentation to execs, or a Zoom meeting, remember: you were asked (or agreed) to present on something you know, not on the things you don’t know (unless it’s a presentation on “Things We Don’t Know”).
By volunteering for low pressure opportunities, you’ll be able to hone your abilities in low stakes areas, and pretty soon you might even find your confidence has grown such that those areas that used to feel overwhelming might be just a bit easier to jump into.
Preparation
Even for impromptu situations, you can prepare. In actuality, you are preparing with every interaction you have at work, because the core of preparation is communication.
Communication
“Explaining” and “presenting” are forms of communication. What I’m talking about is honing the art of communication. For example, if you have a potluck, there will probably be multiple forms of mac and cheese and they are all accepted on their own merit. But, if you have a mac and cheese cookoff, you focus on the nuances that elevate one mac and cheese over the other; you hone the art of mac and cheese.
One aspect of communication you can practice in almost every situation is clarifying terminology. Across roles, platforms, and demographics, the meaning of technical terms varies greatly but we all assume everyone is talking about the same thing.
Actively listen in conversations for words that are used interchangeably. If appropriate, ask one of the contributors what the term they used means to them. Many times I’ve done this and it reveals we are not all talking about the same thing or there are deeper nuances we’re missing.
Once you agree on what you’re all talking about, openly agree on which term all of you will use going forward. Then, be consistent. Make sure you use that term in documents, emails, and instant messaging, even with clients and co-workers outside your project.
On the FOX Weather project, we had some development requirements around what happens when the app is closed or opened. We determined these two terms weren’t enough to explain all the scenarios.
We expanded “closed” to when a user puts an app “in the background” but it is still running, or, when the user “force quits” an app so that it is no longer running. One action might keep the user logged into their account where the other would log them out.
When a user taps an app’s icon on the home screen it “opens” but from two different states. If the app is “running in the background” the user is simply bringing the app back into the foreground and, potentially, continuing in the app where they left off. If the app was “force quit” the user is relaunching the app which might mean they have to login again.
Having the team agree on what these terms actually meant helped the project as a whole function better. All thanks to our shared communication.
Awareness
Saying “just be willing to try” is a little trite, but the meaning behind it is that you are in situations where you have to explain tech all the time; you just need to become more aware of them. Be aware of the terms the listener is using, and of the points where comprehension sinks in and where it doesn’t. Then use that information to improve and take on more situations for practice.
Rhonda Oglesby has been with WillowTree since December 2019. Her career path as a software developer and designer led her to the perfect career: Senior Software Test Engineer. She’s a well-rounded geek who believes communication and conviction are the keys to success.
Imagine for a moment that you’ve been working tirelessly for months on a shiny new product with killer features. This one product alone will drive a 20% revenue bump in the year following launch. You’ve written maintainable code, great unit tests, and have an automated deploy pipeline that makes rapid iteration a joy. The product launches. People are installing the app. Active user count is climbing fast. 1-star reviews are flooding in…wait, what? “I can’t log in,” “all I see are 504 timeouts,” “trash doesn’t work,” ….oh dear. You didn’t do your capacity planning.
Disambiguation of Terms
The term “performance testing” tends to be used as a sort of catch-all; but really, it should be broken down into a few distinct categories. You can name the categories whatever you want, but for this article we’ll be using the definitions below:
Performance testing
You are measuring ‘how does this thing perform, under various specific circumstances.’ Can be thought of as “metric sampling.”
Stress testing
You are measuring ‘how does this thing respond to more load than it can handle,’ and ‘how does it recover.’ Can be thought of as “resiliency testing” or “failure testing.”
Load testing
You are measuring ‘given a specified amount of load, is the response acceptable?’ Can be thought of as a “unit test.”
Given the above definitions, you may note that a “stress test” is in fact a “performance test,” simply executed at unreasonable load levels. Similarly, a “load test” can be a composition of the other two, with the additional step of asserting a pass or fail.
Prerequisites
What do you need in order to perform these kinds of tests? The answer will of course vary (significantly) from case to case. We can, however, make generalizations about what is needed for the vast majority, especially given our end goal of capacity planning.
Production-Equivalent Infrastructure
The first step is to ensure that the environment you are performance testing against is provisioned the same as production. Same instance class, same storage amounts, same everything. This also includes any dependencies of your application, such as databases or other services. This is the most important prerequisite. Testing on your laptop or your under-provisioned QA environment may give you directional information; however, it IS NOT acceptable or sufficient for making capacity planning decisions.
Testing Tools
It is usually a good idea to use tools that make it simple to record and version-control your test cases. For instance, Locust uses test cases which are simply Python classes; this is perfect for code review, and allows for rapid and simple customization of the tests (e.g., randomize a string without reuse). In contrast, JMeter uses an XML format which is significantly more difficult to code review, and generally must be edited from within its own GUI tool. Whatever tool you choose should be quick and easy to set up and provision. We want our effort to go into the testing, not into fighting the tools.
Distributed Tracing
Distributed tracing allows you to ‘follow’ an event throughout your system, even across different services, and view metrics about each step along the way. Distributed tracing is strongly recommended, as it will enable you to simply review the data after your test, and rapidly zero in on problem areas without needing additional diagnostic steps. It may even help you uncover defects that you otherwise would not have noticed such as repeated/accidental calls to dependencies.
Achievable, Concrete Goal
As a group, you must come up with a goal for the system to adhere to. This goal should be expressed as a number of requests per time unit. You may also wish to include an acceptable failure rate; for example, 5,000 requests per second with a <=2% failure rate. The business MUST be able to articulate what success looks like. That 20% revenue projection? It came from something tangible: active users performing specific tasks over time. Non-technical stakeholders may balk at being asked to provide concrete non-revenue numbers. It’s important to partner with these individuals and help them understand where the revenue goal comes from, and how these concrete numbers impact the likelihood of success.
Revenue is effectively the ANSWER to a complex math problem with multiple input variables; we must reverse-engineer the equation to solve for one of those inputs. If that 20% revenue projection is based on nothing but hopes and dreams then just do something reasonable. More on this in the Partner section below.
Methodology
So, how do we go about doing all of this? Instrument, Provision, Explore, Extrapolate, Partner, Assess, Correct. IPEEPAC! This acronym is my contribution to our industry’s collection of awful buzzwords. You are quite welcome.
Instrument
We can begin with instrumenting the application. The specifics of how to do this with any given framework/vendor could be an entire article by itself, so for our purposes we’ll stick with some fairly agnostic terminology and general approaches. You’ll need to identify a vendor/product which supports your tech stack and follow their instructions for actually getting it online. They will likely use the terms below, or something similar.
Span
An ‘individual operation’ to be measured
Spans can, well, span multiple physical operations; for example a span could be a function call, which actually makes several API calls
Can be automatically generated or user-defined
May have many attributes, but typically we care about the elapsed time
Can contain other spans
Trace
An end-to-end collection of spans
The complete picture of everything that went on from the moment a request enters the application until the response is transmitted and the trace is closed
Here’s an example visualization of a trace and its associated spans, taken from the Jaeger documentation:
It’s evident why this is useful. We see every operation performed for the given request, how long it took, and even if things are happening in series or in parallel. We could even see what queries are being executed against the database, as long as the tooling supports it.
Most instrumentation tools will provide you with some form of auto-instrumentation out of the box, whether it’s an agent which hijacks the JVM or a regular SDK that you bootstrap into your application at startup. Many times this will be sufficient for our purposes, but always verify that this is the case. At a minimum, we need to ensure that we are getting the overall trace and spans for every individual dependency, be it HTTP, SQL, SMTP, or what have you. This is enough to give us a rough idea of where problems lie. Mostly, it will help us identify whether the source of slowdown is our application code, some dependency call, or a combination thereof.
Ideally, we would want some finer-grained detail. It’s typically good to add spans, or enrich existing ones, with contextual information. For instance, if you have a nested loop making API calls, it would be good to have a span encompassing just that loop so you can easily see how much execution time it takes up without needing to sum up the dependency calls. You could also add metadata to that span about the request; perhaps certain parameters are yielding a much larger list to iterate over.
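What this looks like in code depends entirely on your tracing SDK, so the Tracer and Span types in the sketch below are hand-rolled stand-ins rather than any real vendor API:

import Foundation

// Stand-in types; a real tracing SDK provides equivalents of these.
final class Span {
    let name: String
    private(set) var attributes: [String: String] = [:]
    private let start = Date()

    init(name: String) { self.name = name }
    func setAttribute(_ key: String, _ value: String) { attributes[key] = value }
    func end() {
        print("span \(name) took \(Date().timeIntervalSince(start))s, attributes: \(attributes)")
    }
}

enum Tracer {
    static func startSpan(_ name: String) -> Span { Span(name: name) }
}

func fetchDetails(for ids: [String]) -> [String] {
    // One span around the whole fan-out, so its total cost shows up as a single line in the trace
    let span = Tracer.startSpan("fetch-details-fan-out")
    span.setAttribute("item.count", String(ids.count))
    defer { span.end() }

    return ids.map { id in
        // Each downstream call here would typically appear as an auto-instrumented child span
        "details-for-\(id)"
    }
}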
Provision
This one is straightforward; creation of your compute and/or other resources – be they cloud or on-prem. Provision your production analog, and your test tooling. Verify that the tooling works as expected and is able to reach the application under test.
Importantly, DISABLE any WAF, bot detection, rate limiting, etc. These are vital for an actual production environment, but will make our testing difficult or impossible. Remember what we are testing here – it’s our application and architecture, not your cloud vendor’s fancy AI intrusion detection.
Explore
This is where the party starts. For right now, disable autoscaling and lock yourself to a single instance of the application. We’ll get back to this, but in the meantime, let’s start crafting some test cases. Unfortunately, there’s a lot of art mixed into the science here. We want to be as thorough as possible; ideally, we would like to exercise every operation the application may perform. However, this isn’t always feasible. There may be too many permutations, or some operations may require proper sequencing in order to execute at all. So what to do?
Start by separating the application into its different resources, and their possible operations. Be sure to include prerequisites such as user profiles. For example, you can fetch or update a single user’s profile or search for multiple profiles…but you can’t really do any of that until a user has been registered, because the registration process creates the profile.
So:
Register user <randomized guid + searchable tag>
Fetch profile <id>
Search profiles matching <tag>
Update profile <id>
We could have a test that does exactly that sequence of events. While this does exercise the code, it conflates a few issues and misses some points:
This exercises a large number of users registering/viewing/updating their profiles at once
The longer the test runs, the larger the list of profiles that will be found/returned for the tag search
Let’s try again:
Register user
Fetch profile
Update profile
Search profiles
fetch a subset of these individually
This more closely resembles what actual traffic would look like. People generally register, go to their profile, then update it. Other users generally will search for stuff, and then view it. Taken together, this gives us a better picture of how our user experience will progress as we add more users to the system.
However, we’re still missing something. While we are exercising both read and update, we are doing so more or less sequentially. So we might want to add an additional, separate test:
Register n users
In parallel
Fetch individual profiles
Update individual profiles
Search by tag
Now, taken at face value this may not seem like a relevant test. Users do not generally update their profiles in a massively parallel manner. However, what this test does tell us is how our application responds to multiple, potentially conflicting, potentially LOCKING operations happening simultaneously.
If we apply this thinking to all the operations the application may perform, we could end up with an overwhelming number of test cases, but we can pare things down. We should identify the so-called ‘golden path’ of the application; the sequence of operations that most people perform most of the time. This should be a relatively small slice of functionality, spread across a few areas. We can be very thorough about exercising this functionality. Then, we use our knowledge of the application to identify other areas which we expect may be problematic, or areas that we simply don’t have any coverage on at all. This is where the art vs science really comes into play.
Once we have the test cases, we can start executing them. It is generally advisable to start small and increase relatively rapidly until you start seeing issues. A starting point might be to start at 10 operations/second and keep doubling until you see excessive slowdown or errors. Once you see this, dial it back to the last checkpoint and make small adjustments until you find the breaking point, or single-instance capacity. Note this down, it’s important. Also note down CPU and memory consumption at this point, as well as average and max response times for any dependency calls (you did implement tracing, right?).
A common question to have at this point is “what counts as excessive slowdown/errors.” Unfortunately, it’s really case-by-case. You could assume as a starting point that your 10 operations/second performance is ‘acceptable,’ and then once your response time or error rate doubles, then it is no longer acceptable. You could also assume that it’s fine until the ratio of failed requests to successful ones is greater than 50%. At this stage in the process, any of this is fine; we are exploring, after all. We’ll return to the topic later.
Extrapolate
Now that we’ve done some exploration, we can make some educated extrapolations. We have our single-instance capacity. We know that a high-availability application generally wants MINIMUM 3 instances, spread across availability zones…So we can extrapolate that this setup should handle 3x our single-instance capacity. So go and set that up. Re-run the exploration and record the results. Do they align with our extrapolation? If yes, great success. If not, we need to understand why and potentially make changes so that they DO align.
This is also a good time to do a stress test. Push those 3 instances until they become unresponsive, then back the traffic down to ‘reasonable’ levels and see how long they take to recover. Again, note it down.
It would be wise to take a look at the tracing data here. It will likely help you identify the reason for any discrepancy between our prediction and the actual results; for example, more traffic being sent to particular instances, or increased latency on database queries because the database is in a different availability zone than two of the instances.
Partner
Now that we have some baseline data, we need to figure out what to do with it. The business has revenue goals, and our product owner almost certainly knows what an acceptable user experience “feels” like even if they can’t (yet) articulate numbers. This is a good time to get the product owner (or other relevant stakeholders) involved in the process.
Start by sharing the baseline 10 requests/second numbers. Show how the application responds here; people will usually have opinions about whether this “feels okay” or not.
If not, we have a major problem
Next we need to determine how slow is too slow, and/or how many errors is too many
We can start by sharing our exploration assumptions – everyone may be fine with it
This can be challenging – if possible it would be good to mock up the application so you can configure a delay
Can also relate to other activities – “it loads before you finish reaching for the coffee,” or “you can take a long swig and put the cup down.”
Once we know what is acceptable vs not, we can configure autoscaling accordingly
It needs to kick in well BEFORE we hit the “unacceptable” mark
Remember that it can take tens of seconds to a few minutes for new instances to come online, depending on your hosting choices
A fair starting point is to set your scaling to about 65% of “unacceptable”; you can tweak this higher or lower as needed
Now the truly difficult part begins – working with the stakeholders to determine how many requests per second do we need to be able to handle. Quite frequently the first answer is “all of them”. We know this is not possible; the cloud is not magic, and even if it was, the price tag would far exceed the revenue goal.
We can start with what we do know. Our “golden path” defines certain operations, and we know how many requests those need, and how long they take. We can get a ballpark requests/second for a single user based on this count, over some amount of time.
Next we need to know how many active users to expect. This is going to be entirely case-by-case. We may be able to do relatively simple math in the vein of “x transactions at y, average user makes z transactions in a week”…or perhaps we have traffic levels for an existing product, and the business projects “n” times that much traffic. Reality is seldom so simple.
Most likely we’ll need to really get deep into the business case for the product, how the revenue goal is determined, and derive some way to associate this to a number of simultaneous users. It’s very important to do this in collaboration with the stakeholders so that everyone shares ownership of this estimate. The goal is not to point fingers at so-and-so making bad estimates, but to make a good estimate and to learn from any mistakes together. Going through this exercise may even help the business to make better, more informed goals in the future.
Given an average user’s requests/second and our estimated active user count, the simplest solution is to multiply them. In theory this should give us our projected “sustained load” (depending on the application, we may also need to identify the acute load, or the load when experiencing peak traffic; for instance, we expect a restaurant’s traffic to spike hugely around the lunchtime hour). Experience has shown that it is often prudent to double (or more than double) this number for launch, and then hopefully rein it in after the initial surge has died down. Once you obtain real-world average traffic, you can set your minimum instance count to support that, and then set the maximum to the greater of 2x the new minimum or 2x our original projection.
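As a toy calculation with entirely made-up numbers (not from any real engagement):

// Hypothetical figures for illustration only
let requestsPerGoldenPath = 12.0      // requests one user generates walking the golden path
let goldenPathDuration = 180.0        // seconds an average session takes
let perUserRPS = requestsPerGoldenPath / goldenPathDuration   // ~0.067 requests/second per active user

let projectedActiveUsers = 30_000.0
let sustainedLoad = perUserRPS * projectedActiveUsers   // ~2,000 requests/second
let launchTarget = sustainedLoad * 2                    // padded for the launch surge
print("Plan for roughly \(launchTarget) requests/second at launch")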
It is worth noting that the projected numbers may simply be too expensive to operate. Options are limited here:
Attempt optimization of the application
May or may not be possible/make a noticeable difference
Accept the cost
Don’t launch
Re-estimate – for instance, consider that not all time zones will be active at the same time
This is a huge risk, and should be avoided if at all possible. Any scrum team will tell you that “you need to size this smaller” rarely ends well
Phased rollout – launch to one geography or subset of users, make adjustments and projections based on the traffic seen there
Assess
This is an easy one…Execute tests at our expected load, and work it all the way up to maximum capacity. We want to see the autoscaling trigger and maintain acceptable performance. Document the results, and once again poke through the tracing looking for any anomalous behavior. We want to be sure to identify and document any potential issues at this point. For instance, a 3rd party dependency rate-limiting us before we hit maximum capacity, or perhaps our database doesn’t have enough connection slots.
Correct
Based on the assessment, correct the issues and re-assess. If there’s anything that cannot be resolved, document that issue, and have the stakeholders sign off on it before launch.
Infrastructural Notes
It is important to recognize that your choice of hosting has a huge impact on both how scalable your application is, and how you must configure that scalability. For instance, a basic deployment into an autoscaling group may allow you to scale based on CPU consumption. An advanced Kubernetes deployment with custom metrics may allow you to scale based on the rate of inbound HTTP requests. Deploying into some cloud PaaS may be anywhere in between, depending on the vendor. Why does this matter?
Capacity is more than CPU+RAM
It cannot be assumed that high load ALWAYS corresponds to significant processor or memory utilization. It is a reasonable assumption for perhaps the majority of use cases, however it is certainly not a universal truth. If an application is primarily I/O bound, then it is possible to completely overwhelm it without significant processor or memory spikes. Consider the increasingly common middle-tier API. It simply makes calls to other APIs, and amalgamates them into a single response for the consumer. Most of its time is actually spent idle, waiting for responses from other applications. If its dependencies are slow, we could consume the entire threadpool with requests, and return 503/504 status codes without seeing a CPU spike. Ironically, if many of those responses were to complete all at once we might see a massive CPU spike, which could render our service entirely unresponsive, but also would be so brief as to not trigger a CPU scaling rule which typically requires a sustained load over a period of minutes.
Bandwidth must also be considered. Our CPU and RAM may fit nicely within the most basic instance class, but if the virtual NIC is insufficient for the amount of data we are dealing with, we will find ourselves bottlenecked and once again not scaling. Ditto for disk IOPS. The long and short of it is we must look holistically at the application’s workload, and EVERYTHING it needs to accomplish that workload when capacity planning.
A caution about serverless
There is nothing inherently wrong with serverless/lambda/etc as a hosting choice. There is, however, a fair bit wrong with assuming that it is always a GOOD solution to your scalability problems. Like any and all other tools, we need to understand how it works, what it is good at doing, and what problems it may actually introduce into the system. Serverless is perhaps one of the most frequently misunderstood and/or misused hosting options, especially when it comes to REST APIs.
Serverless typically operates by creating a new instance of your application for every inbound request (assuming all current instances are busy processing their own requests). This gives the application massive horizontal scalability and parallel processing capacity, but it has some limitations.
Cold starts
Literally booting a new instance of your application for every request introduces latency; for instance, Java/Spring applications may take multiple seconds to start up
Many providers offer a service tier to reduce cold starts – at additional cost
Massive is not infinite
There IS an upper bound to the number of instances; AWS, for example, by default limits you to 1,000 concurrent Lambda executions per region
So yes, your dev environment may be consuming execution ‘slots’ preventing production from operating at capacity
Larger serverless instances tend to be costly, and they may accrue cost both per invocation and for execution time
It can be difficult to get your application into a serverless platform
Depending on the vendor, there are often size limits for your executable or other restrictions which need to be worked around
The parallel nature of serverless can also introduce new problems.
Your database’s connection slots can rapidly be exhausted because every serverless instance will consume at least one slot; connection pooling code will only apply within each instance
Internal caching of dependency requests will only happen within each instance; increasing the number of calls to your dependencies
Backoff/retry logic will also only happen within each instance
The thing to note here is that simply hosting on serverless does not necessarily increase your capacity to handle inbound requests. In fact, the parallel nature of serverless may reduce capacity overall. To reap the benefits, not only must your application be designed with serverless in mind, but also your dependencies must be able to handle it as well.
Wrapping Up
Application performance and scalability is a huge and broad topic, and this article is really just hitting some of the highlights. While we’ve outlined a sort of methodology and a whole lot of steps and things to watch out for, you don’t have to do everything at once. Start small – if you can only do one thing, add the instrumentation. You might be surprised how much you learn. If you can do two things, instrument and partner with the stakeholders to work out what the goal should be.
WillowTree is an organization that has absolutely exploded with growth over the last 7 years, since I started here as an intern in 2016. At the time, we were at about 150 people in the entire company, and now we’re eclipsing 1,300. With this growth has obviously come change. Information is harder to disseminate, and it’s harder to be on the same page with your fellow engineers when you’re much more spread out. This sort of spread means we end up solving the same problem multiple times, and sometimes incorrectly. My colleague, Andrew Carter, said it best, “building an app is your game to lose.” We have all the tools and knowledge at our disposal with the great minds here within WT, so how do we make sure we’re making the most of them?
Building an app is your game to lose.
You hear it all the time in the industry; a company got big and lost its culture. But why? Is it that culture intrinsically cannot scale? Is it that hiring needs to become less specialized to fill a larger number of seats? I think the answer depends on the organization.
WillowTree’s engineering culture has always heavily emphasized the importance of growing our own engineers’ abilities. A focus on mentorship, combined with truly great people make this formula work very well, and you can see it in WillowTree’s success. Pre-2020, this culture was heavily dependent on colocation. When the Covid-19 pandemic pushed all of us to work from home, that was a big shift, and our special sauce needed to be adjusted a little.
We saw a lot of changes happening with this shift to working from home. A lot of the crosstalk that would naturally happen across project teams simply stopped because we were no longer sitting together. Folks that were hired after this point (see: the majority of the company) were coming in with no opportunity to share a physical space with coworkers. It became much easier to lose your identity as a WillowTree employee and be pulled solely into your project work. In the short term this may seem great (no pesky distractions from delivering quality products on time), but in the long term it has negative implications for things like mentorship, individual growth, and in turn, project outcomes. These implications are then compounded further when you consider the massive growth we’d taken on in that time. With that growth, it became more important to maintain our ability to raise people up internally and make them the best versions of themselves.
This is where Andrew Carter and myself come in. Some very smart people before us identified that there was a need for a more centralized role; one that wouldn’t be billed to a client. A role that would exist as a conduit for folks to more easily share information across project teams, and in turn, across the org. That role has been dubbed “Practice Advisor,” with Andrew and myself being the first two to take it on, specializing in iOS and Android, respectively.
“So what the heck does a Practice Advisor do exactly?” Great question; we’ve spent the last few weeks working on that, and to be honest, we’re still figuring it out 😬. But really, our first steps are focused primarily on setting a baseline for ourselves and for the organization. What sort of things are our engineering teams struggling with currently? What are our current stumbling blocks? Are there mistakes we see getting made repeatedly across the org, or are we learning from our own teams? Do engineers have enough time to learn, grow, and be the best engineers they can be?
By maintaining our technical expertise, we can stay close to the engineers doing the work, but serve as liaisons to impact broader org changes.
We’ve been spending a lot of time tracking and advocating for people’s involvement in some existing WillowTree shenanigans. For example, we’ve been looking at GROW, which is a space for folks to come present something they think is cool, something their team had a tough time with, or just something they’d like people to know. This has been flourishing recently with an influx of folks coming from Andrew’s and my consistent berating (er, informing). Our backlog of content now extends two months out, and attendance has never been higher, with our most recent GROW having over 40% of each practice in attendance. Besides GROW, we’ve also started holding Office Hours. These are meant to be a space for anyone to come and either ask questions about mobile or have larger group discussions around potentially controversial topics in the space (viewmodels with SwiftUI, anyone?).
Looking ahead, we’re going to be meeting more closely with various levels of the organization to further spread awareness of the role while also gathering more information about what people need or want from it. Andrew and I are still in client services; it’s just that now the client is WillowTree itself! We’ll be posting additional journal entries like this as we continue to flesh out and evolve the role, so check back to stay informed about it and the other engineering happenings here at WillowTree!
The best iOS developer in Staunton, Virginia, and the third-best banjo player. Making apps @ WillowTree since 2011, he’s now focused on building our iOS practice.
Swift Snapshot Testing is a wonderful open source library that provides support for snapshot tests in your Swift code. Its snapshot comparison options are robust, supporting UIImage, JSON, view controllers, and more. Originally developed with unit tests in mind, this tool is easily extended to support comparisons of `XCUIElement`s in UI tests as well. This allows for quick creation of automated tests that perform more robust visual validation than XCUI can provide out of the box.
Installing Swift Snapshot Testing
Installing involves two steps – importing the package and updating the dependency manager in your project to use the specified version.
Importing the Package (Xcode 13)
Go to File -> Add Packages
Click the plus and copy the Package Repository URL from the Snapshot Testing Installation Guide
Load the package
Select Dependency Rule and add to the project where the UI test targets live.
If the tests live in the same scheme as the application code, you’ll have to update the project settings as well. Even if they live in a different scheme, follow the steps below to ensure that the package is associated with the correct target.
Set the Package to the Correct Target
Go to Project Settings
Select Project and ensure that the SnapshotTesting package is displayed
Select the Application Target and ensure that the SnapshotTesting package is not displayed under Link Binary with Libraries
Example: You don’t want the package to be associated with the application target, in this case SnapshotTestingExample
Select the UI Test target and ensure the package is displayed under Link Binary with Libraries
Example: You do want the package associated with the UI test target in Project Settings, in this case SnapshotTestingExampleUITests
Using Swift Snapshot Testing
Working with snapshot testing is as simple as writing a typical test, storing an image, then asserting on that image. You’ll have to run the test twice – the first “fails” but records the reference snapshot, then the second actually validates the snapshot taken during the run matches that reference.
Writing a Snapshot Test
In the case of UI testing, without any custom extensions, you can assert on UIImages either on the screen as a whole, or on a specific element.
Snapshot the whole screen
// Using UIImage for comparing screenshots of the simulator screen view.
let app = XCUIApplication()
// Whole screen as displayed on the simulator
let snapshotScreen = app.windows.firstMatch.screenshot().image
assertSnapshot(matching: snapshotScreen, as: .image())
Tip – Set the simulator to a default time when using whole screen snapshots
A full screen snapshot includes the clock time, which can understandably cause issues. There is a simple script you can add to the pre-run scripts for tests in the project scheme that will force a simulator to a set time to work around this.
To Set a Default Time
Select the Scheme where the UI tests live
Go to Product -> Scheme -> Edit Scheme
Expand Test, and select Pre-Actions
Hit the + at the bottom to add a new script and copy the script below
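The exact script from the project isn’t reproduced here; a common approach (an assumption on my part, not necessarily the script used on that project) is a one-line simctl status bar override, which requires the target simulator to already be booted:

# Pre-action shell script: pin the booted simulator's status bar clock to a fixed time
xcrun simctl status_bar booted override --time "9:41"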
Simulators set to specific time when running tests
Snapshot a specific element
// Using UIImage for comparing screenshots of XCUI Elements
// Specific element on the simulator screen
let snapshotElement = app.staticTexts["article-blurb"].screenshot().image
assertSnapshot(matching: snapshotElement, as: .image(precision: 0.98, scale: nil))
If you instead utilize the custom extension that provides support for XCUIElements directly, as found in this pull request on the repo, the code is simplified a bit and removes the need to create a screenshot manually, as seen below.
// Using extension to support direct XCUIElement Snapshot comparison
let app = XCUIApplication()
// Whole screen as displayed on the simulator
let snapshotScreen = app.windows.firstMatch
assertSnapshot(matching: snapshotScreen, as: .image())
// Specific element on the simulator screen
let snapshotElement = app.staticTexts["article-blurb"]
assertSnapshot(matching: snapshotElement, as: .image(precision: 0.98, scale: nil))
Precision and Tolerance
One of the features I appreciate in this framework is the ability to set a precision or tolerance for the snapshot tests. As seen in some of the above examples, the precision is passed in on the `assertSnapshot()` call.
// Precision example
assertSnapshot(matching: snapshotElement, as: .image(precision: 0.98, scale: nil))
Precision is an optional value that can be set between 0-1. It defaults to 1, which requires the snapshots to be a 100% match. With the above example the two can be a 98% match and still pass.
Tip – Image size matters
While it makes sense when you think about it, it may not be readily apparent that you need to have two images of the same height and width. If they differ in overall size, the assertion will fail immediately without doing the pixel by pixel comparison.
Tip – Device & machine matters
Snapshots need to be taken on the same device, OS, scale, and gamut as the one they will be run against. Different devices and OS versions may have differences in color, and my team even saw issues where the same simulator, OS, and Xcode version produced snapshots of slightly different sizes when generated on two developer machines of the same make and model that were different years with slightly different screen sizes.
Reference Snapshots
As mentioned, reference snapshots are recorded on the first run of the test. On subsequent runs, the reference snapshot will be used to compare against new runs. If the elements change, it is easy to update them.
Snapshot Reference Storage Location
The snapshot references are stored in a hidden `__Snapshots__` folder that lives in the same folder in which the snapshot assertion was called.
For example, if my file directory looks like this:
If the functions that call `assertSnapshot` live in the `BaseTest.swift` file, the `__Snapshots__` folder will also exist under `Tests`. The snapshots themselves are then sorted into folders based on the class that called them.
Example: Finder view of the folder structure showing a full screen snapshot taken from a test found in the BaseTest class.
The snapshots will be named according to a pattern, depending on if they are full screen or specific element snapshots:
Specific Element: <functionCallingAssertion>._<snapshotted element>.png
Full Screen: <functionCallingAssertion>.<#>.png
// Take snapshot of specific element
func snapshotElement() {
    let homeScreenLoginField = app.textFields["login-field"]
    assertSnapshot(matching: homeScreenLoginField, as: .image())
}

// Take snapshot of whole screen
func snapshotScreen() {
    let screenView = app.windows.firstMatch
    assertSnapshot(matching: screenView, as: .image())
}
Given the examples above, the resulting file names would be:
Specific Element: snapshotElement._homeScreenLoginField.png
Whole Screen: snapshotScreen.1.png
Update Snapshots
There are two ways to update snapshots with this tool – with a global flag or a parameterized flag.
// Pass in update parameter
assertSnapshot(matching: someXCUIElement, as: .image, record: true)
// global
isRecording = true
assertSnapshot(matching: someXCUIElement, as: .image)
For the project where we implemented snapshot testing, we utilized the global flag to allow for snapshots to be generated in CI; otherwise, we used the parameterized variant for updating specific test/device combinations.
Tip – Tests With Multiple Assertions
Consider carefully if you are thinking about having multiple snapshot assertions in a single test. This is not something I would recommend, largely based on the difficulty of updating the reference snapshots.
One hiccup we ran into with using snapshots was attempting to update them when a test had multiple `assertSnapshot()` calls. If you use the parameterized flag, the test will always stop at the first assertion where that flag is active to make the recording. Then you have to toggle it off, run it again for the next one and so on.
This behavior is worse if you use the global flag, as once it hits the first assertion it will stop the test to take the screenshot, and never continue to the other assertions.
Snapshots on multiple devices
With this framework, you can simulate multiple device types on a single simulator by default. However if you find the need to run a full suite of tests on multiple simulators, you’ll need to extend the code to provide a way to do that – otherwise any simulator aside from the one that the reference shots were taken on will fail.
Resolving this on our team was actually pretty easy – my teammate Josh Haines simply overloaded the `assertSnapshot()` call to pass in the device and OS versions to the end of the file name so that it always checks the snapshot associated with a specific device.
/// Overload of the `assertSnapshot` function included in `SnapshotTesting` library.
/// This will append the device name and OS version to the end of the generated images.
func assertSnapshot<Value, Format>(
    matching value: @autoclosure () throws -> Value,
    as snapshotting: Snapshotting<Value, Format>,
    named name: String? = nil,
    record recording: Bool = false,
    timeout: TimeInterval = 5,
    file: StaticString = #file,
    testName: String = #function,
    line: UInt = #line
) {
    // Name will either be "{name}-{deviceName}" or "{deviceName}" if name is nil.
    let device = UIDevice.current
    let deviceName = [device.name, device.systemName, device.systemVersion].joined(separator: " ")
    let name = name
        .map { $0 + "-\(deviceName)" }
        ?? "\(deviceName)"
    SnapshotTesting.assertSnapshot(
        matching: try value(),
        as: snapshotting,
        named: name,
        record: recording,
        timeout: timeout,
        file: file,
        testName: testName,
        line: line
    )
}
The end result of this is an updated file name based on the simulator it’s taken on: <functionCalling>._<snapshotElement>-<device name>-<OS version>.png
For example, something like `testSnapshotTesting._snapshotElement-iPad-Pro-12-9-inch-5th-generation-iOS-15-5.png`
Triaging Failures
When a snapshot test fails, three snapshots are generated: reference, failure, and difference. The difference image shows the reference and failure shots layered on top of each other, highlighting the areas where pixels differ. To see these snapshots, you need to dig into the Test Report and look at the entry right before the test failure message.
Viewing Snapshot Failures
In a local run, right click on the test that has failed and select `jump to report`. In a CI run, download and open the .xcresult file in Xcode.
Expand the test to see the execution details
Expand the “Attached Failure Diff” section (found right before the failure)
Example: Where to find the snapshots to triage for failures
When looking at the `difference` image, you’ll see the overlay of reference over failure to show where discrepancies are. It can be difficult to parse, but if you look at the example below, the white parts are roughly where the pixels differ while the matches are dark.
Example: The difference image showing the diff of the simulator clock
Check out Swift Snapshot Testing’s GitHub repo, or get a tour of it from its creators, who show it off in some unit tests at their website, pointfree.co.
Stephanie is a Staff Engineer and has worked in test for over 10 years on game, web, mobile, OTT and backend projects. In her spare time she likes to travel, teach, and write fiction. Since writing this post, Stephanie has taken another opportunity outside of WillowTree but the content here is too good to not share!