
Real World Swift Performance

Lots of things can make your application slow. In this talk we’re going to explore application performance from the bottom. Looking at the real world performance impact of Swift features (Protocols, Generics, Structs, and Classes) in the context of data parsing, mapping, and persistence, we will identify the key bottlenecks as well as the performance gains that Swift gives us.


Introduction (0:00)

I’m going to talk to you about Swift performance. When building software, especially mobile software where people are distracted and busy, they want to get on with their life rather than deal with your application. It’s frustrating when apps are slow, and don’t do what users want when they want to do it. So it’s important to make your code fast.

What makes code slow? When you’re writing code and choosing your abstractions, it’s important to be cognizant of the implications of those abstractions. I’m going to start out by talking about how to understand the trade-offs between opting-in to dynamic versus static dispatch and how things will be allocated, and how to choose the best fit for you.

Allocation (1:02)

One of the biggest, and often unavoidable costs in code, is object allocation and deallocation. Swift will automatically allocate and deallocate memory for you, and there are two types of allocation.

First is stack-based allocation. When it can, Swift will allocate memory on the stack. The stack is a super simple data structure; you push onto the top of it and pop from the top of it. Because you only ever mutate the top of the stack, you can implement it by keeping a pointer to the top, and allocating and deallocating memory is just a case of adjusting that pointer.

Then we have heap allocation. This allows you to allocate memory with a far more dynamic lifetime, but requires a much more complex data structure. To allocate memory on the heap you have to look up a free block in the heap with enough size to hold your object. You find an unused block and allocate it, and when you want to deallocate memory you have to search for where to reinsert the block. That's slow, mostly because you have to lock and synchronize things for thread safety.
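To make the distinction concrete, here is a minimal sketch (the type names are mine, not from the talk): a struct's storage lives inline, while every class instance is a separate heap allocation.

```swift
// Value type: its two Ints live inline in whatever contains it
// (the stack for a local variable). No separate allocation.
struct StackPoint {
    var x: Int
    var y: Int
}

// Reference type: every instance is allocated on the heap and
// managed by reference counting.
final class HeapPoint {
    var x: Int
    var y: Int
    init(x: Int, y: Int) { self.x = x; self.y = y }
}

let a = StackPoint(x: 1, y: 2)   // no heap traffic
let b = HeapPoint(x: 1, y: 2)    // heap allocation + retain/release
```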

Reference counting (2:30)

We also have reference counting, which is slightly less expensive, but happens way more often, so it's still a pain. Reference counting is the mechanism used in Objective-C and Swift to determine when it's safe to release objects. These days, reference counting is automatic in Swift, which makes it much easier to ignore. Then you end up in Instruments wondering why your code is slow, and you see 20,000 calls to swift_retain and swift_release taking up 90% of the time.



func perform(with object: Object) {
	object.doAThing()
}

That’s because if you have this function that takes an object and does a thing with the object, the compiler will automatically insert a retain-and-release so the object doesn’t go away during the lifetime of your method.


func perform(with object: Object) {
	__swift_retain(object)
	object.doAThing()
	__swift_release(object)
}

These retains and releases are atomic operations, and are therefore slow. Or rather, we don't know how to make them much faster than they already are.

Dispatch and objects (3:28)

Then we also have dispatch. Swift has three types of dispatch. It will inline functions where possible, and then there's no additional cost to calling that function; the code is just there. Table dispatch through a V-table is essentially a lookup and a jump, and takes about one nanosecond. And then there's dynamic (message) dispatch, which takes about five nanoseconds. That isn't really a problem when you only have a few method calls, but if you're inside a nested loop or performing thousands of operations then it starts to add up.
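As a rough illustration (the class names here are hypothetical, and Objective-C message dispatch via `@objc dynamic` is omitted since it only applies on Apple platforms), the way you declare a method steers which dispatch the compiler can use:

```swift
// `final` rules out overrides, so calls can be statically
// dispatched or inlined outright.
final class Renderer {
    func fastPath() -> Int { 1 }
}

// A non-final method on a class is dispatched through the V-table,
// because a subclass might override it.
class Shape {
    func area() -> Double { 0 }
}

class Square: Shape {
    let side: Double
    init(side: Double) { self.side = side }
    override func area() -> Double { side * side }
}

let shapes: [Shape] = [Shape(), Square(side: 2)]
let total = shapes.reduce(0.0) { $0 + $1.area() }  // table dispatch per call
```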

Swift also has two main types of objects.


class Index {
	let section: Int
	let item: Int

	init(section: Int, item: Int) {
		self.section = section
		self.item = item
	}
}

let i = Index(section: 1,
				item: 1)

You have classes, and with a class everything will be allocated on the heap. As you can see here, we have a class that is an index. It has two properties, a section and an item. When we create that, the stack has a pointer to the index, and on the heap we have the section and the item.

If we make another reference to that, then we have two pointers to the same area on the heap and its shared memory.


class Index {
	let section: Int
	let item: Int

	init(section: Int, item: Int) {
		self.section = section
		self.item = item
	}
}

let i = Index(section: 1,
				item: 1)

let i2 = i

We’ll also automatically insert a retain for your second reference to the object.


class Index {
	let section: Int
	let item: Int

	init(section: Int, item: Int) {
		self.section = section
		self.item = item
	}
}

let i = Index(section: 1,
				item: 1)

__swift_retain(i)
let i2 = i

Structs (4:57)

A lot of people will say that structs are the easiest way to write fast Swift code, and that's generally true: the compiler will try to put them on the stack, and you can often get static or inlined dispatch.

Swift gives structs three words of inline storage. A word is the size of a built-in integer on your CPU, and it's the chunk the CPU works in. If your struct fits in three or fewer words, its values will generally be inline on the stack.
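You can check a type's footprint yourself with `MemoryLayout`; a quick sketch, assuming a 64-bit platform:

```swift
struct Index {
    let section: Int
    let item: Int
}

// On a 64-bit CPU a word is 8 bytes, so this struct spans two
// words and fits comfortably within a three-word inline buffer.
let wordSize = MemoryLayout<Int>.size      // 8 on 64-bit platforms
let indexSize = MemoryLayout<Index>.size   // 16: two words
```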


struct Index {
	let section: Int
	let item: Int
}

let i = Index(section: 1, item: 1)

As you can see here, when we create the struct, the index struct with the section and the item the values are directly down on the stack and no extra allocation has occurred. What happens when we assign things elsewhere?


struct Index {
	let section: Int
	let item: Int
}

let i = Index(section: 1, item: 1)
let i2 = i

If we assign i to i2, we simply copy the values that are already on the stack, and we don't share any references. This is where we get value semantics from.
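A quick sketch of that copy behavior (using var properties here so we can mutate the copy):

```swift
struct Index {
    var section: Int
    var item: Int
}

var i = Index(section: 1, item: 1)
var i2 = i        // copies both words; nothing is shared
i2.item = 5       // mutate the copy...

// ...and the original is untouched: value semantics.
// i.item == 1, i2.item == 5
```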

What happens if we have a reference type in our struct? The struct stores the pointers inline.


struct User {
	let name: String
	let id: String
}

let u = User(name: "Joe", id: "1234")

Then when we assign it elsewhere, the same pointers are shared across the two structs, and we perform two retains, one per pointer, rather than a single retain on one object.


struct User {
	let name: String
	let id: String
}

let u = User(name: "Joe",
			 id: "1234")
__swift_retain(u.name._textStorage)
__swift_retain(u.id._textStorage)
let u2 = u

This is more expensive than if you had a class, for example, with the same result.

Abstractions (6:59)

As we said before, Swift provides a lot of different abstractions that allow you to make your own decisions about how code should be run and its performance characteristics. Now we're going to look at how this applies to actual code. Here's some simple code:


struct Circle {
	let radius: Double
	let center: Point
	func draw() {}
}

var circles = (1..<100_000_000).map { _ in Circle(...) }

for circle in circles {
	circle.draw()
}

We have a circle with a radius and a center. It has three words of storage, and it will be on the stack. We create one hundred million of them and then we loop over those circles and call a single function. On my computer this takes 0.3 seconds in release mode. What happens when the requirements change?

Rather than just drawing circles, our code now needs to be able to handle multiple types of shapes. Let’s say we need to draw lines. We’re super excited about protocol-oriented programming because it allows us to have polymorphism without inheritance and it allows us to think about the types.


protocol Drawable {
	func draw()
}

struct Circle: Drawable {
	let radius: Double
	let center: Point
	func draw() {}
}

let drawables: [Drawable] = (1..<100_000_000).map { _ in Circle(...) }

for drawable in drawables {
	drawable.draw()
}

What we'd like to do is extract that out into a protocol, and the simple change of referencing the array by the protocol now makes the code take 4.0 seconds to run. That's a 1300% slowdown. Why?

This is because code that could previously be statically dispatched and executed without any heap allocations no longer can be. It comes down to how protocols are implemented.
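The slowdown is easy to reproduce with a rough timing sketch. This uses a smaller element count and an Int-returning draw() so the loop has observable work; exact numbers vary by machine and optimization level.

```swift
import Foundation

protocol Drawable {
    func draw() -> Int
}

struct Circle: Drawable {
    let radius: Double
    func draw() -> Int { 1 }
}

// Tiny stopwatch helper around Foundation's Date.
func time(_ body: () -> Void) -> TimeInterval {
    let start = Date()
    body()
    return Date().timeIntervalSince(start)
}

let circles = (0..<1_000_000).map { _ in Circle(radius: 1) }
let drawables: [Drawable] = circles   // each element is now boxed

var sink = 0
let concrete = time { for c in circles   { sink += c.draw() } }
let boxed    = time { for d in drawables { sink += d.draw() } }
// In release builds, `boxed` is typically several times `concrete`.
```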

For example, here you can see that all we know about is our circle. What the Swift compiler will do is either go through the V-table or inline the draw function directly inside the for-loop.


struct Circle {
	let radius: Double
	let center: Point
	func draw() {}
}

var circles = (1..<100_000_000).map { _ in Circle(...) }

for circle in circles {
	circle.draw()
}

When we reference it by the protocol it doesn’t know whether the object is a struct or a class. It could be anything that conforms to that protocol.


protocol Drawable {
	func draw()
}

struct Circle: Drawable {
	let radius: Double
	let center: Point

	func draw() {}
}

var drawables: [Drawable] = (1..<100_000_000).map { _ in return Circle(...) }

for drawable in drawables {
	drawable.draw()
}

How do we go about dispatching the draw function? The answer lies in the protocol witness table. It's an object with a known layout that is generated for every protocol conformance in your application, and this table essentially acts as an alias for the underlying implementation.


protocol Drawable {
	func draw()
}

struct Circle: Drawable {
	let radius: Double
	let center: Point

	func draw() {}
}

var drawables: [Drawable] = (1..<100_000_000).map { _ in
	return Circle(...)
}

for drawable in drawables {
	drawable.draw()
}

In this code here, how do we actually get to the protocol witness table? The answer is the existential container, which has three words of inline storage (the value buffer) for structs that fit, plus a reference to the protocol witness table.


struct Circle: Drawable {
	let radius: Double
	let center: Point

	func draw() {}
}

Here our circle fits inside the three-word buffer and won't require a separate allocation.


struct Line: Drawable {
	let origin: Point
	let end: Point

	func draw() {}
}

Our line, for example, has four words of storage because it has two points (at this point my slide basically got SourceKitService Terminated). This struct needs more than the three words the value buffer provides. How do we handle that, and how does it affect the performance of this code? Well, this happens:


protocol Drawable {
	func draw()
}

struct Line: Drawable {
	let origin: Point
	let end: Point
	func draw() {}
}

let drawables: [Drawable] = (1..<100_000_000).map { _ in Line(...) }

for drawable in drawables {
	drawable.draw()
}

It takes 45 seconds to execute. Why does this take so much longer and why is that happening?

A portion of that time is spent allocating the structs: now that they don't fit within the three-word buffer, their storage is allocated on the heap. But it's also partly down to how protocols work. Because the existential container only has three words of inline storage for structs, or a reference to heap storage, we also need something called the value witness table. This is what Swift uses to handle arbitrary values.

A value witness table is created for each type, and it generalizes the allocation, copying, destruction, and deallocation of a value or class.


func draw(drawable: Drawable) {
	drawable.draw()
}

let value: Drawable = Line(...)
draw(drawable: value)

// Generates
func draw(value: ECTDrawable) {
	var drawable: ECTDrawable = ECTDrawable()
	let vwt = value.vwt
	let pwt = value.pwt
	drawable.vwt = value.vwt
	drawable.pwt = value.pwt
	vwt.allocateBuffAndCopyValue(&drawable, value)
	pwt.draw(vwt.projectBuffer(&drawable))
}

What we have here is an example of some code and then what will be generated from it. If we just had a draw function, that took a value and then we create the line, and then we pass it to the drawable function.

What actually happens is it passes the drawable existential container and then that’ll be created again, inside the function. It’ll copy the value and protocol witness table, and then allocate a new buffer and copy the value of the other struct, or class, or whatever that object is. Then it’ll use the draw function on the protocol witness table and pass the actual drawable.

You can see that the references to the value witness table and the protocol witness table live on the stack, while the line's storage, and the copy made for the drawable parameter, end up on the heap.

Modelling your data / Conclusion (12:15)

Simple changes in the way we model data can have a huge impact on performance. Let’s look at some ways to avoid these costs.

Let’s talk about generics. You’re going to say, but you just showed us that protocols could be really slow, why would we want to use generics? The answer comes from what generics allow us to do.


struct Stack<T: Drawable> {
...
}

Say we have this stack struct that is generic over a T, constrained by some protocol. What the compiler will do is replace that T with the concrete type you pass in, all the way down the call chain, and create specialized versions of the code that operate directly on that type.

You no longer need to go through the value witness table, or the protocol witness table, and you eradicate the existential container, which could be a really nice way to still write really fast generic code and have the really nice polymorphism that Swift gives us. That’s called static polymorphism.
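A sketch of that static polymorphism, reusing the Drawable example (the Int return is mine, so the loop has observable work):

```swift
protocol Drawable {
    func draw() -> Int
}

struct Circle: Drawable {
    func draw() -> Int { 1 }
}

// Generic over T: the compiler can emit a specialized drawAll<Circle>
// that calls Circle.draw() directly, with no existential container
// and no witness-table hop.
func drawAll<T: Drawable>(_ items: [T]) -> Int {
    var total = 0
    for item in items {
        total += item.draw()
    }
    return total
}

let total = drawAll(Array(repeating: Circle(), count: 10))  // 10
```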

You can also improve a lot of your data model by using enumerations rather than having lots of strings from the server. For example, if you were building a social network and had a bunch of accounts that needed a status, previously you might have had that as a raw string on the type.


enum AccountStatus: String {
	case banned, verified, incomplete
}

If you use an enum for that, you don't need to allocate anything, and when you pass it around you're just passing the enum's value. That can be a really nice way to speed up your code, as well as giving you safer, more readable code throughout your application.
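For example, you can map a server string onto the enum with the raw-value initializer (a sketch; the case names follow the slide):

```swift
enum AccountStatus: String {
    case banned, verified, incomplete
}

// Valid strings map to a case; anything else yields nil instead of
// silently carrying a stray string through the app.
let status = AccountStatus(rawValue: "verified")  // .verified
let bogus  = AccountStatus(rawValue: "pending")   // nil
```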

Also, it can be super useful to have actual domain-specific models in the form of view models or presenters, or various other abstractions that let you cut out a lot of the cruft you don't need in your app.

I think I’ve run slightly short, but thank you very much.

About the content

This talk was delivered live in September 2016 at try! Swift NYC. The video was recorded, produced, and transcribed by Realm, and is published here with the permission of the conference organizers.

Danielle Tomlinson

Danielle hails from England, but is currently embracing jet lag as a way of life. They co-organize NSLondon and ran Fruitconf. They have been building things for Apple platforms for 8 years, but now work at CircleCI and on open source libraries and tools such as CocoaPods.
