Everything You Ever Wanted to Know on Sequence & Collection

by Soroush Khanlou

Mar 20 2017

Introduction

My name is Soroush. I am here to tell you everything you ever wanted to know about Sequence and Collection.

When we’re working in Swift and need an ordered list of elements, 99 times out of 100 we reach for an array. But Array and all the other Collection types in Swift are built on a well-thought-out hierarchy of protocols, associated types and other components that add the functionality we use and take for granted day to day. Here I discuss those protocols and how we can hook into them to build the features that we want at the level that we want.

Almost Everything You Wanted To Know About Sequence and Collection

Everything is built on top of a protocol called Sequence. Sequence provides the backbone of many of the things that you’re used to working with when you work with arrays. For example, map, filter, and finding the first element that passes some test are all defined on Sequence. It’s the simplest protocol, and the rest of the protocols layer on top of it like a ladder. We’re going to work our way through part of that ladder: after Sequence, we will discuss Collection and BidirectionalCollection (but we will not go into RandomAccessCollection, RangeReplaceableCollection and MutableCollection in any depth).

Sequence

We’re going to start from the bottom level: Sequence, which is very straightforward. A Sequence is a list of elements. It has two important caveats: first, it can be either finite or infinite, and second, you can only count on iterating through it one time. Sometimes you will be able to iterate it more than once, but you’re not guaranteed to be able to.

protocol Sequence {
    associatedtype Iterator: IteratorProtocol
    func makeIterator() -> Iterator
}

The Sequence protocol has two components. One is an associated type, Iterator, which has to conform to IteratorProtocol. The other is a function, makeIterator, which constructs an Iterator for us; its return type has to be the same Iterator type that we declared.
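The two caveats above (possibly infinite, possibly single-pass) can be seen with pieces from the standard library. This is a minimal sketch rather than anything from the talk: sequence(first:next:) and AnyIterator are standard-library APIs, while makeOneShot and the variable names are hypothetical, chosen only for illustration.

```swift
// An infinite Sequence: powers of two, produced lazily on demand.
// Only the five elements we ask for are ever computed.
let powersOfTwo = sequence(first: 1) { $0 * 2 }
let firstFive = Array(powersOfTwo.prefix(5))

// makeOneShot is a hypothetical helper. The AnyIterator it returns is also
// a Sequence, but it shares its state, so it can only be walked once.
func makeOneShot() -> AnyIterator<Int> {
    var remaining = 3
    return AnyIterator {
        guard remaining > 0 else { return nil }
        defer { remaining -= 1 }
        return remaining
    }
}

let oneShot = makeOneShot()
let firstPass = Array(oneShot)   // consumes the elements
let secondPass = Array(oneShot)  // nothing left to iterate

print(firstFive, firstPass, secondPass)
```

The second pass comes back empty, which is exactly the kind of behavior Sequence allows and Collection rules out.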

IteratorProtocol

To understand the Iterator, we need to dive one level deeper. IteratorProtocol looks very similar to Sequence. It has an associated type, Element, which is the type of the thing that you’re going to be iterating over, and it has one function called next, which returns the next element and mutates the Iterator.

protocol IteratorProtocol {
    associatedtype Element
    mutating func next() -> Element?
}
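As a small illustration of the protocol above, here is a sketch of a countdown that acts as its own iterator. Countdown and its remaining property are hypothetical names, not from the talk. When a type conforms to both Sequence and IteratorProtocol, the standard library supplies makeIterator for it.

```swift
// Countdown is a hypothetical example type: it is both a Sequence and
// its own iterator, so only next() has to be written.
struct Countdown: Sequence, IteratorProtocol {
    var remaining: Int

    mutating func next() -> Int? {
        guard remaining > 0 else { return nil }  // nil marks the end
        defer { remaining -= 1 }
        return remaining
    }
}

// Driving the iterator by hand, which is roughly what a for-in loop does.
var iterator = Countdown(remaining: 3)
var seen: [Int] = []
while let value = iterator.next() {
    seen.append(value)
}
let afterTheEnd = iterator.next()  // keeps returning nil once exhausted

print(seen, afterTheEnd as Any)
```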

IteratorProtocol is what Sequence is built on, and Sequence provides the backbone for all the stuff that we work with. To examine Sequence, we’re going to build a LinkedList. A LinkedList is nice because it naturally fits the structure of Sequence: looking at an item, then looking at the next one, and the next one after that.

Example: LinkedList

Here’s an example of a LinkedList (if you haven’t been studying up for your interviews). You have your first element, which points to the second element, which points to the third element, which points to the end. There are several ways to define a LinkedList in Swift, but the way that I like to do it is with an enumeration. The enumeration is generic over a type T, and that’s the type that our LinkedList is going to hold. There’s one extra thing: this indirect keyword. Indirect means I’m going to use the LinkedListNode type inside itself (please don’t freak out if I do that. Please let me do that!).

There are two cases when we’re working with this LinkedList. One case is where you have a value, and that value is going to be of type T. In that case you’re also guaranteed to have a next, and next is going to be the exact same type as the whole LinkedList. It points to another node, and when you eventually go through all the elements you hit end, which signifies that the LinkedList is over.

But we can’t do anything with it yet: we can’t enumerate over it, we can’t use for-in (the for loop), and we can’t use map or filter. We want to conform our LinkedList to Sequence so that we can begin to use those tools.

To do that we have to conform to Sequence, and we have to write this function called makeIterator, but we don’t know what our Iterator type is going to be yet. You’ll notice we didn’t add the associated type, the Iterator type. Swift lets us leave it out: once we put a return type on makeIterator, it will figure out which type it should use. We know we’re going to need some type of object to do the iteration.

We’re going to build our LinkedListIterator. That LinkedListIterator is generic over T, which means that every time you call next it’s going to return a value of type T, so you can put anything that you want in your LinkedList. Our Iterator also conforms to IteratorProtocol so that it satisfies the constraints of the Sequence type. Because an Iterator steps through the Sequence, it has to act as a cursor: it needs some state so that it knows where in the Sequence it is. That state will be represented by current, which is the node of our LinkedList that we’re currently pointing at. That node is also going to be generic over T, like our Iterator.

Then, we can start to implement our function next, which is going to return an optional T. You return your value until you return nil. Once you return nil, that signifies the end of the Sequence and no more values should be returned. It should only return nil after that. Because we have an enum, there’s not many things we can do. We need to break it open and see what’s inside.

To do that we use a switch statement. We switch on our current LinkedList node, and there are two possible cases. One case is very easy: if it’s the end of the LinkedList, we return nil. Otherwise we have a value, which means we have an element and a next. We know that we want to return that element (that’s easy), but what do we do with next? It represents the node that our current node points to, so we update the variable current and set it equal to next. That way, the next time next is called and switches on current, current will already point at the following node, and it will either return the new value or end the Sequence by returning nil. That’s everything that we need for the Iterator. We now know what the type of our Iterator is, a LinkedListIterator, so we can use it as the return type of makeIterator and start it at the head of the LinkedList. self is the head of the LinkedList, so iteration starts at the beginning. This is everything you need to do to conform this type to Sequence.

indirect enum LinkedListNode<T> {
    case value(element: T, next: LinkedListNode<T>)
    case end
}
  
extension LinkedListNode: Sequence {
    func makeIterator() -> LinkedListIterator<T> {
        return LinkedListIterator(current: self)
    }
}

struct LinkedListIterator<T>: IteratorProtocol {

    var current: LinkedListNode<T>

    mutating func next() -> T? {
        switch current {
        case let .value(element, next):
            current = next
            return element
        case .end:
            return nil
        }
    }
}

With the Sequence type, you now are able to iterate over it with a for loop, use map, filter (and all of the extra stuff that comes with Sequences… which is a lot of stuff!).
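To see what the conformance buys us, here is a self-contained sketch that repeats the definitions above, builds a three-element list, and runs for-in, map and filter over it. The values "A", "B" and "C" and the variable names are assumptions made for illustration.

```swift
indirect enum LinkedListNode<T> {
    case value(element: T, next: LinkedListNode<T>)
    case end
}

struct LinkedListIterator<T>: IteratorProtocol {
    var current: LinkedListNode<T>

    mutating func next() -> T? {
        switch current {
        case let .value(element, next):
            current = next      // advance the cursor
            return element
        case .end:
            return nil          // end of the list
        }
    }
}

extension LinkedListNode: Sequence {
    func makeIterator() -> LinkedListIterator<T> {
        return LinkedListIterator(current: self)
    }
}

// A -> B -> C -> end
let linkedList: LinkedListNode<String> =
    .value(element: "A", next: .value(element: "B", next: .value(element: "C", next: .end)))

// for-in comes from the Sequence conformance.
var visited: [String] = []
for letter in linkedList {
    visited.append(letter)
}

// So do map and filter; both return plain arrays.
let lowercased = linkedList.map { $0.lowercased() }
let withoutB = linkedList.filter { $0 != "B" }

print(visited, lowercased, withoutB)
```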

Let’s take a graphical look at our Iterator. We have our LinkedList from before. It has three elements.

// The iterator must be a var because next() is mutating.
var iterator = LinkedListIterator<String>(current: linkedList)
print(iterator.next()) // => Optional("A")

When we create our Iterator, it has a pointer to the first value, which is that current. We set up current to point at the head of our LinkedList. When we call iterator.next it will return A. Then it will update that reference, current, to point to the second element in the LinkedList.

var iterator = LinkedListIterator<String>(current: linkedList)
print(iterator.next()) // => Optional("A")
print(iterator.next()) // => Optional("B")

If we call next again, it will return B and move the cursor and point to the next element in the LinkedList. It’ll happen again until it hits the end, where it returns nil, and every time you call next after that it will return nil. But what if we want to add something to all Sequences?

Sometimes you have a bunch of objects and you want to know how many of those objects pass some test. Counting the admins among your users is a great example. We use a filter to get an array of all the elements that passed the test, and then we call count on it. It’s not bad, but we are creating an extra array, getting the count from it and then immediately throwing it away. That’s not ideal. I would like to write users.count and then pass it a test. This is more expressive, and if you’ve written any Ruby, this is very familiar from the Enumerable module. This is much better and I would like to be able to add this function to all of my Sequences.

First, we need to open up the Sequence type with an extension. We know we’re going to be adding a function to it, and that function will be called count. It is going to return an integer, and it takes one parameter: the test. I call this parameter shouldCount because it makes the code easier to read. The one important thing to note is that the type of the element inside the Sequence can be referred to with Iterator.Element. That’s how you know whether you’re holding strings or user objects or whatever objects you have. We know we’re going to need to iterate over something, and this is the thing that you get by conforming to Sequence: access to the for-in loop. We iterate over the elements, getting each element and holding on to it.

We also know that we’re going to need a count, so we hold on to a count variable. It starts at zero and it gets returned at the end. What happens in the middle of the for loop is the magic. If we shouldCount the element, that is, if the test passes (in the case above, if the user is an admin), then we add one to the count. Pretty straightforward.

let numberOfAdmins = users.filter({ $0.isAdmin }).count // => fine

let numberOfAdmins = users.count({ $0.isAdmin }) // => great

extension Sequence {
    func count(_ shouldCount: (Iterator.Element) -> Bool) -> Int {
        var count = 0
        for element in self {
            if shouldCount(element) {
                count += 1
            }
        }
        return count
    }
}

This is how you add an extension to all Sequences. You can open it up like any other type, add an extension to it and refer to the type inside the Sequence with Iterator.Element. You can do anything you want in there. This is a very useful extension; I add it to almost all my projects, and maybe one day it will be in the Swift standard library.
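Because the extension lives on Sequence, it is available on every conforming type, not only on arrays. The sketch below shows that on a range, a string and a dictionary. It uses the hypothetical name countMatching instead of count, purely to stay clear of anything named count in the standard library of whichever Swift version you compile with; the sample data is also made up.

```swift
extension Sequence {
    // countMatching is a hypothetical name for the same idea as the
    // count(_:) extension above; no intermediate array is created.
    func countMatching(_ shouldCount: (Element) -> Bool) -> Int {
        var total = 0
        for element in self where shouldCount(element) {
            total += 1
        }
        return total
    }
}

// A range of integers.
let evens = (1...10).countMatching { $0 % 2 == 0 }

// The characters of a string.
let vowels = "sequence".countMatching { "aeiou".contains($0) }

// The key-value pairs of a dictionary.
let stock = ["apples": 0, "pears": 3, "plums": 7]
let inStock = stock.countMatching { $0.value > 0 }

print(evens, vowels, inStock)
```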

Another useful addition to Sequence is something that I call “Each Pair”. This takes every pair of consecutive elements and groups them together in a tuple. This is useful if you have a Sequence of numbers and you want to know the differences between neighboring numbers: you can group them into pairs and then subtract the two elements of each pair from each other. This is also useful if you have multiple views and you want to add an Auto Layout constraint in between the views. You can have a view, then another view, and then another view; you work with the views in pairs, add a constraint between them, and then move on to the next pair.

zip(sequence, sequence.dropFirst()) // Sequence<(T, T)>

Let’s take a look at how that would be implemented with the standard library. We start with a Sequence. To get the effect that we want for each pair, we take a second reference to that Sequence and drop the first element from that copy. Next, we zip the two together, which combines the elements that sit at the same position, and the last element of the original falls off because it has no partner. Once we have these two columns lined up, zip pairs the elements together, and this is where the magic happens. Boom! Now we have our pairs grouped together using this zip expression with the two Sequences. But I would like to have it as a method on Sequence so that I can chain it: I can say eachPair and then do map, do filter, and all the other stuff I want to do.
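Here is the zip expression applied to the “differences between numbers” use case mentioned earlier. It is a minimal sketch on an array, and the sample readings are made up for illustration.

```swift
let readings = [3, 7, 12, 20]

// readings:              3   7  12  20
// readings.dropFirst():  7  12  20
// zip stops at the shorter input, so the trailing 20 has no partner.
let pairs = Array(zip(readings, readings.dropFirst()))

// Subtract the first element of each pair from the second.
let differences = pairs.map { pair in pair.1 - pair.0 }

print(pairs, differences)
```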

extension Sequence
    where Self.SubSequence: Sequence,
          Self.SubSequence.Iterator.Element == Self.Iterator.Element {

    func eachPair() -> AnySequence<(Iterator.Element, Iterator.Element)> {
        return AnySequence(zip(self, self.dropFirst()))
    }
}

We can start with the approach we had last time. We have a function, called eachPair, and it returns a Sequence of tuples, of two elements this time. We have our zip with all the extra stuff inside. There is an extra component called AnySequence, which is a type eraser (I recommend watching Gwen Weston’s talk about type erasers). The problem is that when we try to compile this without any constraints, it doesn’t work. The error the Swift compiler gives us is “argument Self.SubSequence does not conform to expected type Sequence”. Sequence has another associated type that you normally don’t have to worry about. This one’s called SubSequence. Because of a limitation of the Swift compiler, you can’t express that every SubSequence must also be a Sequence. It’s up to us to add that constraint ourselves.

We add a constraint that says, “I know I’m going to have a SubSequence and I want to guarantee that that SubSequence will also be a Sequence”. We try to compile this and we run into another error: cannot convert return expression. It’s boring, but the important part is the second half of the message: Self.SubSequence.Iterator.Element doesn’t match up with Self.Iterator.Element. That is, we have a SubSequence and we know that SubSequence is also a Sequence, but we don’t know that the type inside the SubSequence is the same as the type inside ourself. We have to add one more constraint for that.

We add the constraint that Self.SubSequence.Iterator.Element equals Self.Iterator.Element. This makes it compile. Now we can use eachPair as we want, but dealing with these associated types is part of the pain of working with Swift’s protocol and Collection system. The good news is that you can get the result that you want. That is Sequence in a nutshell.
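For comparison, here is a sketch of how the same method might look on a newer toolchain. It assumes Swift 5 or later, where SubSequence should no longer be an associated type of Sequence and dropFirst() on a plain Sequence returns a concrete wrapper type, so the two where-clause constraints should not be needed. Treat that as an assumption to check against your compiler version; the sample numbers are made up.

```swift
extension Sequence {
    // Assumes Swift 5+: no SubSequence constraints are required here.
    func eachPair() -> AnySequence<(Element, Element)> {
        return AnySequence(zip(self, self.dropFirst()))
    }
}

let squares = [1, 4, 9, 16]

// eachPair can now be chained with map, filter and friends.
let gaps = squares.eachPair().map { pair in pair.1 - pair.0 }
let bigGaps = squares.eachPair().filter { pair in pair.1 - pair.0 > 4 }

print(gaps, bigGaps)
```

Note that this still iterates the underlying Sequence twice (once directly, once through dropFirst), so it is only safe on Sequences that support more than one pass.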

Collection

After Sequence, we go up one level to Collection. Every Collection inherits from Sequence and Collection fixes those two problems that we have with Sequence. Every Collection will always be finite. That means that you will always know how many elements are in there. It can never be infinite, and you can iterate that Collection as many times as you want. With Sequence you can only iterate once, but with Collection you can iterate it over and over and over again which is great.

Let’s take a look at the Collection protocol.

// Simplified; the real protocol has more requirements.
protocol Collection: Sequence {
  associatedtype Index: Comparable
  var startIndex: Index { get }
  var endIndex: Index { get }
  subscript(position: Index) -> Iterator.Element { get }
  func index(after index: Index) -> Index
}

The main thing that’s added is a new associated type called Index. If you’re working with an array, you can think of this Index as an Int. Every Collection has one. A dictionary has its own index type, and usually you don’t have to work with it or worry about it, but it is there under the hood. Once we have an index, we need a startIndex and an endIndex to tell the system where to begin and where to end. With an array, the startIndex will be zero, but with an array slice, it may be somewhere else. Next, you want to be able to get the element at an index, and for that we use the subscript. Finally, you need to be able to get the index after the one that you’re currently looking at. You don’t have to deal with Iterators; Swift provides a default one for you.
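A short sketch of the two points above: a slice keeps the indices of the array it came from, and not every Collection is indexed by Int. The sample values are made up for illustration.

```swift
let numbers = [10, 20, 30, 40, 50]

// An ArraySlice shares indices with its base array,
// so its startIndex is not zero here.
let slice = numbers[2...]
let sliceStart = slice.startIndex
let sliceEnd = slice.endIndex
let firstInSlice = slice[slice.startIndex]

// String uses its own opaque String.Index type, so we step with
// index(after:) instead of adding 1 to an Int.
let word = "héllo"
let secondIndex = word.index(after: word.startIndex)
let secondCharacter = word[secondIndex]

print(sliceStart, sliceEnd, firstInSlice, secondCharacter)
```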

Here’s an example of how you might implement forEach on a Collection.

func forEach(_ block: (Element) -> Void) {
    var currentIndex = self.startIndex
    while currentIndex < self.endIndex {
        block(self[currentIndex])
        currentIndex = self.index(after: currentIndex)
    }
}

You don’t have to implement this; it’s built in for you, but here’s how you could do it. You start with the current index and you check if you’re currently less than your endIndex. This is why the index associated type has to be comparable. You check if it’s less than endIndex. You call the block with the current value that you’re looking at and then you update that current value to look at the next index. You keep going down until you hit the endIndex and then the iteration will terminate. That’s how you get all the stuff from Sequence for free from implementing these four functions.


extension APIErrorCollection: Collection {

    var startIndex: Int {
        return _errors.startIndex
    }

    var endIndex: Int {
        return _errors.endIndex
    }

    func index(after index: Int) -> Int {
        return _errors.index(after: index)
    }

    subscript(position: Int) -> APIError {
        return _errors[position]
    }
}

Usually I don’t build Collections from scratch; instead, I have some type that I want to act as a Collection. In this case I have APIErrorCollection. It has an internal array property of APIErrors, and in the code that I want to write, I want to treat the error collection itself as a Collection. I can’t just reach for the array, because the _errors property is private (I can’t look into it). So I want to make my APIErrorCollection also be a Collection, and then I get all this stuff for free.


struct APIErrorCollection {
    fileprivate let _errors: [APIError]
}

extension APIErrorCollection: Collection {
    // ...
}

errorCollection.contains(where: { $0.id == "error.item_already_added" })
// compiles!

To do that we need an extension that conforms our APIErrorCollection to Collection. Because we already have an internal property that satisfies all of the Collection requirements, we can forward everything along. startIndex becomes _errors.startIndex, endIndex becomes _errors.endIndex, and so on. index(after:) gets forwarded, and subscript gets forwarded. It’s straightforward. I do this to make things that aren’t arrays act like arrays. Now the code that checks whether some error is contained within that Collection works. We get Sequence, and we get mapping, filtering and all the rest for free, just from forwarding these four members.
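Put together, the forwarding approach might look like the following self-contained sketch. The talk never shows APIError, so its id and message fields, the initializer and the sample errors are assumptions made so that the example runs.

```swift
// Hypothetical error type; the talk does not show its definition.
struct APIError {
    let id: String
    let message: String
}

struct APIErrorCollection {
    fileprivate let _errors: [APIError]

    // Explicit initializer (an assumption), since the stored property is fileprivate.
    init(_ errors: [APIError]) {
        _errors = errors
    }
}

// Forward the four Collection requirements to the wrapped array.
extension APIErrorCollection: Collection {
    var startIndex: Int { return _errors.startIndex }
    var endIndex: Int { return _errors.endIndex }
    func index(after index: Int) -> Int { return _errors.index(after: index) }
    subscript(position: Int) -> APIError { return _errors[position] }
}

let errorCollection = APIErrorCollection([
    APIError(id: "error.item_already_added", message: "Item already added"),
    APIError(id: "error.out_of_stock", message: "Out of stock"),
])

// Everything below comes for free from Sequence and Collection.
let alreadyAdded = errorCollection.contains(where: { $0.id == "error.item_already_added" })
let messages = errorCollection.map { $0.message }
let errorCount = errorCollection.count
let firstID = errorCollection.first?.id

print(alreadyAdded, messages, errorCount, firstID ?? "none")
```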

Bidirectional Collection

Bidirectional Collection is very similar to a Collection, except it adds one more thing. It inherits from Collection in the same way that Collection inherits from Sequence, but Bidirectional Collection can go backward as well as forwards. To add the ability to go backwards, you need one more function, and in Collection we have this function index after. In Bidirectional Collection we’re going to add a new function called index before, and that’s going to let us step through the Collection in the reverse order.


protocol Collection {
  //...
  func index(after index: Index) -> Index
}

protocol BidirectionalCollection: Collection {
  func index(before index: Index) -> Index
}

One example of something that you get for free with Bidirectional Collection is this property last. This will give you the last element in a given Bidirectional Collection, or it’ll give you nil if the Collection is empty. We couldn’t implement it efficiently on Collection because we would have to walk all the way through to the end and then return that last element. It would take too long. Instead, we want to go from the end back one step and return that value. First we check if the Collection is empty. If it is empty we can simply return nil and we know we’re done. Then we grab our endIndex, and grab the index before it. That’s the index of the last element. Finally, we return the value at that index.


var last: Iterator.Element? {
    guard !self.isEmpty else { return nil }
    let indexOfLastItem = self.index(before: self.endIndex)
    return self[indexOfLastItem]
}

Bidirectional Collection lets you do reversing easily because you can step through backwards. It adds functionality.
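The same backwards stepping can be written generically. The sketch below walks any BidirectionalCollection from the end with index(before:); trailingElements is a hypothetical helper name, and the inputs are made up for illustration.

```swift
// trailingElements is a hypothetical helper: it returns up to `limit`
// elements, starting from the end and stepping backwards.
func trailingElements<C: BidirectionalCollection>(of collection: C, limit: Int) -> [C.Element] {
    var result: [C.Element] = []
    var index = collection.endIndex
    while index != collection.startIndex && result.count < limit {
        index = collection.index(before: index)  // step back one position
        result.append(collection[index])
    }
    return result
}

let lastTwoCharacters = trailingElements(of: "swift", limit: 2)
let fromShortArray = trailingElements(of: [1], limit: 2)

// reversed() is built on the same capability.
let backwards = Array([1, 2, 3].reversed())

print(lastTwoCharacters, fromShortArray, backwards)
```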

Other protocols

There are three other protocols that you can look at:

RandomAccessCollection gives you faster access to the values. Instead of having to go through one by one, you can jump straight to the element that you want
RangeReplaceableCollection allows you to replace a chunk in the middle of a Collection with something else
MutableCollection allows you to set as well as get those values
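Array conforms to all three of these protocols, so it makes a convenient place to see each one in action. This is only a minimal sketch with made-up values.

```swift
var letters = ["a", "b", "c", "d", "e"]

// MutableCollection: the subscript has a setter as well as a getter.
letters[0] = "A"

// RangeReplaceableCollection: swap a chunk in the middle for something else,
// even when the replacement has a different length.
letters.replaceSubrange(1...3, with: ["x", "y"])

// RandomAccessCollection: jump straight to an offset
// instead of stepping through one index at a time.
let jumpIndex = letters.index(letters.startIndex, offsetBy: 3)
let jumped = letters[jumpIndex]

print(letters, jumped)
```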

If you take a look at the Swift documentation or Swiftdoc.org, you can check these protocols, see what new functions you need to implement to get that behavior and, hopefully, you can understand those protocols with some of the stuff that we’ve learned here.

Q & A

Q: If you have this Iterator type in Swift and it adds expressiveness, where does some of that expressiveness get added?

Soroush: Most of the time, the Iterator type is used internally when you’re building something, or the Swift standard library will use it internally. You don’t need to worry about it, but sometimes you can use it to step through something manually. Let’s say you’re swiping through a couple of different pages. You could hold on to an Iterator instead of holding on to a Collection and an index. Every time you swipe, you call next and that shows you the next element; the next swipe calls next again, and so on. In that way you can use the Iterator type to increase expressiveness.
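The paging idea from the answer could be sketched like this. Pager, swipe and the page titles are hypothetical names invented for the example, not something shown in the talk.

```swift
// Pager is a hypothetical type: it keeps an iterator as its only state,
// instead of a collection plus a current index.
struct Pager<Pages: Sequence> {
    private var iterator: Pages.Iterator

    init(_ pages: Pages) {
        iterator = pages.makeIterator()
    }

    // Each swipe advances the iterator; nil means there are no more pages.
    mutating func swipe() -> Pages.Element? {
        return iterator.next()
    }
}

var pager = Pager(["Welcome", "Features", "Sign up"])
let firstPage = pager.swipe()
let secondPage = pager.swipe()
let thirdPage = pager.swipe()
let pastTheEnd = pager.swipe()

print(firstPage ?? "-", secondPage ?? "-", thirdPage ?? "-", pastTheEnd ?? "-")
```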

About the content

This talk was delivered live in March 2017 at try! Swift Tokyo. The video was recorded, produced, and transcribed by Realm, and is published here with the permission of the conference organizers.

Soroush Khanlou

Soroush Khanlou is a New York-based iOS consultant. He’s written apps for the New Yorker, David Chang’s Ando, Rap Genius, and non-profits like Urban Archive. He blogs about programming at khanlou.com, mostly making fun of view controllers. In his free time, he runs, bakes bread and pastries, and collects suitcases.
