SourceKit and You

by JP Simard

Jul 7 2016

SourceKit is more than just crashes & HUDs! It’s a powerful tool that can empower you to be a more productive Swift programmer. In this talk at App Builders 2016, JP Simard looks at a few examples of how you can leverage SourceKit to accomplish powerful tasks simply.

Introduction (00:00)

Raise your hand if you are an Android developer (I have chased them away). This talk is about you, and how you can do things that will empower your development in ways that you could not before, using this tool called SourceKit.

First off, my name’s JP. I am a Canadian transplant in San Francisco. I work on the Objective-C and Swift parts of Realm (a mobile database). We have built Realm from the ground up (the data engine is built completely from scratch). We launched it a week after Swift was announced (WWDC 2014). We were about to launch it the week before, but decided to wait a week to see what Apple had. We had no expectation that they would announce this new programming language: we scrambled, and made sure that we supported the language. That brought us a set of challenges: we were building a product on this, so we needed API docs, clean code, and a way to enforce code style and consistency across the product. So we dug around in the Xcode 6 internals to see whether there was anything we could hack at.

Swift was a more complex language than Objective-C. In some ways, Swift is much simpler than Objective-C (it has a certain set of rules, and those rules are modular and flexible). But there are configurations that were not possible in Objective-C; you have constructs and keywords that do not exist in any other language. It is also an extremely flexible language, because it composes multiple simple concepts (e.g. you could have types nested within other types, with functions inside, with a local type that is declared in your function, with a closure…). You have capabilities that are not possible in Objective-C, which made Objective-C a simpler language to work with. If you needed to build a custom tool for refactoring or being productive with Objective-C, you could probably get by with a few regular expressions, because you had one level of nesting: class and property declarations, and methods. You cannot have classes in classes (or anything as complex as what you can have with Swift).

Ultimately, it is hard to do something right with a language that evolves as constantly as Swift did (and still does… we heard from Daniel about a number of things that are going to change in preparation for Swift 3, and it is not showing any signs of slowing down). If you were building a custom tool using regular expressions, trying to manipulate your Swift source to do anything that you need with a large project, you would have a hard time keeping up with all the changes.

Then we found SourceKit: the tool that we decided to build all of our additional tooling on.

The SourceKit You Know (04:28)

SourceKit is more than what you know (aka the thing that crashes and displays a HUD in Xcode whenever you are trying to get your job done): it is a much more powerful tool, and it is hackable. There are many things you could build on top of it that will make your life as a developer more productive.

You have interacted with SourceKit many times if you have written Swift or used Xcode. The functionality within Xcode that is powered by SourceKit includes: syntax highlighting; code completion (when you are writing a chunk of your code, you hit tab, and the rest of it fills in magically); code formatting; indentation; and interface generation. When you are using an Objective-C API from Swift and you want to look at the interface for that API, it will call through SourceKit, go through the Clang Importer, and do the required transformations. These transformations are getting more complex, with some heuristics that determine if there is any repetition in the API (things that are hard to estimate).

SourceKit does this heavy lifting, e.g. generating documentation: when you Option-click on a token in Xcode, you want to see what this type is, what the parameters are, and what constraints apply to what you can pass in. What you do not necessarily see is that the interface that Xcode uses to talk to SourceKit is available to you, as a developer. In fact, SourceKit was included when Apple open sourced Swift itself. It is part of the Swift repo: you will see in the tools/SourceKit directory everything that Xcode uses to power its Swift capabilities.

The description below is from the README in the repository:

SourceKit is a framework for supporting IDE features like indexing, syntax-coloring, code-completion, etc. In general it provides the infrastructure that an IDE needs for excellent language support.

– Apple

In general, you could build an IDE on top of SourceKit. And that is essentially what Xcode does.

The SourceKit You Do Not See (07:12)

There is a lot of SourceKit that you are probably not familiar with, even if you interface with it on a daily basis via Xcode. Under the hood, it combines a number of components and offers them via an easy-to-use interface:

libIDE
libParse
libFormat
libXPC (Darwin-Only)
libdispatch (Darwin-Only)

These are all components of the Swift compiler (the parser, IDE functionality such as code completion, formatting functionality). It also pulls in a number of components in order to get interprocess communication (XPC) and asynchronous capabilities (libdispatch). Because of these two other dependencies, SourceKit is Darwin-only. It is fascinating that it can compile for iOS, which is also a Darwin platform. We may or may not see an IDE for Swift on iOS in the future, but the underlying tooling supports it. This is something that a member of the community could build if they were inclined.

Most of the internals are written in C++ (because it is interacting with LLVM and other aspects of the Swift compiler).

The SourceKit You Interface With (08:28)

If you wanted to use SourceKit to accomplish your own tasks, you have a number of options:

C Interface
sourcekitd.framework
libsourcekitdInProc.dylib
Official Python Binding
Unofficial Swift Binding: SourceKitten

It exposes a C interface, which makes it nice to interact with via Swift. It would be hard to interact with the C++ API, given Swift’s limited support for C++.

There are two flavors of the framework that you can use SourceKit with: the out-of-process XPC version (sourcekitd.framework) and libsourcekitdInProc (“in process”). I am willing to say (you might not agree) that the fact that SourceKit lives out of process in Xcode is a major blessing. Swift is a complex language, difficult to parse, and it is difficult to cover all the edge cases, so SourceKit often crashes (less these days, and it is getting better). But imagine if that was running in process in Xcode (like all of the tooling that Xcode had before Swift): it would bring the entire IDE down every time that you hit an edge case in the Swift syntax. It is a blessing in disguise.

You can also use the official Python binding for SourceKit, included in the repo as part of tools/SourceKit. It is my understanding that Apple uses this Python binding for a number of internal tools. I think they use this for their internal documentation generation, which is a whole stack built on Python.

apple/swift/tools/SourceKit (10:16)

For the rest of the talk, I will be talking about an unofficial binding that I affectionately call SourceKitten, which is a way to interact with SourceKit. Again, the way you interface with SourceKit is via these APIs, and this is what Apple has to say about it:

The stable C API for SourceKit is provided via the sourcekitd.framework which uses an XPC service for process isolation and the libsourcekitdInProc.dylib library which is in process. – Apple

I will focus on using SourceKit through SourceKitten, which is an abstraction layer. It is a Swift binding for SourceKit, which over the years has led to some interesting problems. When you build tooling for a language in that language, and not only does the tooling change over time, but so does the language… you end up having extra work. But it has still been a fun project. It is this little framework and command-line tool that you can use to interact with SourceKit.

It is open source, available through Homebrew, and this is how you install it:

$ brew install sourcekitten
...
$ sourcekitten version
0.12.2
$ sourcekitten help
Available commands:

    complete    Generate code completion options
    doc         Print Swift docs as JSON or Objective-C docs as XML
    help        Display general or command-specific help
    index       Index Swift file and print as JSON
    structure   Print Swift structure information as JSON
    syntax      Print Swift syntax information as JSON
    version     Display the current version of SourceKitten

The SourceKit You Build On (11:33)

SourceKitten exposes a number of the higher-level features that are available in the SourceKit C API, either as a programmatic interface in Swift or as a command-line utility: code completion, index generation, extraction of the structure of the abstract syntax tree, and syntax information. The purpose of this talk is to go through examples of how you can use SourceKit with fairly minimal effort to accomplish daily tasks that you might have to do, and sometimes even augment what Xcode is capable of doing.

A few of the examples that we will go through are: code analysis, analyzing some of the structure and some of the makeup of your code. Most will apply to large projects (Swift projects of several thousand, ten, fifty, or a hundred thousand lines, which are going to become more and more common); this is where these examples will shine (e.g. large-scale refactoring of Swift code, large-scale migrations of Swift code). Apple ships Xcode with a migrator for Swift itself, so that every time there are Swift changes in the standard library or in the language, Xcode can offer you nice shortcuts to update your code. But there are times when you need to update third-party dependencies that do not ship with their own custom migrator, and you will be able to use these techniques. Code generation is the last example that we will go over.

Code Analysis (13:01)

For code analysis, I have a handful of examples. We will pop over to the command line.

$ cat PublicDeclarations.swift

public struct A {
  public func b() {}
  private func c() {}
}

$ echo $query
  [recurse(.["key.substructure"][]?) |
    select(."key.accessibility" == "source.lang.swift.accessibility.public") |
    ."key.name"?]

$ sourcekitten structure --file PublicDeclarations.swift | jq "$query"
  ["A", "b()"]

This is using SourceKitten's command-line interface. We are running the structure command on a file (here, PublicDeclarations.swift). You have this public struct with a public function and a private function. If you have a massive code base, and you want to get a sense of how large your public API is, we can pipe the file through SourceKitten and get the structure: a pseudo AST, wrapped in JSON here. There is this substructure, which is a recursive child element; every structure has a substructure if there are any nested elements. At the top level, there is the struct declaration, which is public (SourceKit is already telling us this information). Nested within it, we have an instance method declaration (again, public): b(). We also have another instance method declaration, which has a private access level: c(). If we wanted to go through this massive code base, this massive API, and estimate over time how large our public API is, or get a consolidated list of all the declarations that we expose, we could parse this.

I am only using Swift, SourceKitten, and a command-line JSON processor called jq. Pass the output through jq and we get some syntax highlighting for it. And if we write queries, we can use something similar to XPath for XML: “give me, via recursion, all the substructures with an accessibility of public, and print out the name.” If we run this, we get all of the public declarations in our massive API (which I have consolidated to four lines). In a nutshell, this is one of the things that you can do to process this structure. That is one example of code analysis: get all of the public declarations in your API.
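The same recursive walk can also be written in Swift itself. The following is only a sketch: it is written in current Swift syntax rather than the Swift 2 syntax used in the talk's listings, it uses nothing but Foundation, and the embedded JSON is a hand-trimmed stand-in for what `sourcekitten structure` prints (an assumption that keeps only the keys the query touches). The function name `publicNames(in:)` is made up for this illustration.

```swift
import Foundation

// Hand-trimmed stand-in for `sourcekitten structure` output (assumed shape).
let json = """
{
  "key.substructure": [
    {
      "key.accessibility": "source.lang.swift.accessibility.public",
      "key.kind": "source.lang.swift.decl.struct",
      "key.name": "A",
      "key.substructure": [
        {
          "key.accessibility": "source.lang.swift.accessibility.public",
          "key.kind": "source.lang.swift.decl.function.method.instance",
          "key.name": "b()"
        },
        {
          "key.accessibility": "source.lang.swift.accessibility.private",
          "key.kind": "source.lang.swift.decl.function.method.instance",
          "key.name": "c()"
        }
      ]
    }
  ]
}
"""

// Depth-first walk: collect the name of every node whose accessibility is public.
// (Hypothetical helper, not part of SourceKitten.)
func publicNames(in node: [String: Any]) -> [String] {
    var names: [String] = []
    if node["key.accessibility"] as? String == "source.lang.swift.accessibility.public",
       let name = node["key.name"] as? String {
        names.append(name)
    }
    let children = node["key.substructure"] as? [[String: Any]] ?? []
    for child in children {
        names += publicNames(in: child)
    }
    return names
}

let root = try! JSONSerialization.jsonObject(with: Data(json.utf8)) as! [String: Any]
let publicDeclarations = publicNames(in: root)
print(publicDeclarations)  // ["A", "b()"]
```

The jq query is shorter, but a Swift version like this is easier to grow into a real tool, for example one that tracks the size of the public API over time.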

$ cat FunctionsPerStruct.swift

struct A { func one() {} }

struct B { let nope = 0; func two() {}; func three() {} }

$ echo $query
  [."key.substructure"[] | select(."key.kind" == "source.lang.swift.decl.struct") |
    {key: ."key.name", value: [
      (."key.substructure"[] |
        select(."key.kind" == "source.lang.swift.decl.function.method.instance") |
        ."key.name")]}
  ] | from_entries

$ sourcekitten structure --file FunctionsPerStruct.swift | jq "$query"
  {"A": ["one()"], "B": ["two()", "three()"]}

If we go further, we can start poking at this structure in different ways, e.g. measure the number of functions per struct. We do not care about classes. We want to look at the structs (hard to do via regular expressions; especially with the Swift syntax constantly changing, you have attributes and modifiers). In this case, this is the entire example consolidated. If we run it through SourceKit, try to do a query on the resulting data set, we can get the functions that are nested within our structs, and this would ignore anything that is within a class. If you are trying to do some large-scale analysis of your code base, this can come in handy.

$ cat LongFunctionNames.swift

func okThisFunctionMightHaveAnOverlyLongNameThatYouMightWantToRefactor() {}
func nahThisOnesFine() {
  func youCanEvenFindNestedOnesIfYouRecurse() {}
}

$ echo $query
  [recurse(.["key.substructure"][]?) |
      select(."key.kind" | tostring | startswith("source.lang.swift.decl.function")) |
      select((."key.name" | length) > 20) |
      ."key.name"]

$ sourcekitten structure --file LongFunctionNames.swift | jq "$query"
  [
    "okThisFunctionMightHaveAnOverlyLongNameThatYouMightWantToRefactor()",
    "youCanEvenFindNestedOnesIfYouRecurse()"
  ]

Say you are trying to audit your API for long function names: you could reuse this existing structure. These are things that you can do fairly easily without a deep understanding of C++, or LLVM, or the Swift compiler. You access this higher-level interface and you can do powerful things with it.

Code Refactoring (17:53)

But you are a developer: you want to write code, you do not want to read it. What we can do is leverage SourceKit to have powerful refactoring tools (refactoring tools have not come far in the last 40, 50, 60 years). With SourceKit, Apple has provided this higher-level tool that we can hack on further, or build on.

Below we use the SourceKitten framework rather than the command-line interface, so that we can programmatically iterate over not just the structure, but the SourceKit index. Whenever you see Xcode freezing up, and you have this long progress bar that says indexing, and you cannot do anything until that completes… this is the indexing process. It is walking through all of the files in your project and building an index of all of the declarations, all of its APIs, so that when you are trying to access them from other files, it knows that it can resolve those symbols. We can access this API directly via SourceKit and via SourceKitten.

import Foundation
import SourceKittenFramework

// Usage: refactor.swift <file> <usr> <oldName> <newName>
let arguments = Process.arguments
let (file, usr, oldName, newName) = (arguments[1], arguments[2], arguments[3], arguments[4])
// Ask SourceKit to index the file; the top-level entities form a tree.
let index = (Request.Index(file: file).send()["key.entities"] as! [SourceKitRepresentable])
              .map({ $0 as! [String: SourceKitRepresentable] })

// Recursively collects the (zero-based line, column) of every entity with the given USR.
func usesOfUSR(usr: String, dictionary: [String: SourceKitRepresentable]) -> [(line: Int, column: Int)] {
    if dictionary["key.usr"] as? String == usr,
        let line = dictionary["key.line"] as? Int64,
        let column = dictionary["key.column"] as? Int64 {
        return [(Int(line - 1), Int(column))]
    }
    // No match here: descend into the nested entities, if any.
    return (dictionary["key.entities"] as? [SourceKitRepresentable])?
        .map({ $0 as! [String: SourceKitRepresentable] })
        .flatMap { usesOfUSR(usr, dictionary: $0) } ?? []
}

Here we have a short example: in about 30 lines of Swift, we can build a fairly flexible refactoring tool for the command line. We are passing in: the file whose contents we want to refactor, a USR (a Unified Symbol Resolution identifier, which is a unique way to represent some token in Swift), and the old and the new variable name. This does not even have to be a variable (as in this case); it can be any token: a function, a class, a struct, an enum. Our command-line tool has two functions. The first finds the uses of this USR. It recursively iterates over the potentially arbitrarily nested structure of your AST and finds all of the lines and columns where we are using this USR, returning them as tuples of line and column. The second function in our 30-line refactoring tool goes over all of these locations and replaces the use of that specific token with our new name. At the end it prints out the result.

func renameUSR(usr: String, toName: String) {
    let uses = index.flatMap({ usesOfUSR(usr, dictionary: $0) }).sort(>)
    let fileContents = try! String(contentsOfFile: file)
    var lines = (fileContents as NSString).lines().map({ $0.content })
    for use in uses {
        // Replaces every occurrence of oldName on each line where the USR is used.
        lines[use.line] = lines[use.line]
          .stringByReplacingOccurrencesOfString(oldName, withString: newName)
    }
    print(lines.joinWithSeparator("\n"))
}

renameUSR(usr, toName: newName)

To use this, take the input file at the top of the listing below (imagine a larger project, where we have two properties on two different types that have the same name, or even the same type signature). If you try to refactor this via regular expressions or find and replace (again, imagine a large project), you will have a hard time renaming one but not the other.

$ cat CodeToRefactor.swift

struct A { let prop = 0 }
struct B { let prop = 1 }
print(A().prop)
print(B().prop)

$ ./refactor.swift CodeToRefactor.swift s:vV4file1A4propSi prop newProp

struct A { let newProp = 0 }
struct B { let prop = 1 }
print(A().newProp)
print(B().prop)

With this small tool, we can refactor this in a safe and efficient way, in a way that will continue to adapt as the Swift language changes (we are not going to spend much time keeping up this tool with all the language changes).
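One caveat of the short listing above is that it replaces every occurrence of the old name on a matching line, so two same-named properties used on one line would both be renamed. A possible refinement is to use the reported column as well. The sketch below is written in current Swift, uses only the standard library, and makes a few assumptions: `rename(in:uses:from:to:)` is a hypothetical helper, lines are zero-based and columns one-based (as in the tuples built by `usesOfUSR`), the source is ASCII so a column can be treated as a character offset, and the positions are written out by hand instead of coming from SourceKit.

```swift
// Hypothetical helper: replace `oldName` only where it starts at a reported position.
func rename(in source: String,
            uses: [(line: Int, column: Int)],
            from oldName: String,
            to newName: String) -> String {
    var lines = source.split(separator: "\n", omittingEmptySubsequences: false).map(String.init)
    // Work from the last position to the first so earlier columns stay valid.
    for use in uses.sorted(by: { ($0.line, $0.column) > ($1.line, $1.column) }) {
        guard lines.indices.contains(use.line), use.column >= 1 else { continue }
        var line = lines[use.line]
        guard let start = line.index(line.startIndex, offsetBy: use.column - 1, limitedBy: line.endIndex),
              line[start...].hasPrefix(oldName) else { continue }
        let end = line.index(start, offsetBy: oldName.count)
        line.replaceSubrange(start..<end, with: newName)
        lines[use.line] = line
    }
    return lines.joined(separator: "\n")
}

let source = """
struct A { let prop = 0 }
struct B { let prop = 1 }
print(A().prop, B().prop)
"""

// Hand-written positions of A's `prop`: its declaration and its single use.
let usesOfAProp = [(line: 0, column: 16), (line: 2, column: 11)]
let renamed = rename(in: source, uses: usesOfAProp, from: "prop", to: "newProp")
print(renamed)
```

With these inputs only A's property is renamed, even on the last line, where both properties appear together.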

This is one example of how you can incorporate your own ad-hoc, custom-built refactoring tools, where you do not have to conform to some general purpose refactoring tool that you have downloaded from Xcode plug-ins for instance (useful, but not the end of the road). You can build your own tooling simply, quickly.

Code Migration (22:14)

Imagine you were using this third-party API and it underwent a major renaming of a bunch of types. You could continue doing this, even if those types conflicted with other imports that you are using - e.g. say you are using the Result type, and you are pulling in a library that used a different Result type; you could refactor all of this, or migrate all of this without breaking all the code.

Code Generation (22:44)

Code Generation: this is something of increased interest lately on the Swift mailing lists and in the development forums. On Linux, it is currently required that developers list their test functions a second time in some global header. The reason is that XCTest does not have the same reflection capabilities on Linux as it does on Darwin (OS X). It cannot automatically detect what all the tests in an XCTestCase subclass are. There has been much discussion as to how to do this. With SourceKit, it is trivial: you can do this without writing any code, by passing your tests through the SourceKit structure.

The query below gets the classes that have an inherited type of XCTestCase. We are requiring that you subclass XCTestCase, so we are not going to catch methods whose names start with “test” on any of your other classes. From there, we are getting all of the substructures that are instance methods (we are not going to catch class methods) and whose names start with “test”. Then we print all of our results: a JSON dictionary mapping each class to an array of its test functions. You could very easily build an XCTest manifest for Linux from this using a handful of lines. If you wanted to build your own test runner, you could use the same approach.
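As an illustration of those "handful of lines", here is a sketch in current Swift that turns the dictionary produced by the query (`{"MyTests": ["testYolo()"]}` for the sample file) into Swift source. The `allTests` shape emitted here follows the convention used by XCTest on Linux, but treat the exact format as an assumption to check against your XCTest version; `manifest(for:)` is a made-up name.

```swift
import Foundation

// Output of the jq query for the sample file: class name -> test method names.
let queryOutput = "{\"MyTests\": [\"testYolo()\"]}"

let object = try! JSONSerialization.jsonObject(with: Data(queryOutput.utf8)) as! [String: Any]
let testsByClass = object.compactMapValues { $0 as? [String] }

// Hypothetical generator: one `allTests` extension per test class.
func manifest(for testsByClass: [String: [String]]) -> String {
    let extensions = testsByClass.keys.sorted().map { className -> String in
        let entries = (testsByClass[className] ?? []).map { method -> String in
            // SourceKit reports "testYolo()"; the manifest needs the bare name.
            let name = method.replacingOccurrences(of: "()", with: "")
            return "        (\"\(name)\", \(name)),"
        }
        let lines: [String] = ["extension \(className) {", "    static var allTests = ["]
            + entries
            + ["    ]", "}"]
        return lines.joined(separator: "\n")
    }
    return extensions.joined(separator: "\n\n")
}

let generated = manifest(for: testsByClass)
print(generated)
```

For the sample input this prints an `extension MyTests` whose `allTests` array contains the single entry `("testYolo", testYolo)`.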

$ cat GenerateXCTestManifest.swift
class MyTests: XCTestCase { func nope() {}; func testYolo() {} }

$ echo $query
  [."key.substructure"[] |
    select(."key.kind" == "source.lang.swift.decl.class") |
    select(."key.inheritedtypes"[]."key.name" == "XCTestCase") |
    {key: ."key.name", value: [
      (."key.substructure"[] |
        select(."key.kind" == "source.lang.swift.decl.function.method.instance") |
        select(."key.name" | startswith("test")) |
        ."key.name")]}
  ] | from_entries

$ sourcekitten structure --file GenerateXCTestManifest.swift | jq "$query"
{"MyTests": ["testYolo()"]}

The last example I want to go through is more code generation: say you want to generate some Swift code based off of some existing Swift code. One popular use of this is to represent your model classes or your model structs in type-safe and accessible ways. There is this cool tool out there called QueryKit, which allows you to generate NSPredicates from type-safe queries. You can do things like person.age == 10, and it will generate a type-safe NSPredicate that will only work on integer properties (you will not be able to try that equality on strings).

We can write a very short Swift program using the SourceKitten framework that generates this code.

First, we will represent properties and models. Models can have multiple properties, and they each have this swiftSourceRepresentation computed variable, which outputs the equivalent Swift code:

import SourceKittenFramework

struct Property {
    let name: String
    let type: String
    var swiftSourceRepresentation: String {
        return "static let \(name) = Property<\(type)>(name: \"\(name)\")"
    }
}

struct Model {
    let name: String
    let properties: [Property]
    var swiftSourceRepresentation: String {
        return "extension \(name) {\n" +
            properties.map({"  \($0.swiftSourceRepresentation)"}).joinWithSeparator("\n") +
            "\n}"
    }
}

In the second part of this code, we walk through the input file, get its contents, and say, “for every struct that we have that has properties, let’s get its model name, property names, and property types.” If we then run this on our initial model, we can generate this equivalent static property extension, where we can write Person.name and get an equivalent representation of the property.

let structure = Structure(file: File(path: Process.arguments[1])!)
let models = (structure.dictionary["key.substructure"] as! [SourceKitRepresentable]).map({
    $0 as! [String: SourceKitRepresentable]
}).filter({ substructure in
    return SwiftDeclarationKind(rawValue: substructure["key.kind"] as! String) == .Struct
}).map { modelStructure in
    return Model(name: modelStructure["key.name"] as! String,
        properties: (modelStructure["key.substructure"] as! [SourceKitRepresentable]).map({
            $0 as! [String: SourceKitRepresentable]
        }).filter({ substructure in
            return SwiftDeclarationKind(rawValue: substructure["key.kind"] as! String) == .VarInstance
        }).map { Property(name: $0["key.name"] as! String, type: $0["key.typename"] as! String) }
    )
}

print(models.map({ $0.swiftSourceRepresentation }).joinWithSeparator("\n"))

This is a very small example, but you can see how this could expand to massive code generation, where you have to duplicate a number of types with only slight variations. You can write it in Swift, use the compiler to help you, and then have variants of that generated via code generation.

$ cat QueryKitModels.swift

struct Person {
  let name: String
  let age: Int
}

$ ./generate.swift QueryKitModels.swift

extension Person {
  static let name = Property<String>(name: "name")
  static let age = Property<Int>(name: "age")
}
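To see why such an extension is useful, here is a sketch of how it could be consumed, written in current Swift. Everything in it beyond the generated extension is an assumption made for illustration: `Property` is a minimal stand-in rather than QueryKit's real type, the `==` overload is hypothetical, and it builds a block-based NSPredicate that evaluates dictionaries, which is simpler than what a real query library would produce.

```swift
import Foundation

// Minimal stand-in for the `Property` type that the generated code refers to.
struct Property<Value> {
    let name: String
}

struct Person {
    let name: String
    let age: Int
}

// The extension printed by the generator.
extension Person {
    static let name = Property<String>(name: "name")
    static let age = Property<Int>(name: "age")
}

// Comparing a property with a value of its own type yields a predicate.
// Comparing it with a value of any other type does not compile.
func == <Value: Equatable>(property: Property<Value>, value: Value) -> NSPredicate {
    return NSPredicate { object, _ in
        guard let row = object as? [String: Any],
              let actual = row[property.name] as? Value else { return false }
        return actual == value
    }
}

let isTen = Person.age == 10
// let broken = Person.age == "ten"   // rejected by the compiler: String is not Int

let row: [String: Any] = ["name": "Ada", "age": 10]
print(isTen.evaluate(with: row))
```

The type parameter on `Property` is what carries the type safety: the generator copies each property's declared type into it, so the compiler can reject mismatched comparisons.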

This highlights some of the things that you could build on top of SourceKit, but there are others (code formatting, completion, syntax highlighting, documentation generation, real-time integrated development environment features). You can gain some inspiration from what other people have built on top of SourceKit. There is a handful of projects that I work on (e.g. a documentation generation tool for Swift called Jazzy, and a linter called SwiftLint), and there are a number of other tools that other people in the community have been building (e.g. a refactoring tool, the Swift Refactorator). SourceKittenDaemon is an auto-completion back-end for text editors, which a number of other open source projects use for Emacs, Vim, Atom, Sublime Text, and TextMate. They have extended the Swift tooling to meet their needs when Xcode is not enough, or when Xcode is heavy-handed.

Next time that you feel that Swift tooling is not up to what you need, remember that you can build your own. Apple has gone out of their way to make sure that they have open sourced SourceKit, and they advertise it as a way to build IDEs on top of Swift. There are a number of opportunities out there, yours for the taking.

About the content

This talk was delivered live in April 2016 at App Builders. The video was transcribed by Realm and is published here with the permission of the conference organizers.

JP Simard

JP works at Realm on the Objective-C & Swift bindings. He is the creator of jazzy (the documentation tool Apple forgot to release) and enjoys hacking on Swift tooling.