An Introduction to 3D Graphics with Metal in Swift

Video & transcription below provided by Realm: a replacement for SQLite & Core Data with first-class support for Swift!

Warren Moore, a former Apple engineer, recently stopped by the Swift Language User Group to give an introduction to 3D graphics, Apple’s new Metal framework, and how you can use Metal for 3D rendering from Swift. His presentation covered a lot of ground and was a perfect intro for anybody trying to learn more about 3D, game development, and Swift. Thanks Warren!

You can find a blog version of the talk (with code samples) below.

Part One: Rendering Basics

What is 3D Rendering? (2:46)

3D rendering is the process of taking geometric models of the world and adding perspective, material properties, and lighting. 3D graphics consists of seamlessly stitching together triangles and then painting over them with textures. We craft illusions out of pixels that take advantage of depth perception and other traits of the human visual system in order to create images that look plausible.

The Pipeline (5:00)

The term fixed-function pipeline means that the hardware is configurable but not programmable. You can set certain states on the GPU, such as lighting parameters or texturing state, but you can’t write your own shaders. Over the last ten years we have moved to a programmable pipeline, which introduced shaders. Shaders are small programs that run per vertex or per pixel and give you a hook into the computation; writing them means writing code that runs directly on the GPU.

To go from a model to the screen, you pass the object through the vertex shader, which projects each vertex onto the virtual view plane. Rasterization then chops the resulting triangles up into pixel-sized fragments, and each fragment is assigned a colour value when it is run through the fragment shader. What is actually submitted in a draw call is a list of vertices, each carrying attributes such as a position in 3D space and a colour.
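To make the rasterization step concrete, here is a minimal CPU sketch of triangle coverage using edge functions, one common way to describe the test. It is an illustration only: the Point2D type and rasterize function are hypothetical, the sketch ignores the tie-breaking rules, anti-aliasing, and parallelism real GPUs use, and it is not how you call Metal. It does show why a triangle defined by just three vertices can produce many fragment-shader invocations.

```swift
// A 2D point in pixel coordinates (hypothetical helper type for this sketch).
struct Point2D {
    var x: Double
    var y: Double
}

// Signed edge function: non-negative when p lies on the inner side of the
// edge a->b for a counter-clockwise triangle.
func edge(_ a: Point2D, _ b: Point2D, _ p: Point2D) -> Double {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x)
}

// Returns every pixel whose center lies inside the counter-clockwise
// triangle (v0, v1, v2), within a width x height grid.
func rasterize(_ v0: Point2D, _ v1: Point2D, _ v2: Point2D,
               width: Int, height: Int) -> [(x: Int, y: Int)] {
    var covered: [(x: Int, y: Int)] = []
    for y in 0..<height {
        for x in 0..<width {
            // Test coverage at the pixel center.
            let p = Point2D(x: Double(x) + 0.5, y: Double(y) + 0.5)
            if edge(v0, v1, p) >= 0 && edge(v1, v2, p) >= 0 && edge(v2, v0, p) >= 0 {
                covered.append((x: x, y: y))
            }
        }
    }
    return covered
}

// Each covered pixel becomes one fragment, i.e. one fragment-shader invocation.
let fragments = rasterize(Point2D(x: 0, y: 0), Point2D(x: 4, y: 0), Point2D(x: 0, y: 4),
                          width: 4, height: 4)
print("3 vertices produced \(fragments.count) fragments")
```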

Transformations (8:19)

Transformations (such as translation, rotation, and scaling) move objects around the world. One transformation that’s more involved is the perspective projection. Everything that appears on the screen is enclosed in a volume called the “view frustum”, a truncated pyramid bounded by a near and a far plane. Conceptually, projection works like tracing a ray from the eye through each point to where it meets the near plane: it introduces foreshortening, so distant objects appear smaller, and essentially flattens the 3D scene into a 2D image.
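The foreshortening can be seen numerically in this small sketch, which computes clip-space coordinates for an eye-space point and then performs the perspective divide (division by w) that the GPU applies after the vertex shader. It assumes a right-handed eye space looking down -z and a [0, 1] depth range (Metal’s convention); the function name and parameters are hypothetical, and in a real renderer these terms would live in a 4x4 projection matrix handed to the vertex shader.

```swift
import Foundation

// Maps an eye-space point (camera at the origin, looking down -z) to
// normalized device coordinates. Assumes a [0, 1] depth range.
func projectToNDC(x: Double, y: Double, z: Double,
                  fovY: Double, aspect: Double,
                  near: Double, far: Double) -> (x: Double, y: Double, z: Double) {
    let f = 1.0 / tan(fovY / 2)                  // focal length from the vertical field of view
    // Clip-space coordinates (what a projection matrix would produce).
    let clipX = (f / aspect) * x
    let clipY = f * y
    let clipZ = z * far / (near - far) + near * far / (near - far)
    let clipW = -z                               // distance in front of the camera
    // Perspective divide: dividing by w shrinks distant points (foreshortening).
    return (clipX / clipW, clipY / clipW, clipZ / clipW)
}

let nearPoint = projectToNDC(x: 1, y: 1, z: -2, fovY: .pi / 2, aspect: 1, near: 1, far: 10)
let farPoint = projectToNDC(x: 1, y: 1, z: -4, fovY: .pi / 2, aspect: 1, near: 1, far: 10)
print(nearPoint, farPoint)
```

Moving the same point from z = -2 to z = -4 halves its projected x coordinate, which is the “distant objects look smaller” effect; points on the near and far planes land at depth 0 and 1 respectively.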

Moving Between Coordinate Spaces (11:37)

There are several coordinate spaces that we move through on the way to that perspective projection. Points start out relative to the object itself in “model space”; for example, you might describe the four corners of a chair’s seat relative to the chair. Placing the chair in the room moves its points into “world space”. From there, you move again into “eye space”, which describes the world relative to the observer rather than to a fixed origin. Finally, projecting the points onto the near plane moves them into “screen space”. Mathematically, this is done through matrix concatenation: each move from one space to the next has its own transformation matrix, and multiplying those matrices together yields a single matrix that performs the whole chain.
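Here is a minimal sketch of matrix concatenation with a hypothetical Mat4 type (row-major storage acting on column vectors, so in a * b the matrix b is applied first). The chair, room, and camera values are made up for illustration; the point is that multiplying the per-space matrices once gives a single model-to-eye matrix with the same result as applying each step in turn.

```swift
// Minimal 4x4 matrix for illustration only (row-major storage, column vectors).
struct Mat4 {
    var m: [[Double]]

    static let identity = Mat4(m: [[1, 0, 0, 0],
                                   [0, 1, 0, 0],
                                   [0, 0, 1, 0],
                                   [0, 0, 0, 1]])

    static func translation(_ tx: Double, _ ty: Double, _ tz: Double) -> Mat4 {
        var t = identity
        t.m[0][3] = tx
        t.m[1][3] = ty
        t.m[2][3] = tz
        return t
    }

    static func scale(_ s: Double) -> Mat4 {
        var t = identity
        for i in 0..<3 { t.m[i][i] = s }
        return t
    }

    // Matrix product: (a * b) applies b first, then a.
    static func * (a: Mat4, b: Mat4) -> Mat4 {
        var r = Mat4(m: Array(repeating: Array(repeating: 0.0, count: 4), count: 4))
        for i in 0..<4 {
            for j in 0..<4 {
                for k in 0..<4 { r.m[i][j] += a.m[i][k] * b.m[k][j] }
            }
        }
        return r
    }

    // Transforms a homogeneous point (w = 1 for positions).
    func apply(_ p: [Double]) -> [Double] {
        return (0..<4).map { i in (0..<4).reduce(0.0) { $0 + m[i][$1] * p[$1] } }
    }
}

// Hypothetical scene: a seat corner in model space, a chair scaled and placed
// in the room, and a camera standing 5 units back along +z.
let seatCorner: [Double] = [0.5, 0.0, 0.5, 1.0]           // model space
let modelToWorld = Mat4.translation(3, 0, 0) * Mat4.scale(2)
let worldToEye = Mat4.translation(0, 0, -5)                // inverse of the camera's placement

// Concatenate once, then apply to every vertex.
let modelToEye = worldToEye * modelToWorld
let eyeSpace = modelToEye.apply(seatCorner)
let stepByStep = worldToEye.apply(modelToWorld.apply(seatCorner))
print(eyeSpace)   // prints [4.0, 0.0, -4.0, 1.0]
```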

Part Two: Working with Metal

Context & Conventions (15:59)

There are a lot of choices available when writing 3D mobile applications. SceneKit and SpriteKit offer the highest level of abstraction away from the GPU, but they offer little flexibility. Core Animation and Core Graphics are a bit lower level but aren’t really used for 3D. Until now, OpenGL ES has been the premier 3D technology on iOS and the go-to choice for high-performance code. Now we have Metal, which abstracts almost nothing. That means you have to do a lot of work to get Metal up and running, but it also gives you an enormous amount of power and control.

Metal is very protocol-heavy. In Objective-C, you talk to an id<MTLDevice>, that is, an object that conforms to the MTLDevice protocol; the idea is that you program against an interface rather than a concrete type. Another common concept is the descriptor: an object whose properties you set and which you then use to create immutable instances.
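The protocol-plus-descriptor pattern can be sketched in plain Swift. GraphicsDevice, TextureSpec, Texture, and SoftwareDevice are hypothetical names invented for this example and are not part of Metal; they only mirror the shape of the pattern: callers hold a protocol type, fill in a mutable descriptor, and receive an immutable object whose values were copied at creation time.

```swift
// Descriptor: a mutable configuration object you fill in property by property.
final class TextureSpec {
    var width = 1
    var height = 1
    var mipmapped = false
}

// The object created from a descriptor is immutable: its properties are `let`.
struct Texture {
    let width: Int
    let height: Int
    let mipmapped: Bool
}

// Callers see only this interface, never the concrete class behind it.
protocol GraphicsDevice {
    var name: String { get }
    func makeTexture(descriptor: TextureSpec) -> Texture
}

// One concrete implementation; other "GPUs" could conform as well.
final class SoftwareDevice: GraphicsDevice {
    let name = "Software renderer"

    func makeTexture(descriptor: TextureSpec) -> Texture {
        // Copy the descriptor's current values, so later edits to the
        // descriptor do not affect textures that already exist.
        return Texture(width: descriptor.width,
                       height: descriptor.height,
                       mipmapped: descriptor.mipmapped)
    }
}

let device: GraphicsDevice = SoftwareDevice()
let spec = TextureSpec()
spec.width = 256
spec.height = 128
let first = device.makeTexture(descriptor: spec)
spec.width = 512                 // reuse the same descriptor for a second texture
let second = device.makeTexture(descriptor: spec)
print(first.width, second.width)   // prints 256 512
```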

Devices (20:14)

A device is an abstraction of the GPU in the iOS device, exposed as an object conforming to the MTLDevice protocol. Devices are the root objects you deal with in Metal, and they create many other objects, including textures, buffers, and pipeline states. Creating one is simple, but to get anything on the screen with Metal, you need to interface with UIKit through a specialized CALayer subclass, CAMetalLayer. You tell the layer which device it works with and give it a pixel format such as BGRA8Unorm, which stores the blue, green, red, and alpha components as 8-bit unsigned normalized values.

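import Metal
import QuartzCore

// The system default device represents the GPU
let device = MTLCreateSystemDefaultDevice()

// CAMetalLayer displays Metal-rendered content in the layer tree
let metalLayer = CAMetalLayer()
metalLayer.device = device
metalLayer.pixelFormat = .BGRA8Unorm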

Drawables are another aspect of CAMetalLayer, and they don’t have a direct analog in other APIs. A Metal layer can give you a drawable object, which in turn gives you a texture that you can draw into. You access your framebuffer through the drawable’s “texture” property; the set of drawables that the layer rotates through is called a swap chain. The layer is triple buffered by default: you can draw into one surface while another is handed off to the display hardware and a third is on screen.
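The layer manages its drawables itself, but many Metal apps apply the same rotation to their own per-frame data, such as uniform buffers. The sketch below shows only the slot rotation, using a hypothetical FrameRing type; a real renderer would also have to wait until the GPU has finished with a slot before overwriting it, commonly with a semaphore signalled from a command buffer’s completion handler.

```swift
// Hypothetical ring of N per-frame resources. Frame n uses slot n % N, so the
// CPU can fill one slot while the GPU may still be reading the others.
struct FrameRing<Resource> {
    private var slots: [Resource]
    private(set) var frameIndex = 0

    init(_ slots: [Resource]) {
        precondition(!slots.isEmpty, "a ring needs at least one slot")
        self.slots = slots
    }

    // Slot the CPU may write for the current frame.
    var current: Resource { slots[frameIndex % slots.count] }

    // Move on to the next frame's slot, wrapping around after the last one.
    mutating func advance() { frameIndex += 1 }
}

var ring = FrameRing(["A", "B", "C"])   // three slots, as with triple buffering
var order: [String] = []
for _ in 0..<5 {
    order.append(ring.current)
    ring.advance()
}
print(order)   // prints ["A", "B", "C", "A", "B"]
```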

Render Pass (23:38)

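To clear the screen, or to perform any rendering at all, we need to create what’s called a render pass descriptor. It describes what should happen to the framebuffer texture during the pass: here, the texture is cleared to a red colour when the pass begins (the load action) and the results are kept when it ends (the store action). The following code configures such a render pass: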

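// drawable is the object returned by metalLayer.nextDrawable()
let passDescriptor = MTLRenderPassDescriptor()
passDescriptor.colorAttachments[0].texture = drawable.texture
// Clear the attachment when the pass begins...
passDescriptor.colorAttachments[0].loadAction = .Clear
// ...and keep the rendered result when it ends, so it can be displayed
passDescriptor.colorAttachments[0].storeAction = .Store
passDescriptor.colorAttachments[0].clearColor = MTLClearColorMake(0.8, 0.0, 0.0, 1.0)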
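To see what the clear actually writes into a BGRA8Unorm drawable, this sketch converts the clear colour (0.8, 0.0, 0.0, 1.0) into bytes. It assumes round-to-nearest when converting floats to 8-bit normalized values, and unorm8 and packBGRA8Unorm are hypothetical helpers; on the device, the GPU performs this conversion itself.

```swift
// Converts a normalized channel (0...1) to an 8-bit unorm value:
// clamp, scale by 255, and round to the nearest integer.
func unorm8(_ value: Double) -> UInt8 {
    let clamped = min(max(value, 0), 1)
    return UInt8((clamped * 255).rounded())
}

// Byte order for a BGRA8Unorm pixel: blue, green, red, alpha.
func packBGRA8Unorm(red: Double, green: Double, blue: Double, alpha: Double) -> [UInt8] {
    return [unorm8(blue), unorm8(green), unorm8(red), unorm8(alpha)]
}

// The clear colour used by the render pass descriptor.
let clearedPixel = packBGRA8Unorm(red: 0.8, green: 0.0, blue: 0.0, alpha: 1.0)
print(clearedPixel)   // prints [0, 0, 204, 255]
```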

Command Submission Flow (24:43)

To submit work to the GPU, we start with a command queue, an object conforming to the MTLCommandQueue protocol. To actually do anything on the GPU, we submit command buffers, which hold the commands that command encoders have encoded into them. The command queue is a serial queue that dispatches this work to the GPU in order, and because it is inherently thread safe, you can submit to it from multiple threads. Creating a queue and encoding a simple frame looks like this:

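// Create the queue once, up front
commandQueue = device.newCommandQueue()

// To actually write commands into the queue, create a command buffer
// and a render command encoder for the render pass configured above
let commandBuffer = commandQueue.commandBuffer()
let commandEncoder = commandBuffer.renderCommandEncoderWithDescriptor(passDescriptor)

// Issuing draw calls
// … fixed-function configuration … 
commandEncoder.drawPrimitives(.Triangle, vertexStart: 0, vertexCount: 3)
commandEncoder.endEncoding()

// Presenting and committing
// … draw calls … 
commandBuffer.presentDrawable(drawable)
commandBuffer.commit()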
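The submission flow can be modelled in a few lines of Swift to show who owns what: encoders append commands to a command buffer, committing hands the buffer to the queue, and the queue executes buffers in commit order. Command, CommandBuffer, and CommandQueue here are hypothetical stand-ins rather than Metal types, and the model ignores threading and asynchronous GPU execution.

```swift
// Hypothetical stand-ins for Metal's submission objects (not real Metal types).
enum Command: Equatable {
    case setPipeline(String)
    case draw(vertexStart: Int, vertexCount: Int)
    case present
}

final class CommandQueue {
    // Commands the "GPU" has executed, in the order their buffers were committed.
    private(set) var executed: [Command] = []

    func makeCommandBuffer() -> CommandBuffer {
        return CommandBuffer(queue: self)
    }

    // Serial execution: each committed buffer runs in full before the next one.
    fileprivate func enqueue(_ buffer: CommandBuffer) {
        executed.append(contentsOf: buffer.commands)
    }
}

final class CommandBuffer {
    private(set) var commands: [Command] = []
    private(set) var isCommitted = false
    private let queue: CommandQueue

    fileprivate init(queue: CommandQueue) {
        self.queue = queue
    }

    // Plays the role of a command encoder: records commands into this buffer.
    func encode(_ body: (inout [Command]) -> Void) {
        precondition(!isCommitted, "cannot encode into a committed buffer")
        body(&commands)
    }

    // Hands the finished buffer to its queue; it cannot be committed twice.
    func commit() {
        precondition(!isCommitted, "a command buffer can only be committed once")
        isCommitted = true
        queue.enqueue(self)
    }
}

let queue = CommandQueue()
let frame = queue.makeCommandBuffer()
frame.encode { commands in
    commands.append(.setPipeline("basic"))
    commands.append(.draw(vertexStart: 0, vertexCount: 3))
}
frame.encode { $0.append(.present) }
frame.commit()
print(queue.executed.count)   // prints 3
```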


Vertex & Pipeline Descriptors (27:53)

The render pipeline in Metal is an actual object, not just an abstract notion. It’s a precompiled set of graphics state, including a vertex function and a fragment function, and you generally create one for each pair of shaders you use. Pipelines are cheap to swap but expensive to create, because creating one compiles the shader code down to instructions for the target hardware, which is what actually runs on the GPU.
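Because pipelines are expensive to create but cheap to swap, a common approach is to build each one once and reuse it. This hypothetical PipelineCache sketches that idea; the compile closure stands in for the real pipeline-state creation call, and a string stands in for the resulting pipeline object.

```swift
// Identifies a pipeline by the pair of shader functions it uses.
struct PipelineKey: Hashable {
    let vertexFunction: String
    let fragmentFunction: String
}

// Hypothetical cache: compile each (vertex, fragment) pair once, then reuse it.
final class PipelineCache<Pipeline> {
    private var pipelines: [PipelineKey: Pipeline] = [:]
    private let compile: (PipelineKey) -> Pipeline
    private(set) var compileCount = 0

    init(compile: @escaping (PipelineKey) -> Pipeline) {
        self.compile = compile
    }

    func pipeline(vertex: String, fragment: String) -> Pipeline {
        let key = PipelineKey(vertexFunction: vertex, fragmentFunction: fragment)
        if let cached = pipelines[key] { return cached }   // cheap path: reuse
        compileCount += 1                                   // expensive path: build once
        let built = compile(key)
        pipelines[key] = built
        return built
    }
}

let cache = PipelineCache<String> { key in "compiled(\(key.vertexFunction), \(key.fragmentFunction))" }
for _ in 0..<100 {
    _ = cache.pipeline(vertex: "vertex_func", fragment: "fragment_func")
}
_ = cache.pipeline(vertex: "light_vertex", fragment: "light_fragment")
print(cache.compileCount)   // prints 2
```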

To submit geometry to the GPU, you also need a vertex descriptor, which tells Metal how the vertices are laid out in memory. The vertex descriptor is then attached to other objects, including the pipeline descriptor.

// To create a vertex descriptor
let vertexDescriptor = MTLVertexDescriptor()
// Attribute 0: position, four floats at the start of each vertex
vertexDescriptor.attributes[0].offset = 0
vertexDescriptor.attributes[0].format = .Float4
vertexDescriptor.attributes[0].bufferIndex = 0
// Attribute 1: colour, four floats immediately after the position
vertexDescriptor.attributes[1].offset = sizeof(Float32) * 4
vertexDescriptor.attributes[1].format = .Float4
vertexDescriptor.attributes[1].bufferIndex = 0
// Layout 0: one interleaved vertex every eight floats (32 bytes)
vertexDescriptor.layouts[0].stepFunction = .PerVertex
vertexDescriptor.layouts[0].stride = sizeof(Float32) * 8
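The offsets and stride in the vertex descriptor must match how the vertex data is actually laid out in memory. In current Swift, where MemoryLayout replaces the older sizeof, you can check this with a hypothetical Vertex struct holding two SIMD4<Float> fields; the sketch assumes SIMD4<Float> is 16 bytes with no padding between the fields, which holds on common platforms.

```swift
// Hypothetical vertex type matching the descriptor: attribute 0 = position
// (four floats), attribute 1 = colour (four floats), interleaved.
struct Vertex {
    var position: SIMD4<Float>
    var color: SIMD4<Float>
}

let positionOffset = MemoryLayout<Vertex>.offset(of: \Vertex.position)!
let colorOffset = MemoryLayout<Vertex>.offset(of: \Vertex.color)!
let vertexStride = MemoryLayout<Vertex>.stride

print(positionOffset, colorOffset, vertexStride)   // prints 0 16 32

// Interleaved array: vertex i starts at byte i * vertexStride.
let triangle = [
    Vertex(position: [0, 1, 0, 1], color: [1, 0, 0, 1]),
    Vertex(position: [-1, -1, 0, 1], color: [0, 1, 0, 1]),
    Vertex(position: [1, -1, 0, 1], color: [0, 0, 1, 1]),
]
let byteCount = triangle.count * vertexStride   // size of the vertex buffer to allocate
```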

// To create a pipeline descriptor
let pipelineDescriptor = MTLRenderPipelineDescriptor() 
pipelineDescriptor.vertexDescriptor = vertexDescriptor 
pipelineDescriptor.vertexFunction = vertexFunction 
pipelineDescriptor.fragmentFunction = fragmentFunction 
pipelineDescriptor.colorAttachments[0].pixelFormat = .BGRA8Unorm

Libraries & Functions (31:33)

To get at the vertex and fragment functions we’ve written, we need an object called a library. A library is a collection of functions that can be retrieved by name; the default library contains all the shader functions that were compiled into your app binary from the .metal files in your project. To create a library and functions:

let library = device.newDefaultLibrary()!
let vertexFunction = library.newFunctionWithName("vertex_func")
let fragmentFunction = library.newFunctionWithName("fragment_func")

An important thing to realize is that the vertex shader runs relatively infrequently (once per vertex), while the fragment shader runs a lot (once per fragment, and a single triangle can cover thousands of pixels). Creating a pipeline state synchronously compiles the code that runs on the GPU.

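// Shared declarations (struct layouts assumed to match the vertex descriptor above)
#include <metal_stdlib>
using namespace metal;

struct InVertex  { float4 position; float4 color; };
struct OutVertex { float4 position [[position]]; float4 color; };
struct Uniforms  { float4x4 rotation_matrix; };

// a simple vertex shader
vertex OutVertex vertex_func(device InVertex *vert       [[buffer(0)]],
                             constant Uniforms &uniforms [[buffer(1)]],
                             uint vid                    [[vertex_id]]) {
  OutVertex outVertex;
  outVertex.position = uniforms.rotation_matrix * vert[vid].position;
  outVertex.color = vert[vid].color;
  return outVertex;
}

// a simple fragment shader
fragment half4 fragment_func(OutVertex vert [[stage_in]]) {
  return half4(vert.color);
}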

// creating a pipeline state (this Swift-era API reports failures through an NSError pointer)
var error: NSError?
pipeline = device.newRenderPipelineStateWithDescriptor(pipelineDescriptor, error: &error)

Moving to 3D (39:59)

To move from rendering in 2D to rendering in 3D, we need to attach something called a depth buffer. We also need to associate a normal direction with each vertex, which lets us compute things like lighting. Finally, we’ll introduce a perspective projection matrix, as well as a fragment shader that does extremely basic lighting.

The depth buffer is a texture that’s attached to the render pass alongside the colour texture, and it allows geometry to be drawn in any order. For each pixel, it stores the depth of the nearest surface drawn so far, and a new fragment is only allowed to overwrite the pixel if it is closer to the camera. As a result, only the front-most surface shows up, which ensures that the scene is drawn in a visually consistent fashion regardless of draw order.
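The depth test itself is simple enough to sketch on the CPU. DepthTestedTarget is a hypothetical type: it stores one depth and one colour per pixel and keeps a fragment only if its depth is smaller than the stored value, which corresponds to a “less than” comparison. In Metal, the equivalent setup is a depth texture attached to the render pass descriptor’s depthAttachment plus a depth-stencil state created from an MTLDepthStencilDescriptor.

```swift
// Minimal depth-buffer sketch: one depth and one colour value per pixel.
// Smaller depth = closer to the camera ("less than" comparison).
struct DepthTestedTarget {
    let width: Int
    let height: Int
    private(set) var depth: [Float]
    private(set) var color: [String]

    init(width: Int, height: Int) {
        self.width = width
        self.height = height
        // Clear depth to the farthest value (1.0) so the first fragment always passes.
        depth = Array(repeating: 1.0, count: width * height)
        color = Array(repeating: "background", count: width * height)
    }

    // Writes the fragment only if it is closer than what is already stored.
    mutating func write(x: Int, y: Int, depth z: Float, color c: String) {
        let i = y * width + x
        guard z < depth[i] else { return }   // depth test failed: fragment discarded
        depth[i] = z
        color[i] = c
    }

    func colorAt(x: Int, y: Int) -> String { color[y * width + x] }
}

// Draw the same pixel in two different orders.
var a = DepthTestedTarget(width: 1, height: 1)
a.write(x: 0, y: 0, depth: 0.8, color: "far wall")
a.write(x: 0, y: 0, depth: 0.3, color: "chair")

var b = DepthTestedTarget(width: 1, height: 1)
b.write(x: 0, y: 0, depth: 0.3, color: "chair")
b.write(x: 0, y: 0, depth: 0.8, color: "far wall")

print(a.colorAt(x: 0, y: 0), b.colorAt(x: 0, y: 0))   // prints chair chair
```

Both orders end with the chair visible, which is why a depth buffer lets you submit geometry in any order.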

Basics of Lighting (42:31)

A very rudimentary form of lighting is called diffuse lighting. Diffuse describes light that hits a surface and scatters equally in every direction. By contrast, specular lighting reflects light in a preferred direction, producing a bright, focused highlight. To compute the diffuse term, you need only the surface normal and the light direction. The dot product of these two unit vectors is the intensity of the diffuse lighting term, or the radiance value, at that particular point on the surface. To do this in the Metal shading language, we write a vertex function that takes a normal in model space and transforms it out of model space using a normal matrix.
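Here is the diffuse term written out in Swift using the standard library’s SIMD3<Float>. The normalize, dot, and saturate helpers are defined locally to mirror the Metal shading language functions of the same names, and the sketch assumes the light vector points from the surface toward the light.

```swift
// Scales a vector to unit length.
func normalize(_ v: SIMD3<Float>) -> SIMD3<Float> {
    let length = (v * v).sum().squareRoot()
    return v / length
}

// Sum of component-wise products: |a||b|cos(angle).
func dot(_ a: SIMD3<Float>, _ b: SIMD3<Float>) -> Float {
    return (a * b).sum()
}

// Clamps to [0, 1], like MSL's saturate.
func saturate(_ x: Float) -> Float {
    return min(max(x, 0), 1)
}

// Lambertian diffuse: cosine of the angle between the unit normal and the
// unit direction toward the light, with back-facing light clamped to zero.
func diffuseIntensity(normal: SIMD3<Float>, toLight: SIMD3<Float>) -> Float {
    return saturate(dot(normalize(normal), normalize(toLight)))
}

let up = SIMD3<Float>(0, 1, 0)
print(diffuseIntensity(normal: up, toLight: [0, 1, 0]))    // light overhead: 1.0
print(diffuseIntensity(normal: up, toLight: [1, 0, 0]))    // grazing light: 0.0
print(diffuseIntensity(normal: up, toLight: [0, -1, 0]))   // light from behind: clamped to 0.0
```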

vertex OutVertex light_vertex(device InVertex *vert       [[buffer(0)]],
                              constant Uniforms &uniforms [[buffer(1)]],
                              uint vid                    [[vertex_id]]) {
  OutVertex outVertex;
  // model space -> eye space -> clip space
  outVertex.position = uniforms.projectionMatrix *
                       uniforms.modelViewMatrix *
                       vert[vid].position;
  outVertex.normal = uniforms.normalMatrix * vert[vid].normal;
  return outVertex;
}

The fragment shader normalizes the interpolated normal (interpolation across the triangle generally leaves it shorter than unit length), takes the dot product of it with the light direction, and then saturates the result to keep it between zero and one.

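// Assumed program-scope constant: unit vector toward the light (value illustrative)
constant float3 lightDirection = float3(0.0, 0.0, 1.0);
fragment half4 light_fragment(OutVertex vert [[stage_in]]) {
  float intensity = saturate(dot(normalize(vert.normal), lightDirection));
  return half4(intensity, intensity, intensity, 1);
}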

Texturing (50:13)

When texturing, we associate each vertex with a 2D texture coordinate. This replaces the per-vertex colour with a per-pixel diffuse colour read from the texture. To load texture data from a UIImage, we use Core Graphics to draw the image into a bitmap context, then copy the pixel data into an MTLTexture.

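// width, height, bytesPerRow, and rawData describe the bitmap context
// that the UIImage was drawn into
let textureDescriptor = MTLTextureDescriptor.texture2DDescriptorWithPixelFormat(.RGBA8Unorm,
                                                                                width: Int(width),
                                                                                height: Int(height),
                                                                                mipmapped: true)
let texture = device.newTextureWithDescriptor(textureDescriptor)
// Copy the pixels into mip level 0; the smaller mip levels still need to be
// generated separately (for example with a blit command encoder)
let region = MTLRegionMake2D(0, 0, Int(width), Int(height))
texture.replaceRegion(region,
                      mipmapLevel: 0,
                      withBytes: rawData,
                      bytesPerRow: Int(bytesPerRow))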
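The bytesPerRow argument tells Metal how far apart consecutive rows are in the source bytes. This hypothetical RGBA8Bitmap sketch assumes tightly packed rows (4 bytes per pixel, no padding); a real bitmap context may pad its rows, which is why the value is passed explicitly rather than derived from the width.

```swift
// Hypothetical tightly packed RGBA8 bitmap, like the bytes a bitmap context
// produces before they are copied into a texture.
struct RGBA8Bitmap {
    let width: Int
    let height: Int
    var bytes: [UInt8]

    // 4 bytes per pixel (R, G, B, A); no row padding in this sketch.
    var bytesPerRow: Int { width * 4 }

    init(width: Int, height: Int) {
        self.width = width
        self.height = height
        bytes = Array(repeating: 0, count: width * height * 4)
    }

    // Byte index of the first channel of pixel (x, y); rows are stored top to bottom.
    func offset(x: Int, y: Int) -> Int { y * bytesPerRow + x * 4 }

    mutating func setPixel(x: Int, y: Int, r: UInt8, g: UInt8, b: UInt8, a: UInt8) {
        let i = offset(x: x, y: y)
        bytes[i] = r
        bytes[i + 1] = g
        bytes[i + 2] = b
        bytes[i + 3] = a
    }
}

var image = RGBA8Bitmap(width: 3, height: 2)
image.setPixel(x: 2, y: 1, r: 255, g: 128, b: 0, a: 255)
print(image.bytesPerRow, image.offset(x: 2, y: 1))   // prints 12 20
```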

Sampling (52:04)
If you want the value of a texel (a pixel inside a texture), you can index directly into the texture to get its colour value. Often, though, you want an abstraction over that, which we call sampling. Because there’s rarely a one-to-one mapping between texels and screen pixels, a sampler is an object that knows how to read a texture and interpolate between its texels.
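Here is what the two filter settings compute, sketched for a hypothetical single-channel GrayTexture with clamp-to-edge addressing. Nearest filtering returns the texel containing the sample point; linear filtering blends the four surrounding texels by distance. The minFilter applies when the texture is minified (each texel covers less than a pixel) and the magFilter when it is magnified. Mipmapping and other address modes are left out.

```swift
// Hypothetical single-channel texture sampled with normalized coordinates
// (0...1). Texel i spans [i, i + 1) / size, so its center is at (i + 0.5) / size.
struct GrayTexture {
    let width: Int
    let height: Int
    let texels: [Float]   // row-major

    func texel(_ x: Int, _ y: Int) -> Float {
        let cx = min(max(x, 0), width - 1)    // clamp to edge
        let cy = min(max(y, 0), height - 1)
        return texels[cy * width + cx]
    }

    // Nearest filtering: pick the single texel containing the sample point.
    func sampleNearest(u: Float, v: Float) -> Float {
        return texel(Int((u * Float(width)).rounded(.down)),
                     Int((v * Float(height)).rounded(.down)))
    }

    // Linear filtering: blend the four texels around the sample point.
    func sampleLinear(u: Float, v: Float) -> Float {
        let x = u * Float(width) - 0.5
        let y = v * Float(height) - 0.5
        let x0 = Int(x.rounded(.down)), y0 = Int(y.rounded(.down))
        let fx = x - Float(x0), fy = y - Float(y0)
        let top = texel(x0, y0) * (1 - fx) + texel(x0 + 1, y0) * fx
        let bottom = texel(x0, y0 + 1) * (1 - fx) + texel(x0 + 1, y0 + 1) * fx
        return top * (1 - fy) + bottom * fy
    }
}

let checker = GrayTexture(width: 2, height: 2, texels: [0, 1,
                                                          1, 0])
print(checker.sampleNearest(u: 0.25, v: 0.25))   // prints 0.0
print(checker.sampleLinear(u: 0.5, v: 0.5))      // prints 0.5
```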

// creating a sampler state
let samplerDescriptor = MTLSamplerDescriptor() 
samplerDescriptor.minFilter = .Nearest
samplerDescriptor.magFilter = .Linear
samplerState = device.newSamplerStateWithDescriptor(samplerDescriptor)

// a texturing fragment shader
fragment half4 tex_fragment(OutVertex vert           [[stage_in]],
                            texture2d<float> texture [[texture(0)]],
                            sampler samp             [[sampler(0)]]) {
  float4 diffuseColor = texture.sample(samp, vert.texCoords);
  return half4(diffuseColor.r, diffuseColor.g, diffuseColor.b, 1);
}

Takeaways (54:23)

Metal presents a slimmer and more consistent API than OpenGL. It requires substantially more effort than high-level libraries like SceneKit, but it provides unprecedented access to the hardware and real opportunities in the area of performance.

Resources & Q&A (54:59)

Q: What happens to memory once you’ve handed it to Metal? Is it a copy?
Warren: No. In my sample code, I overwrote things without really worrying about it, but ideally you keep three separate textures and never write into memory that Metal is still reading from or writing to.

Q: Are there performance numbers comparing OpenGL and Metal?
Warren: The marketing claim is ten times more draw calls, but that doesn’t mean anything until you actually write and test an app yourself. I don’t have specific numbers, but you will spend a lot less time doing things like validating state every frame and recompiling shaders. Metal won’t be faster in every use case, but it will be in a very broad category of applications.

Q: Since this is your first experience using Swift and Metal together to prepare this talk, how do you feel about it after you used it?
Warren: Once I figured out the float pointer trick, that was a major revelation. I’m a little bothered by the amount of rigorous typecasting you have to do, but overall I feel like I’m writing less code that’s more stable. I walked away with a really positive impression from this experience of writing more than a few hundred lines of Swift.

Q: Given this API was designed for Objective-C, are there any sorts of things you felt were easier to work with in Swift?
Warren: I’m not sure. It’s easier to write idiomatic Objective-C for an Obj-C API, but there wasn’t a whole lot of friction writing against Metal in Swift.

Q: What kind of performance tooling is there?
Warren: The normal debug pane in Xcode can give you some pretty good insight, but if you want a more in-depth look, Capture GPU Frame is a really amazing tool in Xcode. It gives you not only the 60 FPS indicator but also a snapshot of all the render calls you’ve made in a particular frame. You can view all the bound GPU objects, which is an amazing way to visually inspect everything sitting in shared memory.

Q: OpenGL has a shading language (GLSL); is there a shading language for Metal?
Warren: All the shaders I showed tonight were written in the Metal shading language. We write the client code that runs on the CPU in Swift or Objective-C, and the code that runs on the GPU in the Metal shading language, which is derived from C++.

Q: Can you save shaders as individual units like in OpenGL’s language?
Warren: You can. Metal gives you more flexibility here, because it allows multiple shaders per source file, which essentially means per compilation unit.

Warren Moore

Warren is a Cocoa developer and occasional trainer, speaker, and blogger. He works at Apple as a Metal Ecosystem Development Engineer, guiding the next generation of 3D graphics technologies.