Core Data Threading Demystified

Threading in today’s Core Data is radically different from its original implementation, pre–iOS 6. The long history of Core Data has lent itself to different interpretations over the years about how threading should be used, so how should it be done now? In this talk from #Pragma Conference 2015, Marcus Zarra presents the Old Way, the Hard Way, and the Best Way to implement threading.

Introduction (0:00)

The goal of this post is to clear the air about how we should be using Core Data as it stands today, in 2015. We’ll start off with my core truths: the building blocks I use when looking at a framework or at Core Data, and how I approach things. I’ll then discuss the old way, the hard way, and the best way to use Core Data in a multi-threaded environment. Finally, I’ll wrap up with some guidelines and some warnings about several edgy topics in Core Data.

Core Data has been around for about eleven years, yet it still feels like it’s new to me and I’m still learning about it. Eleven years is quite a long time for a framework, especially today with the mobile aspect. One of the problems with dealing with a framework that old is that there are eleven years of blogs and articles talking about Core Data, some of which are no longer accurate. There’s a lot of conflicting information out there.

Threading: Why? (Or Why Not?) (3:16)

When it comes to threading, I always like to ask “Why?” People often come to me and say, “Okay, we’ve got a Core Data problem. We’ve been adding threading to our Core Data application, and we’re in a corner,” and I always ask why they added threading at all. It’s a trick question, but I like asking it because everybody gets it wrong. The answer usually revolves around a performance problem. However, using threading to fix a performance problem tends to create two, four, eight, or twelve more performance problems.

Threading is not a silver bullet. Adding threads to an application is a design decision, and if we make that decision at the 11th hour before shipping, we picked the wrong time. Threading is something we add to an application when we find we have spare time, CPU, and bandwidth. Threading is something that we should be adding to our applications when we want to try to predict what the user’s going to do next and get ahead of them.

An example of that would be a Twitter application, one of my favorite examples. When the user launches the Twitter application, we should be using threading to grab images that they haven’t seen yet, because they’re not there on their thread yet. We could even perhaps be caching avatars, or grabbing search results. We should be using threading to predict what the user’s going to do and get there first, so that when they actually go to ask us for it, they’re amazed at how snappy the app is. It’s not really quick, it’s just been working its butt off ahead of them.

We should be doing things in the background for the user, that the user doesn’t need to wait for, like posting a tweet. The user should not really have to wait for that network call to come back. We don’t need to put a spinner on the screen and tell the user, “Hey, my time is way more precious than your time. You will wait and I will tell you if that tweet posted or not. No, you can’t read anything else until I am done.” That’s a poor user experience. Threading should be solving those poor user experiences.

Threading With Core Data: Core Concepts (6:43)

Single Source of Truth

When we’re working with Core Data and we have one or more NSManagedObjectContexts, we need one that we say is the truth: the one that the user interface accesses. The user interface is our truth. We develop user-facing applications now, instead of applications sitting on a mainframe. We’re on iPhones, iPads, and personal computers. The user is the one that should be getting the truth at all times. Therefore, we should have a context dedicated to the user and dedicated to giving them the truth.

The “single source of truth” concept is a banking phrase, where you say, “This thing right here is the truth, and whenever we need to confirm which of several possibilities has the right answer, this one wins.”

UI Thread Holds the Truth

All user interfaces are single threaded. Every user interface out there that I’ve ever run across in any language is single threaded. If you’ve ever tried to stand up a UIViewController in a background thread, that does not end well. If the UI is single threaded, then we should be accessing the UI from a single source of truth.

Non-User Data Manipulation Should NEVER Be On the Main Thread

If we’re not servicing the UI, we should not be on the UI Thread. If we’re consuming JSON, we don’t belong on the UI Thread. If we’re pushing data up to a server, whether it be a tweet or a bank transaction, we don’t belong on the main thread. Get off of it. It’s for the user interface only.
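As a minimal sketch of that rule, assuming a modern Swift toolchain and Grand Central Dispatch (neither appears in the talk), JSON decoding can be pushed onto a background queue so that only the finished result is handed back to the main queue. The `Tweet` type and the `decodeTweets` and `loadTweets` functions are hypothetical names used purely for illustration.

```swift
import Foundation

// Hypothetical payload type, used only for illustration.
struct Tweet: Decodable {
    let id: Int
    let text: String
}

/// Pure decoding step. It has no UI dependency, so it can run on any thread.
func decodeTweets(from data: Data) throws -> [Tweet] {
    return try JSONDecoder().decode([Tweet].self, from: data)
}

/// Decodes on a background queue, then delivers the result on the main queue.
func loadTweets(from data: Data,
                completion: @escaping @Sendable (Result<[Tweet], Error>) -> Void) {
    DispatchQueue.global(qos: .utility).async {
        // The expensive work happens here, off the main thread.
        let result = Result(catching: { try decodeTweets(from: data) })
        DispatchQueue.main.async {
            // Only the hand-off to the UI touches the main thread.
            completion(result)
        }
    }
}
```

The split into a pure decoding function and a thin dispatching wrapper is a design choice of this sketch: the decoding can be exercised on its own, and the wrapper is the only place that knows about queues.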

Doing just these three things will solve most performance problems with Core Data. No matter which of the three systems that we use with Core Data, we’re not going to have that many performance problems.

The Old Way (9:34)

When I say the “old” way, I mean the original way, the way that we used to do Core Data threading before iOS 6. Apple programming frameworks did a fairly massive concept shift around iOS 5. It had, in my opinion, a lot to do with the huge influx of developers that we got. Before iOS, and before the iPhone was ever public in any way, you could probably count the number of Objective-C developers on one hand. We were all on the same mailing list, we all knew each other, and we generally knew the developers inside of Apple. Apple would release frameworks that had sharp edges, and they didn’t really tell you about them, because they knew you’d find them. Then you would discuss it and solve the problem, because it was such a small, tight-knit community. Threading was like that too; the threading rules were unclear, with different answers from different people. It was still new technology.

Then, suddenly, it wasn’t a thousand or 10,000 developers anymore. From 2006 to 2008, I think we went from 10,000 to half a million developers. I remember WWDCs where the entire labs were completely filled with groups going, “Okay, if you retain, it increments the retain count by one. When you release, it decrements it by one. When it hits zero, it goes away, and if you go to negative one, it crashes.” They needed to go over that again and again, because it was a hard concept. We had this huge influx of people that weren’t already knowledgeable about Objective-C, but instead asked things like “So we actually have to maintain memory?” These were just concepts that they never had to deal with.

So, Apple had to do a paradigm shift in how they were approaching the frameworks and APIs. One of those was around threading, because threading is really, really hard. We all get it wrong, even developers inside of Apple. It’s right up there with solving AI as a difficult problem. The paradigm shift revolved around using queues instead of threads. There aren’t any differences except a couple that you never ever run across if you do everything right. Queues are fascinating because they work exactly the same as threads, except for when we write our code wrong, and they protect us from ourselves, which is cool.

In iOS 5, Core Data made a change. They presented three types of ManagedObjectContext (MOC), and we can use one of those three. The goal was to help define threading better, but it didn’t work very well in iOS 5. It was flaky and a little unstable. In iOS 6, though, they got the bugs out and it was more stable, so that’s when it actually started working.

Pre-iOS 6, this is how we used Core Data, and it still works today, although you don’t really want to use it often except for certain edge cases. You start off with one PersistentStoreCoordinator (PSC). The PSC handles all the interactions with disk. It’s the one that takes the data from disk, realizes it into objects, and passes it back out. We use just one for the application.

Every time we would create a MOC, it would talk directly to that PSC. All of them would interact with the PSC, but they wouldn’t know each other existed. This was problematic when we wanted to let one context know that we made a change in another context, probably on another thread. We had to handle that through the notification system, because every time you made a change to a MOC, it would then broadcast a notification that we had to consume and hand off to the other context. We had to do this correctly.
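The notification hand-off can be sketched roughly as follows. This is an illustration in modern Swift rather than code from that era: `observeSaves(mergingInto:)` is a hypothetical helper, and it assumes the main context was created with `.mainQueueConcurrencyType` so that `perform` can be used to hop onto its queue. Pre-iOS 6 code had no `perform` and would have had to get onto the main thread some other way.

```swift
import CoreData

/// Keeps `mainContext` in step with saves made by sibling contexts that share its coordinator.
/// Returns the observer token; keep it and pass it to `removeObserver(_:)` when done.
func observeSaves(mergingInto mainContext: NSManagedObjectContext) -> NSObjectProtocol {
    return NotificationCenter.default.addObserver(
        forName: .NSManagedObjectContextDidSave,
        object: nil,   // listen to every context, then filter below
        queue: nil     // the block runs on whichever thread posted the notification
    ) { notification in
        guard let savedContext = notification.object as? NSManagedObjectContext,
              savedContext !== mainContext,   // ignore our own saves
              savedContext.persistentStoreCoordinator === mainContext.persistentStoreCoordinator
        else { return }

        // Hand the change set to the main context on the main context's own queue.
        mainContext.perform {
            mainContext.mergeChanges(fromContextDidSave: notification)
        }
    }
}
```

Even in this small form, the sketch shows why the approach was error-prone: the filtering, the thread hop, and the observer lifetime all have to be right, and nothing in the API enforces any of it.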

This became a bit of a problem, because this was a system that was refined over a couple of years. When Core Data first came out, everybody was just figuring it out on their own. Then, more and more defined rules developed over the years, but they were still confusing.

In this system, we always had one single source of truth for the user interface: the main MOC, the one we would pick to be feeding the user interface. It wasn’t specially defined any other way, because there was only one way to define a context. User interface is not going to come from any other thread, because the user interface is single threaded.

Issues With the Old Way (16:26)

The issue with this design is the large amount of code. The old complaints about Core Data being difficult to use and having a tremendous amount of boilerplate code came from this era and this design. You would, by the end of your application, have Core Data littered everywhere inside of your app.

The threading rules were also unclear. You could literally get different answers from different developers on different days on how you’re supposed to handle the threading of Core Data before iOS 6. It started off with “just lock the context, and you synchronize and everything will be fine.” That didn’t work very well. Then, the answers were, “Okay, one context per thread, and everything that comes out of that context belongs to that thread. You can read on other threads, that’ll be okay, but don’t write on other threads.” Then it was like, “Okay, don’t even read on those threads. Just read on the one thread, write on that thread, everything belongs on that thread, they’re all siloed.” Then people would say it wasn’t thread safe, and then that it was thread safe, but only if you followed the siloing rules.

The threading rules were a mess. We had no way to confirm or deny whether we got a threading rule right. You could easily imagine severe arguments during code reviews. Who really knows until it crashes? Even if it crashes, you’re not sure either, because it may not even be crashing right where that code is. Threading is hard.

On top of that, sometimes we would block the wrong thread at the wrong time, and all of a sudden the entire application just halts and you’d have no idea why. Maybe someone was listening to a notification on another thread, and they were trying to consume it on the wrong thread, or their UI was doing something bizarre and causing the entire app to halt. You end up littering your application with NSLogs to hopefully try to capture this so you can figure out exactly how you got to that locking point.

When you start listening to notifications, they get chatty. If you start, you tend to get a little bit overanxious and start listening to the notifications all over the place. This impacts performance, which then causes surprise thread locking, and then you’re going through your application pulling your hair out, trying to figure out why it works 99% of the time, but the other 1% it just stops.

The good news is that this has gotten better in iOS 9. We have a debug flag now that allows us to at least confirm whether we got our threading right or wrong.

The old way was difficult and confusing.

The Hard Way (21:53)

The hard way is my personal favorite because it’s academically interesting to me, but hopefully I don’t see it in production very often.

SQLite is the persistence level we tend to use all the time. It’s the one that we use to store everything on disk. It’s designed to have multi-process access so we can have more than one PSC talking to it. We can have multiple threads talking to it at the same time.

If we can have more than one PSC, we could have multiple MOCs talking to each of the PSCs. Now, all of a sudden we don’t have to worry about locking so much, no fear of blocking from one thread to another thread or from one context to another context. We can happily be pushing data in from one direction and consuming data from another direction.

It’s as close as we’re going to get to true asynchronous writing out to the PSC. Of course, there’s still a sliver of a chance of blocking; even if you build the test right, you can still hit a block. The reason is that most of the work a PSC does is done up in the CPU. It’s done up in memory, taking our objects and turning them into SQLite calls so that it can prepare and talk to SQLite. That’s where the bulk of a save operation, fetch operation, or read operation is happening. Then, there’s a small sliver actually talking to the database. During that small sliver, if you’re hitting the same table and the same record and the same row at the exact same time, you will get a lock. It’s not 100% true asynchronous, but it’s 99.99%, and it’s getting better and better every year. With WAL mode turned on, and with some of the new stuff that’s been added to SQLite, it’s getting harder and harder to get that lock to happen. The good news is that we can get super, super close to that true asynchronous.

Even in this design, we still want to have one context that feeds the user interface; we don’t want to have multiples. We always still want to have one that exists on the thread, associated with the user interface.
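A minimal sketch of this layout, in modern Swift, might look like the following. `makeStack` is a hypothetical helper that builds one independent coordinator and context pair per call, all pointing at the same SQLite file. The pragma option is spelled out only to mirror the mention of WAL mode; on recent OS releases WAL is already the default journal mode for Core Data’s SQLite stores.

```swift
import CoreData

/// Builds an independent coordinator + context pair on top of a shared SQLite file.
/// Each call creates its own NSPersistentStoreCoordinator, so the stacks do not
/// know about each other and do not share locks above the SQLite layer.
func makeStack(model: NSManagedObjectModel,
               storeURL: URL,
               concurrencyType: NSManagedObjectContextConcurrencyType) throws -> NSManagedObjectContext {
    let coordinator = NSPersistentStoreCoordinator(managedObjectModel: model)
    let options: [AnyHashable: Any] = [
        // Write-ahead logging lets readers and a writer overlap at the SQLite level.
        NSSQLitePragmasOption: ["journal_mode": "WAL"]
    ]
    _ = try coordinator.addPersistentStore(ofType: NSSQLiteStoreType,
                                           configurationName: nil,
                                           at: storeURL,
                                           options: options)

    let context = NSManagedObjectContext(concurrencyType: concurrencyType)
    context.persistentStoreCoordinator = coordinator
    return context
}
```

In use, one call with `.mainQueueConcurrencyType` would produce the single context that feeds the user interface, and further calls with `.privateQueueConcurrencyType` would produce the background stacks. Nothing in this sketch keeps the stacks in sync with each other.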

Issues With the Hard Way (24:34)

The main issue is that it’s a very, very hard way to do this. I don’t recommend doing this unless you have a very specific problem that you’re trying to solve. It’s difficult for us to get it just right, because threading was hard before, and now we’re going to add another level to it. We add that other level because the PSCs don’t talk to each other at all. At least when we had multiple MOCs, we were talking to a single PSC, and that PSC had an idea as to what was going on. The context could at least query it or get some information from them. With multiple PSCs talking to a single SQLite file, we don’t even have that. One is writing to the SQLite file, the other one’s reading, and they have no idea about each other at all. We can get out of sync with our data fairly easily.

Notifications are also quite hard in this system. This has changed a bit in iOS 9, where they added a new feature to allow us to consume remote notifications. However, you can’t just take notifications from one PSC and have another one consume them, because there is no process for that. We have to do a bit more dancing to get that right.

Threading is even trickier than the first version. We’ve got tons and tons of threading issues going on with this design, as we have to make sure that our PSCs and our MOCs are talking to each other on the correct threads.

Maintainability is just out the window. We’ve taken the original way that we were really confused with, and just added another layer of complexity to it.

Why would we want to do this though? With watchOS, we’d want to do it so that we could actually have multiple processes, such as two processes talking to the same SQLite file on disk. With our glances, there are reasons why we would want multiple applications talking to the same SQLite file. We eventually need this capability because we’ve got more than one application actually talking to a SQLite file. It’s also good to understand the concept because this is also how iCloud works with Core Data. You can understand why it took a few years for iCloud to become really nice and stable.

The Best Way (27:47)

Last, the “best” way to handle threading. By “best,” I do not mean fastest. If you’re looking for the fastest persistence engine out there, you are not looking for something that works with objects; you’re not looking for Objective-C, Core Data, or anything like that. Object-oriented programming is slow. If you need the fastest, you need to be working with C or something even lower like SQLite. Generally, fastest is going to be some of the ugliest, nastiest code on the planet.

I have written some really fast parsing engines, and I’m really not proud of them. They’re not pretty. There’s a lot of notes in there because I don’t understand them six months later.

In my view, “best” is the easiest to use and also the most maintainable. It is code that I can look at with a cup of coffee, and understand it before I finish the cup of coffee. It is consumable code, where I can trace the bug without having to use a whiteboard.

If debugging is that hard, why would we ever want to write code at the edge of our ability? If I don’t understand it six months later, I’m screwed. I have to start all over again.

In the best way, we go back to having one PSC, but we’re going to use the new APIs in iOS 6. We’re going to add a private MOC that talks to that PSC. Then, we’re going to add our main context and define it as a main context, and we’re going to make that a child of that private MOC. Any data processing will be below the main MOC, so we will have three levels of contexts.

Our main has not changed. We still have one main and it still feeds the UI, except now it’s actually defined as a main, and it can only be used on the UI Thread. If we try to use it on another thread and we have our debug flag on, it will crash, and we’ll know we’re doing it wrong.

This design allows us to have asynchronous saves, which is extremely important. It allows us to save and to consume or process data without blocking the UI. A user can happily scroll through our application, look at data, play with it, and we’re not telling them that they have to wait for us. It’s not a lot of code for us either, since we can stand this up in eight lines of code.
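A sketch of that stack in modern Swift, under the assumption that the model and store location are supplied by the caller, could look like this. `CoreDataStack` and its property names are hypothetical; only the Core Data calls themselves are real API.

```swift
import CoreData

/// Sketch of the layered stack: PSC <- private context <- main (UI) context.
/// Disposable child contexts for data processing are created below `mainContext` as needed.
final class CoreDataStack {
    /// Talks to the PSC; disk I/O happens on this context's private queue.
    let privateContext: NSManagedObjectContext
    /// Single source of truth for the UI; only touched on the main thread.
    let mainContext: NSManagedObjectContext

    init(model: NSManagedObjectModel, storeType: String, storeURL: URL?) throws {
        let coordinator = NSPersistentStoreCoordinator(managedObjectModel: model)
        _ = try coordinator.addPersistentStore(ofType: storeType,
                                               configurationName: nil,
                                               at: storeURL,
                                               options: nil)

        let writer = NSManagedObjectContext(concurrencyType: .privateQueueConcurrencyType)
        writer.persistentStoreCoordinator = coordinator

        let main = NSManagedObjectContext(concurrencyType: .mainQueueConcurrencyType)
        main.parent = writer   // the main context never touches the PSC directly

        privateContext = writer
        mainContext = main
    }
}
```

Assuming the debug flag mentioned above is the launch argument `-com.apple.CoreData.ConcurrencyDebug 1`, running with it set makes Core Data stop the app as soon as `mainContext` is touched from the wrong queue, which is a cheap way to check that a stack like this is being used as intended.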

Issues With the Best Way (31:47)

I admit, I was scratching the bottom of the barrel to find issues with this, because it’s currently the right way, so it doesn’t have a lot of issues.

The biggest one you’ll see on the Internet is that it is slower. We have an extra level of indirection between the PSC and the main MOC, so we will get a little bit of slowness there. When I say little bit, I mean if I build up a test case that does thousands upon thousands of iterations, I will find a 1-2% variance in the speed. But technically, yes, it is slower. To reiterate, if you’re looking for raw speed, you should not be looking at Objective-C.

The other problem is that it’s rough for new developers. There are a lot of things that just work, or things that just happen, and you don’t have any code to go with them. A new developer might think, “I saved this data on this NSOperation on this private context over here and the UI updated. How?” There’s no direct link to the code, so it can be rough for developers who don’t understand Core Data. However, to be fair, Core Data is rough for developers who don’t understand Core Data. Core Data comes at persistence from a different direction than most other languages, so it’s rough, period.

It can also be more code. I have run into situations where people get really excited about blocks. They use blocks to an interesting new level. I have seen persistence layers where they have 12,000 lines of code in one class because everything can go into a block. They can be messy that way. It can be more code only because it tempts us. You end up with blocks in blocks in blocks, and then six months later when you go to ship, you think, “Why is the persistence layer one object?” So it can lead to that mistake.

To go along with that, what I call “code puke” is really easy in this system, i.e. this blocks within blocks within blocks within blocks idea, because it’s so easy. Blocks are great, but it’s so easy to just add another block and get another tab off that left margin. What’s one more step off that left margin, right? That’s one of the issues.

Otherwise, there aren’t a lot of issues with it. The slight decrease in speed tends to hang up a lot of people, which baffles me.

Guidelines (27:20)

Single Source of Truth

So many problems with threading and Core Data are instantly solved by having one MOC that feeds our user interface. If you ignore all of my other advice and do just that, you will avoid most of the problems that people have run into with this persistence system.

Do Not Reuse Child MOCs

If you use the “best” system, don’t reuse those child MOCs that are underneath the main MOC. They’re cheap. Use them once, then throw them away. Don’t build up a cache of them, and don’t have them associated with threads so that you can reuse them in a pool or any of the other clever things that I’ve seen. Create them, use them, save them, throw them away. They’re absolutely disposable.

The reason for this is that data changes only go up, not down or sideways. If I’m consuming data in a child of the main, and I save that data, it goes up to the main magically for us, but it doesn’t go back down to any siblings of that child. If I have 10 of them in memory in a pool for me to be able to reuse, they will get out of sync very quickly. Think of them as being snapshots in time. Don’t expect any other changes to magically show up in them.
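The create, use, save, throw away cycle can be sketched like this in modern Swift. `performImport` is a hypothetical helper, and the optional `completion` callback is an addition of this sketch so that callers can learn when the work has finished. In the stack described in this post, `parent` would be the main MOC.

```swift
import CoreData

/// Runs `work` in a brand-new child of `parent`, saves once, and then lets the child go away.
/// `completion` receives nil on success, or the error that stopped the import.
func performImport(into parent: NSManagedObjectContext,
                   work: @escaping (NSManagedObjectContext) throws -> Void,
                   completion: ((Error?) -> Void)? = nil) {
    // A fresh child every time: no pool, no cache, no per-thread reuse.
    let child = NSManagedObjectContext(concurrencyType: .privateQueueConcurrencyType)
    child.parent = parent

    child.perform {
        do {
            try work(child)
            if child.hasChanges {
                // Pushes the changes one level up, into `parent`. Siblings never see them.
                try child.save()
            }
            completion?(nil)
        } catch {
            completion?(error)
        }
        // Nothing retains `child` after this block returns, so it is released here.
    }
}
```

Because the child is never stored anywhere, there is no stale snapshot left lying around for a later import to trip over.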

NSFetchedResultsController WILL Block the Main Thread on a Child MOC Save

If you’re using Core Data, your UI developer will come to you and say, “Your NSFetchedResultsController is blocking my user interface. Core Data sucks.” Then they will pull up Instruments and they will prove to you that NSFetchedResultsController is behind all of the stuttering in their UI. When this happens, you just need to turn that little arrow next to the NSFetchedResultsController so you can find out which one of their table view cells is actually causing the performance problem.

It gets blamed a lot. It is usually the thing that people find first when they’re looking for UI performance problems, for the simple reason that it is the linchpin between a lot of iPad/iPhone UIs and Core Data. When you’re doing bulk changes in Core Data, it’s the one that’s going to be feeding the whole user interface, so it tends to show up in there.

Instruments, Instruments, Instruments

We should be using Instruments a lot while working with Core Data and while working with our UI. We should be making sure that the performance problems are actually where we think they are, not guessing at them by just doing log statements or other clever stuff. Constantly use Instruments while you’re working with your Core Data, while you’re writing your data importers, while you’re writing your exporters, and mainly just to make sure you are not on the main thread. There have been plenty of times where I’ve written code and I’m like “Hey, this thing works really well,” and when I ran it in Instruments, I saw that I was back on the main thread processing JSON, and I was blocking the UI. Instruments will protect us against that. It will help us see how much data processing we’re doing on which thread, and avoid a lot of those performance problems.

Q&A (41:24)

Q: This class with 12,000 lines of code… how many times per year do these people do a code review?

Marcus: They didn’t. They were a startup in San Francisco doing 80-hour weeks. Nobody was doing code reviews. This is a very common problem in the little startups. It’s a startup mentality of “We have money, we have an unrealistic deadline, and we need to beat our developers to death to make sure they ship by this deadline that we picked before we decided what the app was going to do.” It’s that kind of mentality that was driving them. And unfortunately, it’s one I see fairly often.

Q: So eventually when they got the second round of funding, they decided to make a code review and then…

Marcus: Yes, once we actually shipped. They brought me in for three weeks because their main developer was going on vacation for three weeks, and they were shipping in two weeks. After that, he decided he was going to go work somewhere else. We built a whole new development team and rewrote the app.

Q: You described the “best” way to organize the work with Core Data. I was wondering, when you save your child MOC, should the save event propagate to main context so it eventually gets saved to persistent store? Or, does it only save your changes to your main context, and that’s all, and you wait for some other event to save it to real database?

Marcus: I get to give you my favorite answer in the world for that: It’s a business decision. When we use this design, we are no longer tied to having code decisions on when we can save. It used to be, back in the old ways, like “Ooh, I need to save every 10 records or we’ll feel it in the UI.” If we’ve ever had to do that, it will be a magic number and a #define somewhere going, “Okay, if we save every six, it’s fast enough that we won’t feel the stutter in the table view or something like that.” But, by doing this design, we no longer are limited by that. We’re no longer having to feel IO on the main thread, so we can make that decision as a business decision. It depends on the data at that point. Is the data recoverable? Is it cheap to get again? Then I’ll save later. Maybe I’ll save on exit, or maybe I don’t even care, because it’s something like a Twitter feed that I can get again. However, is this a medical record that does not exist anywhere else on the planet? I’m going to save that right now. It becomes a data/business decision. How valuable is the data? How easy is it to recover that data? If it’s hard to recover, or unrecoverable, we want to save it and back it up and make three copies of it. However, if it’s super cheap, I may save later. I may throw it into the main thread and save it at exit, or when the user is watching a video. How many times have we launched Twitter and then set the phone down? Detect that, and save during that time. It lets us make that a business/user experience decision as opposed to “I must save now because I’m impacting the UI.”
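For readers wondering about the mechanics behind that answer: with nested contexts, each save only pushes changes one level up, so saving a child updates the main context, and nothing reaches disk until the main context and then the private context are saved. A minimal sketch of a “save to disk now” helper in modern Swift follows. `persist(mainContext:wait:)` is a hypothetical name, it assumes the main context’s parent is the private context attached to the PSC, and it is meant to be called from the main thread. When to call it is the business decision described above.

```swift
import CoreData

/// Pushes pending changes from the main context up to its private parent,
/// then writes them to the persistent store on the parent's own queue.
/// Pass `wait: true` (for example at exit) to block until the disk write finishes.
func persist(mainContext: NSManagedObjectContext, wait: Bool = false) {
    guard let privateContext = mainContext.parent else { return }

    // Step 1: main -> private. This is an in-memory hand-off, not disk I/O.
    mainContext.performAndWait {
        guard mainContext.hasChanges else { return }
        do {
            try mainContext.save()
        } catch {
            print("Failed to save main context: \(error)")
        }
    }

    // Step 2: private -> disk, performed on the private context's queue.
    let writeToDisk: () -> Void = {
        guard privateContext.hasChanges else { return }
        do {
            try privateContext.save()
        } catch {
            print("Failed to save private context: \(error)")
        }
    }

    if wait {
        privateContext.performAndWait(writeToDisk)
    } else {
        privateContext.perform(writeToDisk)
    }
}
```

Cheap, recoverable data can simply sit in the main context until a quiet moment, while data that cannot be recreated can trigger this call immediately after the child save.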

Q: What about if you should have two child contexts which should share changes between themselves?

Marcus: In the interest of complete transparency, you can make them, but it’s a square peg, round hole kind of thing. You can use notifications to force one child to consume updates from the other child, but don’t do this. It’s just a bad idea. It’s way better to just throw them away. If you’ve got a situation where you’re going to have two children, and one’s going to be dependent on the results of the other, create the second one later. They’re so cheap to create, that you can just create them in line. At that point, maybe it’s just one operation, so you can use that same context for both of those pieces. But don’t try to get siblings to share data like that, it’s just heartache.

Q: I’m pretty sure that you are familiar with Realm. I want to know your opinion about Realm and Core Data. Should we use one or the other, or which one is more suitable for you?

Marcus: Realm has a few things about it that I’ll discuss. First, my opinion on third-party code is well known: all code sucks. I think Realm is trying to solve a problem that is the incorrect problem to solve. They’re trying to be faster than Core Data, whereas Core Data is trying to be fast enough, but maintainable. In my playing and working with Realm, I find that the amount of code you write is about equal. Their migrations to me are a little bit more voodoo than I’d like. They’re trying to be fast, good for them, but that’s not what I want. As a project leader or a developer, I want maintainability and consistency. My big concern with third-party frameworks is that they go away. It happens over and over again. We don’t know how long Realm’s going to be here. I don’t understand their business model. Core Data for me is good enough; it’s mature, it’s been around long enough, and it’s fast enough. If it’s not fast enough, I’m probably doing something wrong because I’m in object space anyway. There are a lot of unknowns about Realm. The storage is opaque, for instance, and that makes me a little jittery. Whereas, for Core Data, it’s a known quantity. Apple’s not going to throw it away tomorrow. The SQLite is transparent. I can look into the data. I can get the data. Even if it does go away tomorrow, I can still look at it. To me, it’s good enough, but then again, it’s my hammer. That’s the thing that I use the most. Is there anything wrong with Realm? No. Play with it, use it. It might be great. But, to me, it doesn’t solve the right problems. It’s not significantly better than Core Data to the point where you’d say, “Wow, this is so much better, why would anybody use Core Data?” Instead, I see it as, “Okay, it’s faster. Awesome. Good for you.” It’s not less code, and it doesn’t have the maturity of Core Data yet. Ask me again in a year, I might change my mind.

Note from Realm: We consider our top design goals to be ease of development and maintainability, not speed. In particular, we’ve spent a lot of time designing what we believe is a much simpler threading model than Core Data. Realm’s Objective-C and Swift layers are available as open-source and our underlying storage layer will follow. For more info about our business model, feel free to check our pricing page, or contact us if you have any questions!

Marcus Zarra

Marcus Zarra is best known for his expertise with Core Data, persistence, and networking. He has been developing Cocoa applications since 2004 and has been developing software for most of his life. Marcus is the author of Core Data: Apple’s API for Persisting Data under Mac OS X and Co-Author of Core Animation: Simplified Animation Techniques for Mac and iPhone Development, as well as one of the authors of the Cocoa Is My Girlfriend blog.