Realm Blog

Technical Guest Post: "The things I've learned using Realm"

It’s been exactly two years since I posted the article “How to use Realm like a champ”!

In this article, I try to sum up all the things I’ve learned from using Realm, and how it all relates to the current landscape of Android local data persistence: Android Architecture Components, Android Jetpack, Room, LiveData, Paging, and so on. Stay tuned!

An overview of Realm

Realm debuted as a mobile-first database, and has since grown into its own full-scale data synchronization platform. From the get-go, Realm’s vision was to provide the illusion of working with regular local objects, while in reality enabling safe and performant concurrent sharing of data. On top of that, Realm intends to handle most complexities of concurrency, and when something changes in the database, we receive notifications so that we can update our views, to always reflect the latest state of data. With the Sync Platform, these changes could be committed across multiple devices, and get seamlessly merged, like with any local write. Personally, I’ve used the Realm Mobile Database in multiple applications, and if there’s one major thing it taught me, it is to prefer simplicity over complexity. Sometimes, we might be so used to doing something in some way, that we don’t even think of just how much easier it could be with a slightly different approach.

In the case of Realm, that different approach is observable queries, but of course, being able to define classes that are automatically mapped to the database without having to think about relational mapping and joins is also a plus.

Out of the box, with Realm, pagination from a local data source is also not a real problem: data is only loaded when accessed, so whether it’s 200 items or 10000, we are unlikely to run out of memory, and the query will generally be fast.

A story of faulty abstractions

It is commonly seen as a good practice to abstract away specific implementations under our own interfaces, so that if need be, they can be safely replaced with a different implementation. And if you ever truly do need to swap out one implementation for the other, it’s definitely helpful that changes are localized to a single “module”, instead of it influencing parts of the code all over the codebase.

However, sometimes we think we’d like the module to behave one way, and enforce a contract that might not necessarily be the best solution to our problem. In this case, our interface restricts us from using a different approach.

In the case of local datasources, the common “mistake” is that if we have a DAO layer, then the DAO must have methods akin to this:

public interface CatDao {
   List<Cat> getCatsWithBrownFur();
}

In that case, we’ve hard-coded that the method takes no external arguments (anything it needs is intended to be a constructor argument of the DAO, which is most likely a singleton instance), and that retrieval of data is a single, synchronous fetch. Following up on that could be, for example, a Repository:

public interface CatRepository {
   void getCatsWithBrownFur(OnDataLoaded<List<Cat>> callback);
}

In which case we’d ensure that the getCatsWithBrownFur() is most likely executed on a background thread, and will make its callback on the UI thread. The retrieval of data would be a single, asynchronous fetch. The data is loaded in its entirety to memory on a background thread, then passed to the UI thread for it to be rendered.

But is this really the best thing we can do?

What Realm’s queries look like

Observable async queries, change listeners, and lazy loading

In Realm, or at least in its Java API, you receive a so-called RealmResults, which implements List. So we might be tempted to just use it as a List and move on. But that’s actually not how it’s intended to be used: Realm updates any RealmResults after a change has been made to the database, and notifies its change listeners when that’s happened.

So in practice, setting up a Realm query on the UI thread involves both defining the query, AND adding a listener to handle any future change that happens to it.

private Realm realm;
private RealmResults<Cat> cats;
private OrderedRealmCollectionChangeListener<RealmResults<Cat>> realmChangeListener = (cats, changeSet) -> {
   adapter.updateData(cats, changeSet); // initial load + future changes
};
@Override
protected void onCreate(Bundle savedInstanceState) {
   super.onCreate(savedInstanceState);
   realm = Realm.getDefaultInstance();
     // ← open thread-local instance of Realm
   cats = realm.where(Cat.class).findAllAsync();
     // ← execute Async query
   cats.addChangeListener(realmChangeListener);
     // ← listen for initial evaluation + changes
}
@Override
protected void onDestroy() {
   super.onDestroy();
   cats.removeAllChangeListeners(); // ← remove listeners
   realm.close(); // ← close thread-local instance of Realm
}

For those familiar with Swift, the API is slightly different, but it does the same thing:

let realm = try! Realm()
let results = try! Realm().objects(Cat.self)
var notificationToken: NotificationToken?
override func viewDidLoad() {
   super.viewDidLoad()
   // …
   self.notificationToken = results.observe { (changes: RealmCollectionChange) in
      // … (handle changes)
   }
   // call `notificationToken?.invalidate()` when needed
}

But we’re getting lost in details. What’s different here, compared to the aforementioned common DAO approach? Well, pretty much everything.

The common DAO pattern assumes that we want to call the DAO to refresh our data, whenever we know that any other parts of the application, on any other thread could have potentially changed the data we’re showing. That’s quite the responsibility!

In the case of Realm, we can observe the query results. In fact, Realm automatically updates them for us whenever the underlying data changes. When that happens, our listener is called, and we even receive the positions where items have been inserted, deleted, or modified. When we commit a transaction, Realm handles all of this for us in the background.
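To make the idea of "positions of inserted and deleted items" concrete, here is a minimal plain-Java sketch. It is not how Realm computes its change sets internally; `IdChangeSet` is a hypothetical helper that assumes every item has a unique, stable id and simply compares two snapshots.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: positions that differ between two snapshots of ids. */
final class IdChangeSet {
    final List<Integer> deletions = new ArrayList<>();   // indices into the OLD list
    final List<Integer> insertions = new ArrayList<>();  // indices into the NEW list

    static IdChangeSet between(List<String> oldIds, List<String> newIds) {
        IdChangeSet changes = new IdChangeSet();
        Set<String> oldSet = new HashSet<>(oldIds);
        Set<String> newSet = new HashSet<>(newIds);
        for (int i = 0; i < oldIds.size(); i++) {
            // present before, missing now: deleted at old position i
            if (!newSet.contains(oldIds.get(i))) changes.deletions.add(i);
        }
        for (int i = 0; i < newIds.size(); i++) {
            // missing before, present now: inserted at new position i
            if (!oldSet.contains(newIds.get(i))) changes.insertions.add(i);
        }
        return changes;
    }
}
```

An adapter could use such index lists to animate only the rows that changed instead of redrawing the whole list.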

Another thing to note is that when using Realm’s Async queries, the evaluation of the new result set and the diff happens on Realm’s background thread, and we’ll see the new data along with the changes only when the change listeners are being called.

When we call an accessor on a RealmResults, we actually receive a proxy instance, that can read from and write to the data on the disk, essentially minimizing the amount of data read to memory. This eliminates the need for pagination, even when working with larger datasets.

RealmResults<Cat> cats = realm.where(Cat.class).findAll();
  // sync query for sake of example
Cat cat = cats.get(0);
  // actually `fully_qualified_package_CatRealmProxy`
String name = cat.getName();
  // calling accessor reads the data from Realm
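The proxy idea can be illustrated without Realm at all. The following is only a sketch under stated assumptions: `NameColumn` and `LazyCat` are hypothetical names, and an in-memory list stands in for the data on disk. The point is that the object holds a row index rather than the data, and reads happen only when an accessor is called.

```java
import java.util.List;

/** Hypothetical backing store standing in for data on disk; counts reads. */
final class NameColumn {
    private final List<String> names;
    int reads = 0;

    NameColumn(List<String> names) { this.names = names; }

    String read(int row) {
        reads++;                 // every read goes to the "store"
        return names.get(row);
    }
}

/** Hypothetical proxy: holds only a row index, not the data itself. */
final class LazyCat {
    private final NameColumn column;
    private final int row;

    LazyCat(NameColumn column, int row) {
        this.column = column;
        this.row = row;
    }

    String getName() { return column.read(row); }  // data is fetched on access
}
```

Creating a `LazyCat` costs no reads; only `getName()` touches the store, which is why a large result set does not have to be loaded up front.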

The price of proxies: threading with Realm

Anyone who’s used Realm knows that there is a price to pay with proxy objects and lazy results, which also extends to Realm’s thread-local instances. Namely, that while Realm handles concurrency internally, the objects/results/instances cannot be passed between threads.

final Realm realm = Realm.getDefaultInstance();
new Thread(new Runnable() {
   @Override
   public void run() {
      RealmResults<Cat> cats = realm.where(Cat.class).findAll(); // <--
      // IllegalStateException: Realm access from incorrect thread.
      // Managed objects, lists, results, and Realm instances can only be
      // accessed on the thread where they were created.
   }
}).start();
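The thread-confinement rule itself is easy to picture in plain Java. This is a sketch of the general technique, not Realm's implementation; `ThreadConfined` is a hypothetical class that remembers its creating thread and rejects access from any other.

```java
/** Hypothetical thread-confined handle: remembers the thread that created it. */
final class ThreadConfined {
    private final Thread owner = Thread.currentThread();

    /** Throws if called from any thread other than the creating one. */
    void checkThread() {
        if (Thread.currentThread() != owner) {
            throw new IllegalStateException("Access from incorrect thread.");
        }
    }
}
```

Every accessor of a confined object would call `checkThread()` first, which is why passing such an object to another thread fails immediately rather than silently corrupting state.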

I’ve often heard the claim that “threading with Realm is hard”. I firmly believe this claim is false.

Even when there are multiple threads involved, using Realm is pretty easy: just get an instance of Realm for as long as you need it, then close it.

// on a background thread
try(Realm realm = Realm.getDefaultInstance()) {
   // open Realm for thread
   // use realm
} // realm is closed by try-with-resources

The one special case the developer has to remember is that instead of passing a managed object, you should generally pass its primary key, then re-query it on the background thread instead.

final String idToChange = myCat.getId();
// create an asynchronous transaction
// it will happen on background thread
realm.executeTransactionAsync(bgRealm -> {
   // we need to find the Cat we want to modify
   // from the background thread’s Realm
   Cat cat = bgRealm.where(Cat.class)
            .equalTo(CatFields.ID, idToChange)
            .findFirst();
   // do something with the cat
});

However, that’s actually as far as complexity goes, when it comes to Realm and threading, out of the box. RealmResults are already evaluated automatically by Realm on a background thread, and then the data is passed to the listener on the UI thread, so there is zero reason to ever try to pass them between threads.

Why people think threading with Realm is hard

So why is the claim so common? After spending a lot of time on the realm tag on Stack Overflow, I believe it comes down to the following primary causes:

1.) The developer has introduced RxJava into their project without understanding it, and ends up introducing such a level of complexity that they no longer know which threads their code is running on.

2.) The developer doesn’t know how RealmResults works, specifically that findAllAsync() will evaluate the results on a background thread, and then the RealmChangeListener will receive it on the UI thread — and instead they attempt to handle the threading that Realm would already handle for them internally.

3.) Reluctance to pass an instance of Realm as a method argument, and/or intending to use Realm’s thread-local instances as a global singleton.

The first and second causes are self-explanatory. But what about the third?

When you “get an instance of Realm” with Realm.getDefaultInstance(), you actually receive a thread-local and reference counted instance, where the reference count is managed by calls to getDefaultInstance() and close(). It is not a singleton, and it cannot be accessed on different threads.

It’s a little known fact, but to avoid passing Realm and instead see it as a thread-local singleton, it’s possible to store it in a ThreadLocal. Then, once set on a given thread, Realm can be acquired from it as simply as Realm realm = threadLocal.get(). This is one way one could turn instances of Realm into globally accessible variables. In this case, what we need to manage is Realm’s reference counting.
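The combination of a ThreadLocal with reference counting can be sketched in plain Java. `CountedHandle` is a hypothetical stand-in, not Realm's API: `acquire()` plays the role of getting an instance, and `release()` the role of closing it.

```java
/** Hypothetical stand-in for a thread-local, reference-counted resource. */
final class CountedHandle {
    private static final ThreadLocal<CountedHandle> CURRENT = new ThreadLocal<>();
    private int refCount = 0;
    private boolean closed = false;

    private CountedHandle() {}

    /** Returns this thread's instance, creating it on first use. */
    static CountedHandle acquire() {
        CountedHandle handle = CURRENT.get();
        if (handle == null) {
            handle = new CountedHandle();
            CURRENT.set(handle);
        }
        handle.refCount++;
        return handle;
    }

    /** Decrements the count; the last release really closes the handle. */
    void release() {
        if (closed) throw new IllegalStateException("already closed");
        if (--refCount == 0) {
            closed = true;
            CURRENT.remove();
        }
    }

    boolean isClosed() { return closed; }
}
```

The sketch shows why every acquire must be paired with exactly one release: one call too few leaks the instance, and one too many closes it while other code on the same thread still uses it.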

Why do people need and want a singleton instance of Realm? To create the data layer implementation for their abstraction, of course!

public interface CatDao {
   List<Cat> getCatsWithBrownFur();
   // ← initial attempt: uses List (no changes!) and has no arguments

   // -------------
   // alternatives?
   RealmResults<Cat> getCatsWithBrownFur();
   // ← problem: Realm opened and closed could immediately
   // invalidate the results, Realm-specific

   RealmResults<Cat> getCatsWithBrownFur(Realm realm);
   // ← problem: We’re passing Realm as a dependency
   // as part of the contract, Realm-specific
}

So we’d like to create a contract that lets us hide Realm as an implementation detail, but takes it into consideration that data could be evaluated on a different thread, and is sent to us with some delay through a subscription (to a listener). And on top of that, any future writes to the database that invalidate our current dataset should trigger a retrieval of the new data, and notify us of change.

public interface CatDao {
  LiveObservableList<Cat> getCatsWithBrownFur();
  // it would be possible to write a wrapper
  // around RealmResults like this
  Observable<List<Cat>> getCatsWithBrownFur();
  // it is possible to convert listeners to Rx Observable
  // something else???
}
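What such a contract has to promise can be sketched in plain Java, independent of Realm or Rx. `ObservableStore` is a hypothetical, single-threaded, in-memory stand-in: observers receive the current matches immediately and again after every write.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical in-memory store whose queries re-run after every write. */
final class ObservableStore<T> {
    private final List<T> items = new ArrayList<>();
    private final List<Runnable> queries = new ArrayList<>();

    /** Delivers the current matches now, and again after each write. */
    void observe(Predicate<T> filter, Consumer<List<T>> listener) {
        Runnable query = () -> {
            List<T> matches = new ArrayList<>();
            for (T item : items) {
                if (filter.test(item)) matches.add(item);
            }
            listener.accept(Collections.unmodifiableList(matches));
        };
        queries.add(query);
        query.run();                                 // initial evaluation
    }

    void insert(T item) {
        items.add(item);
        for (Runnable query : queries) query.run();  // re-evaluate and notify
    }
}
```

A real implementation would evaluate queries off the UI thread and allow unsubscribing, but the caller-facing shape stays the same: define the query once, then react to whatever the listener delivers.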

How Realm shaped the future of local data persistence on Android

The introduction of Jetpack, Room, and LiveData

With time, other libraries emerged that supported the notion of “observable queries”. One of the first to follow Realm was SQLBrite, which exposed SQLite queries as Observable<List<T>>, and used RxJava to do so.

Eventually, Google created their own approach, called the Android Architecture Components — which are now a part of Android Jetpack. The idea was to solve common problems that developers face when writing applications, and simplify them by providing a set of libraries and an opinionated guideline that helps solve these problems.

Observable queries with Room and LiveData

The most notable addition to the Architecture Components is LiveData. It is a “holder” that can store a particular item, and can have multiple observers. Whenever the data stored within the LiveData is changed, then the observers are notified, and they receive the new data. When a new observer subscribes for changes of the LiveData, it receives the previously set latest data.
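The holder semantics described here fit in a few lines of plain Java. This is a sketch only: `ValueHolder` is a hypothetical class, and unlike the real LiveData it has no lifecycle awareness and no main-thread dispatch. It shows just the two rules from the paragraph above: notify observers on change, and replay the latest value to late subscribers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical holder: keeps the latest value and replays it to new observers. */
final class ValueHolder<T> {
    private final List<Consumer<T>> observers = new ArrayList<>();
    private T value;
    private boolean hasValue = false;

    void observe(Consumer<T> observer) {
        observers.add(observer);
        if (hasValue) observer.accept(value);   // replay the latest value
    }

    void setValue(T newValue) {
        value = newValue;
        hasValue = true;
        for (Consumer<T> observer : observers) observer.accept(newValue);
    }
}
```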

One of the biggest additions to the Architecture Components was Room, which is Google’s own ORM over SQLite. But what’s more interesting is that it allowed defining a DAO (for accessing the entity’s tables) like this:

@Dao
public interface CatDao {
  @Query("SELECT * FROM CATS WHERE FUR = 'BROWN'")
  LiveData<List<Cat>> getCatsWithBrownFur();
}

We can then use this in our Activity:

private LiveData<List<Cat>> cats;
private Observer<List<Cat>> observer = (cats) -> {
   adapter.updateData(cats); // initial load + future changes
};

@Override
protected void onCreate(Bundle savedInstanceState) {
   super.onCreate(savedInstanceState);
   CatDao catDao = RoomDatabase.getInstance()
                      .catDao(); // get Dao
   cats = catDao.getCatsWithBrownFur(); // ← execute query
   cats.observe(this, observer);
   // ← listen for initial evaluation + changes
}

@Override
protected void onDestroy() {
   super.onDestroy();
   // no need to unsubscribe,
   // because of `observe(LifecycleOwner`
}

Doesn’t this look extremely familiar? Swap out Observer for RealmChangeListener, and LiveData<List<Cat>> for RealmResults<Cat>, and it should look almost exactly like the Realm example above!

When we subscribe for changes of a LiveData (or rather, “start observing it”), then as soon as there is at least one active observer, Room begins to evaluate the query on a background thread, then passes the results to the UI thread. Afterwards, Room tracks the invalidation of the table(s) this query reads from, and if one of those tables is modified, the query is re-evaluated.

The key differences are that the diff calculation is moved to the adapter (see ListAdapter), and that LiveData’s lifecycle integration allows for automatic unsubscription of its observers, instead of having to do it explicitly.

Otherwise, the behavior is rather similar. In fact, it is so similar that LiveData<List<T>> is a possible way to wrap RealmResults as merely an implementation detail, as is also shown in Realm’s examples.

The downside of LiveData<List<T>>

In the case of Room, there’s a downside to using LiveData<List<T>> directly with large datasets. Each time a new version of the dataset is retrieved, the full dataset is copied to memory. So assuming we have 10000 cats, and modify a single cat, all other 9999 cats will be retrieved from the database again as well in the newly evaluated list.

When relying on Realm’s lazy evaluation, this isn’t really a problem: we only ever retrieve items when we access them. We don’t load the full dataset to memory.

However, Google realized this poses a problem, and began working on a solution.

The release of a new Architecture Component: Paging

To eliminate the need for complete re-evaluation of a modified dataset each time a change occurs, Google invented something amazing: the LivePagedListProvider. Since then, the API has changed a bit, so it’s more appropriate to refer to it as the combination of DataSource.Factory and LivePagedListBuilder. With the help of these classes, it’s possible to expose an observable query as a LiveData<PagedList<T>>.

A PagedList is like a list, except it is backed by a DataSource, and it only loads pages of data at a time, instead of the whole dataset. If an item is accessed that’s not yet loaded, then that page is loaded from the datasource on a background thread. When the page of data is loaded, it’s passed to the UI thread, and can be “seamlessly diffed” into the currently loaded dataset. For this to work, Google also provides the PagedListAdapter.
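The core mechanism, loading a page only when an item inside it is first accessed, can be sketched in plain Java. `LazyPagedList` is a hypothetical class, not the Paging library's PagedList: it loads synchronously on the calling thread and never evicts pages, whereas the real library loads in the background and diffs the result into the UI.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

/** Hypothetical list view that fetches fixed-size pages on first access. */
final class LazyPagedList<T> {
    private final int pageSize;
    private final BiFunction<Integer, Integer, List<T>> loader; // (offset, limit) -> page
    private final Map<Integer, List<T>> pages = new HashMap<>();
    int pageLoads = 0;

    LazyPagedList(int pageSize, BiFunction<Integer, Integer, List<T>> loader) {
        this.pageSize = pageSize;
        this.loader = loader;
    }

    T get(int index) {
        int pageNumber = index / pageSize;
        List<T> page = pages.get(pageNumber);
        if (page == null) {                       // page not loaded yet
            page = loader.apply(pageNumber * pageSize, pageSize);
            pages.put(pageNumber, page);
            pageLoads++;
        }
        return page.get(index % pageSize);
    }
}
```

With a page size of 20, reading items 5 and 19 triggers a single load, and reading item 45 triggers one more; the remaining items are never touched.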

The way Room’s Paging integration works is that we can expose a DataSource.Factory instead of a LiveData<List<T>>.

@Dao
public interface CatDao {
 @Query("SELECT * FROM CATS WHERE FUR = 'BROWN'")
 DataSource.Factory<Integer, Cat> getCatsWithBrownFur();
}

Then we can do:

private LiveData<PagedList<Cat>> cats;
private Observer<PagedList<Cat>> observer = (cats) -> {
   pagedListAdapter.submitList(cats); // initial load + future changes
};

@Override
protected void onCreate(Bundle savedInstanceState) {
   super.onCreate(savedInstanceState);
   CatDao catDao = RoomDatabase.getInstance()
                      .catDao(); // get Dao
   DataSource.Factory<Integer, Cat> dataSourceFactory = catDao.getCatsWithBrownFur();
   cats = new LivePagedListBuilder<Integer, Cat>(dataSourceFactory,
         new PagedList.Config.Builder()
            .setPageSize(20)
            .setPrefetchDistance(20)
            .setEnablePlaceholders(true) // takes a boolean, not a count
            .build())
      .setInitialLoadKey(0)
      .build();
   cats.observe(this, observer); // ← listen for initial evaluation + changes
}

This allows Room to expose an observable query, where the data is fetched asynchronously, but page by page, instead of loading the full dataset. For large datasets, this can be much faster than fetching the whole result set at once, while retaining the benefit of observable queries.

Making Realm work with Paging

Realm already brings lots of benefits to the table, but many people intend to use Realm as an implementation detail, not directly. As such, they might want to keep observable queries (just like Room does), but don’t want to use managed results and managed objects: they always see the latest state of the database, which also means they mutate over time. They’re still proxies, and not regular objects.

One possibility would be to read the data from Realm using realm.copyFromRealm(results). However, this method could only be called on the current thread (in this scenario, generally the UI thread), and it would read the full dataset from disk. We’d have the exact same problem as with LiveData<List>, and would be reading large datasets at once on the UI thread! This is clearly a terrible idea, so what’s a better alternative?

We could move the copying of data from Realm to a background thread, but Realm cannot observe changes on regular threads, only on threads associated with a Looper. However, it’s possible to create a HandlerThread — if we can create a Realm instance on this handler thread, execute and observe queries on this handler thread, and keep these queries (and the Realm instance) alive while we’re observing them, then it can work!

Even then, we would still have the problem of loading the full dataset on the handler thread for each change. However, not if we can make Realm’s query results be exposed through the Paging library!

Monarchy: global singleton Realm with LiveData and Paging integration

I’ve been working on a way to expose RealmResults in an observable manner from a handler thread, either copying the dataset, mapping the objects to a different object, or through the Paging library.

The end result is called Monarchy, a library that lets you use Realm much as if you were using LiveData with a Room DAO. To create a global instance, all it needs is a RealmConfiguration, and then we can use it as a DAO implementation like this:

public interface CatDao {
   DataSource.Factory<Integer, Cat> getCatsWithBrownFur();
}

@Singleton
public class CatDaoImpl implements CatDao {
   private final Monarchy monarchy;

   @Inject
   CatDaoImpl(Monarchy monarchy) {
      this.monarchy = monarchy;
   }

   @Override
   public DataSource.Factory<Integer, Cat> getCatsWithBrownFur() {
      return monarchy.createDataSourceFactory(realm ->
            realm.where(Cat.class)
               .equalTo(CatFields.FUR, "BROWN")
      );
   }
}

One strange design choice of the Paging library is that the fetch executor can only be set on the LivePagedListBuilder, and not on the DataSource.Factory. This means that to provide Monarchy’s executor that executes fetches on the handler thread, Monarchy must be used to create the final LiveData<PagedList<T>>:

LiveData<PagedList<Cat>> cats = monarchy.findAllPagedWithChanges(
   dataSourceFactory, livePagedListBuilder);

On the other hand, it is still quite convenient: we receive unmanaged objects that are loaded asynchronously, page by page, and we’ve also retained the ability to receive future updates to our data when the database is written to.

Another interesting tidbit is that Monarchy uses LiveData, therefore depending on whether there are active observers, it can automatically manage whether the underlying Realm should be open or closed: completely moving lifecycle management of Realm instances into the onActive/onInactive callbacks of LiveData.
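The idea of tying a resource's lifetime to the presence of observers can be sketched in plain Java. `ActiveScopedResource` is a hypothetical class and a simplification: it counts registered observers, whereas LiveData's onActive/onInactive callbacks track observers whose lifecycle is currently active. It is not Monarchy's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder that keeps a resource open only while it is observed. */
final class ActiveScopedResource {
    private final List<Runnable> observers = new ArrayList<>();
    private boolean open = false;

    void addObserver(Runnable observer) {
        observers.add(observer);
        if (observers.size() == 1) open = true;    // first observer: "onActive"
    }

    void removeObserver(Runnable observer) {
        if (observers.remove(observer) && observers.isEmpty()) {
            open = false;                          // last observer gone: "onInactive"
        }
    }

    boolean isOpen() { return open; }
}
```

In a real implementation, the places where `open` flips would be where the underlying database instance is acquired and closed, so callers never manage that lifecycle themselves.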

Conclusion

If it weren’t for my time using Realm, it would probably be much harder for me to understand the driving forces that shaped the Android Architecture Components: especially Room, and its LiveData integration.

The ability to listen for changes and always receive the latest state of data with minimal effort just by subscribing to an observable query is something that at first might seem foreign, but simplifies the task of “fetching data from the network, storing it locally, and also querying for and displaying data” — which is what most apps need to do.

Why would we manually manage data load callbacks, if we could just run background tasks that fetch data from network and write it directly into our database, while all we need to do for the UI is observe for changes — and have any new data passed to us automatically?

What if in “Clean Architecture”, fetching data isn’t even a “use-case”, but just an Effect of finding that our data is outdated and should be refreshed?

What is the point of introducing Redux, if it restricts us from making subscriptions to our database queries, as the database would become a second store, and therefore would force us to make data loading be a single fetch inside “middlewares”? (Unless subscriptions are seen as a “state” that should be built up as a side-effect in the store’s observer, of course).

What if the abstraction we’re trying to build binds our hands and keeps us from finding simpler solutions that solve our problem in a more efficient way?

Realm’s observable queries were ahead of their time, but shaped the future of Android local data persistence. Instead of manual invalidation of data, we could just observe for changes. What else is there that we take for granted, build on top of, and keeps us from finding a better solution?

Using Realm taught me that even though there’s a common way of doing things, sometimes taking a completely different approach yields much better results. I’m glad that I had the chance to try Realm, and could learn from the opportunity.

Note from Realm:

We liked Gabor’s article, thus we want to re-post it here with Gabor’s permission. Gabor Varadi regularly writes articles on Medium which can be found here.

While Monarchy is not an official Realm library, it is an interesting complementary library developed by Gabor. You can find it and file issues on his repo (https://github.com/Zhuinden/realm-monarchy)


Gabor Varadi

Android dev, Zhuinden, or ‘EpicPandaForce’ @ SO. Tinkers with Realm, dislikes multiple Activities/Fragment backstack.
