Detect All Memory Leaks With LeakCanary

by Pierre-Yves Ricau

Oct 19 2015

We’ve all been bitten by memory leaks that cause OutOfMemoryError crashes in our apps at some point, sometimes even in production. Square’s Pierre-Yves Ricau solved this problem by building LeakCanary, a tool to detect and help you fix memory leaks before they ship. In his Droidcon 2015 NYC talk, Pierre teaches techniques to dramatically reduce OutOfMemoryError crashes and easily fix reference leaks, making your app more stable and usable.

Introduction (0:00)

Hi, I’m Pierre-Yves Ricau (PY for short) and I work at Square.

One of our products at Square is the Square Register, an app we built to take payments on your mobile device. During transactions in this app, the customer signs the screen with their finger.

Unfortunately, this signature screen crashes sometimes due to an “out of memory” error. Honestly, it’s a really bad time to crash – the customer and the merchant have no idea if the payment was completed, and this is really bad when you’re dealing with money. We realized we really needed to take care of those “out of memory” errors, or memory leaks.

Memory Leaks: Non-Technical Explanation (1:40)

I want to talk about a solution to memory leaks: LeakCanary. LeakCanary is an open source library that can help put a stop to memory leaks. But what exactly is a memory leak? Let’s start with a non-technical, illustrative example.

Imagine that my open hand represents the amount of memory available to our app. My hand can hold a lot of things – keys, an Android collectible, whatever. Imagine that my Android collectible requires Square Readers to function. Also, for the sake of analogy, let’s say that the Square Readers must be attached to the Android collectible with some thin threads, where the threads represent references.

My hand can only hold so much weight. The Square Readers attached to the Android collectible add to the total weight, just like references take up memory. Once I’m finished with the Android collectible, I can just throw it on the floor and a garbage collector (my Roomba, perhaps?) comes by to collect it. Somehow, everything gets into the trash, and my hand is light again.

Unfortunately, sometimes something bad happens. My keys may be attached to my Android collectible in a way that prevents it from being thrown on the floor. As a result, the Android collectible never gets garbage collected. That is a memory leak.

There are extra references (keys, Square Readers) pointing to an object (Android collectible) that should no longer be pointing to that object. Small memory leaks like these accumulate and become a big problem. These tiny threads add to the weight of my hand, until they become too heavy, and my hand can no longer hold on.

LeakCanary to the Rescue (3:47)

That’s where LeakCanary comes in.

I know, for instance, that these Android collectibles need to be destroyed and garbage collected soon, but I’m unable to see if they have been garbage collected or not. With LeakCanary, we attach smart pins to things like my Android collectible. The smart pins know whether they have been garbage collected or not, and after a while, if the smart pin realizes that it is still not in the trash, it takes a picture of the memory – a heap dump into the file system.

LeakCanary then publishes the result so that we can see the memory leak. We can view the chain of references that are holding the object in memory and preventing it from being garbage collected.

For a concrete example, take our signature screen in the Square Register app. The customer goes to sign, but the app crashes due to an “out of memory” error. We cannot be sure of the cause of the error right away.

The screen is a giant bitmap that holds the customer’s signature. The bitmap is the size of the screen, so it could be what pushes us out of memory, and we try to troubleshoot. First, we may switch the bitmap’s configuration to ALPHA_8 to save memory. That is a common fix and works okay, but it doesn’t solve the problem; it only reduces the amount of memory the bitmap needs. The memory leak is still there.

The actual problem is that our heap is almost full. There should be enough space for our big signature bitmap, but there isn’t because all these little leaks added up and took all the memory.

Memory Leaks: Technical Explanation (8:06)

Imagine I have an app that lets you buy a baguette in one click. (I think only French people would need that app, but hey, I’m French!)

private static Button buyNowButton;

For whatever reason, I put my button in a static field. The problem with this is it’s never going to be garbage collected until I set that field to null.

You could say, “It’s just a small button, who cares?” The problem is that my button has a field called mContext, which is a reference to the activity, which in turn has a reference to the window, which has the entire view hierarchy, including a huge picture of a baguette in memory. In all, that’s a couple of megabytes that are being held by a static button even after the activity has been destroyed.

The static field is called a GC root. The garbage collector tries to collect everything that is not a GC root or something held through references by a GC root. So, if you create an object and remove all references to it, it will be garbage collected, but if you put it in a GC root like a static field, it won’t get garbage collected.
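As a rough illustration of that rule (a toy model, not code from the talk and not how a real VM is implemented), the sketch below treats the heap as a directed graph and keeps only what is reachable from the roots. The class and method names (`ReachabilitySketch`, `Node`, `survivors`) are made up for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ReachabilitySketch {
  /** A toy heap object: a name plus its outgoing (forward) references. */
  static final class Node {
    final String name;
    final List<Node> references = new ArrayList<>();
    Node(String name) { this.name = name; }
  }

  /** Returns the names of every object reachable from the given roots. */
  static Set<String> reachableFrom(List<Node> roots) {
    Set<Node> visited = new HashSet<>();
    Deque<Node> pending = new ArrayDeque<>(roots);
    while (!pending.isEmpty()) {
      Node node = pending.pop();
      if (visited.add(node)) {
        pending.addAll(node.references);
      }
    }
    Set<String> names = new HashSet<>();
    for (Node node : visited) {
      names.add(node.name);
    }
    return names;
  }

  /** Models the baguette example: static field -> button -> activity -> window. */
  static Set<String> survivors(boolean staticFieldSet) {
    Node staticField = new Node("static buyNowButton");
    Node button = new Node("Button");
    Node activity = new Node("Activity");
    Node window = new Node("Window");
    if (staticFieldSet) {
      staticField.references.add(button);
    }
    button.references.add(activity); // stands in for Button.mContext
    activity.references.add(window);
    List<Node> roots = new ArrayList<>();
    roots.add(staticField);
    return reachableFrom(roots);
  }

  public static void main(String[] args) {
    System.out.println("field set:     " + survivors(true));
    System.out.println("field cleared: " + survivors(false));
  }
}
```

With the static field set, the button, the activity and the window all stay reachable. Once the field is cleared, only the root itself remains and everything else becomes collectable.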

When you look at something like this baguette button, it seems obvious that the button has a reference to the activity, so we need to clear it. When you’re in your code, though, you don’t know that; you only see references forward. You can know that the activity has a reference to the window, but who has a reference to the activity?

You can use stuff like IntelliJ to do smart things, but it won’t tell you everything. Basically, you can think of the objects and their references as a graph, but it’s a directed graph: it only flows in one direction.

Analyzing the Heap (10:16)

What do we do about all of this? We take a photograph. We take all that memory and dump it to a file, which we then open with a tool that parses and analyzes heap dumps. One of these tools is Memory Analyzer, also known as MAT. It basically has all the live objects and classes that were in memory when you took the heap dump. It has a query language called OQL, which looks a lot like SQL, so you can write something like:

SELECT * FROM INSTANCEOF android.app.Activity a WHERE a.mDestroyed = true

This will give you all the activity instances whose mDestroyed field is true. Once you have found the leaking activities, you can do something that’s called merge_shortest_paths, which computes the shortest paths to the GC roots. This will find the backward paths to the objects that are preventing your activity from being garbage collected.

I mention “shortest path” specifically because there are multiple paths from a number of GC roots to any activity or object. For example, my button’s parent view also holds a reference to the activity through its own mContext field.

When we look at a memory leak, we don’t need to look at all the paths; we just want the shortest one. That way, there is less noise, making it easier to find the problem.
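To make the “shortest path” idea concrete, here is a minimal sketch that runs a breadth-first search over a toy reference graph. It only illustrates the technique; it is not MAT’s or LeakCanary’s implementation, and the object names and the `ShortestPathSketch` class are assumptions for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ShortestPathSketch {
  /** Forward references: holder name -> names of the objects it references. */
  private final Map<String, List<String>> references = new LinkedHashMap<>();

  void addReference(String from, String to) {
    references.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
  }

  /** Breadth-first search from a GC root; returns the shortest chain to target, or an empty list. */
  List<String> shortestPath(String gcRoot, String target) {
    Map<String, String> cameFrom = new HashMap<>();
    Deque<String> queue = new ArrayDeque<>();
    queue.add(gcRoot);
    cameFrom.put(gcRoot, null); // the root has no predecessor
    while (!queue.isEmpty()) {
      String current = queue.remove();
      if (current.equals(target)) {
        List<String> path = new ArrayList<>();
        for (String step = target; step != null; step = cameFrom.get(step)) {
          path.add(step);
        }
        Collections.reverse(path);
        return path;
      }
      for (String next : references.getOrDefault(current, Collections.emptyList())) {
        if (!cameFrom.containsKey(next)) {
          cameFrom.put(next, current);
          queue.add(next);
        }
      }
    }
    return Collections.emptyList();
  }

  /** The baguette heap: two chains lead from the static field to the activity. */
  static ShortestPathSketch baguetteHeap() {
    ShortestPathSketch heap = new ShortestPathSketch();
    heap.addReference("static buyNowButton", "Button");
    heap.addReference("Button", "Activity");     // Button.mContext
    heap.addReference("Button", "ParentView");   // Button.mParent
    heap.addReference("ParentView", "Activity"); // ParentView.mContext
    return heap;
  }

  public static void main(String[] args) {
    System.out.println(baguetteHeap().shortestPath("static buyNowButton", "Activity"));
  }
}
```

The longer chain through the parent view is also a valid path, but the search reports only the direct one through the button, which is the one worth reading first.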

LeakCanary Saves the Day (12:04)

It’s great that we have a tool to find leaks, but in the context of a live app, we can’t very well ask our users to find the leaks for us. We can’t ask them to run adb, type all these commands, and then send us back a 70MB file. We could do it in the background (some people do), but that’s not cool. Instead, we want to focus on how we can detect the leak earlier, as it happens, while we’re developing the app. That’s where LeakCanary comes in.

An activity has a life cycle: you know when it’s created, you know when it’s destroyed, and you expect that after onDestroy() is called, it’s going to be garbage collected soon. If you had a way to detect if it has actually been garbage collected, then you have a trigger that can shout, “Hey! This thing might be leaking! It hasn’t been garbage collected yet!”

Activity is also nice because it’s used everywhere. Everybody uses it because it’s a god object that has access to a bunch of services and the filesystem. There is a good chance that if an object is leaking, the object has a reference to a context, and therefore leaks the context.

public class MyActivity extends Activity {

  @Override protected void onDestroy() {
    super.onDestroy();
    // instance should be GCed soon.
  }
}

Resources resources = context.getResources();
LayoutInflater inflater = LayoutInflater.from(context);
File filesDir = context.getFilesDir();
InputMethodManager inputMethodManager =
  (InputMethodManager) context.getSystemService(Context.INPUT_METHOD_SERVICE);

LeakCanary API Walkthrough (13:32)

Coming back to our smart pin, we want to know when the life cycle ends and track what happens from then on. Fortunately, LeakCanary has a simple API.

Step One: build a RefWatcher. This is an object to which you’re going to pass instances and it will check if they’re being garbage collected. This works with any object, not just activities.

public class ExampleApplication extends Application {
  
  public static RefWatcher getRefWatcher(Context context) {
    ExampleApplication application = (ExampleApplication) context.getApplicationContext();
    return application.refWatcher;
  }

  private RefWatcher refWatcher;

  @Override public void onCreate () {
    super.onCreate();
    // Using LeakCanary
    refWatcher = LeakCanary.install(this);
  }
}

Step Two: listen to the activity life cycle. Then, when onDestroy() is called, we pass the activity instance to this RefWatcher.

public ActivityRefWatcher(Application application, final RefWatcher refWatcher) {
  this.application = checkNotNull(application, "application");
  checkNotNull(refWatcher, "androidLeakWatcher");
  lifecycleCallbacks = new ActivityLifecycleCallbacksAdapter() {
    @Override public void onActivityDestroyed(Activity activity) {
      refWatcher.watch(activity);
    }
  };
}

public void watchActivities() {
  // Make sure you don’t get installed twice.
  stopWatchingActivities();
  application.registerActivityLifecycleCallbacks(lifecycleCallbacks);
}

What are Weak References (14:17)

To explain how this all works, I have to talk about weak references. I mentioned the static field that holds a reference to our baguette activity; the “Buy Now” button has an mContext field that’s keeping the reference to the activity. That is called a strong reference. In garbage collection, you can have as many strong references to an object as you want, and when that number gets to zero (i.e. no one is claiming a reference), then the garbage collector can claim it.

A weak reference is a way to access that object without increasing the number of references to it. If there are no more strong references to that object, the weak reference will also be cleared. So, if we make our activity a weak reference, and if at some point we realize the weak reference is cleared, it means that the activity has been garbage collected. However, if it isn’t cleared, we likely have a leak that merits investigation.

private static Button buyNowButton;

Context mContext;

WeakReference<T>

/** Treated specially by GC. */
T referent;

public class BaguetteActivity
  extends Activity {

  @Override protected void onCreate(Bundle state) {
    super.onCreate(state);
    setContentView(R.layout.activity_main);
  }
}

The main use for weak references is caches, where they are very useful. They are a way to keep a handle on something in memory while telling the GC that it can reclaim it if no one else is using it.
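The behaviour described above can be tried out in plain Java. This is only a sketch: `System.gc()` is a hint, so the second result depends on the VM, and the `WeakReferenceSketch` class and `clearedAfterGc` helper are names invented for the example.

```java
import java.lang.ref.WeakReference;

class WeakReferenceSketch {
  /** Asks for a GC a few times and reports whether the weak reference was cleared. */
  static boolean clearedAfterGc(WeakReference<?> reference) {
    for (int attempt = 0; attempt < 5 && reference.get() != null; attempt++) {
      System.gc(); // only a hint; the VM is free to ignore it
      try {
        Thread.sleep(50);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
      }
    }
    return reference.get() == null;
  }

  public static void main(String[] args) {
    Object activity = new Object();
    WeakReference<Object> weak = new WeakReference<>(activity);

    // The local variable is a strong reference, so the referent cannot be cleared.
    System.out.println("cleared while strongly held: " + clearedAfterGc(weak));
    System.out.println("same object: " + (weak.get() == activity));

    activity = null; // drop the only strong reference
    System.out.println("cleared after release: " + clearedAfterGc(weak));
  }
}
```

While the strong reference exists, the weak reference still returns the object. After the strong reference is dropped, the weak reference is normally cleared on the next collection, which is exactly the signal a leak detector can watch for.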

In our example, we extend weak reference:

final class KeyedWeakReference extends WeakReference<Object> {
  public final String key; // (1) Unique identifier
  public final String name;

  KeyedWeakReference(Object referent, String key, String name, ReferenceQueue<Object> referenceQueue) {
    super(checkNotNull(referent, "referent"), checkNotNull(referenceQueue, "referenceQueue"));
    this.key = checkNotNull(key, "key");
    this.name = checkNotNull(name, "name");
  }
}

You can see that we add a key to our weak reference (1). That key is going to be a unique string. The idea is that when we parse the heap dump, we can ask for all instances of KeyedWeakReference, then find the one with the corresponding key.

First, we create the weak reference, and then we make a note to ourselves: “later, I need to check the weak reference” (although “later” is actually just a few seconds). This is what happens when we call watch.

public void watch(Object watchedReference, String referenceName) {
  checkNotNull(watchedReference, "watchedReference");
  checkNotNull(referenceName, "referenceName");
  if (debuggerControl.isDebuggerAttached()) {
    return;
  }
  final long watchStartNanoTime = System.nanoTime();
  String key = UUID.randomUUID().toString();
  retainedKeys.add(key);
  final KeyedWeakReference reference =
    new KeyedWeakReference(watchedReference, key, referenceName, queue);

  watchExecutor.execute(() -> { ensureGone(reference, watchStartNanoTime); });
}

Under the hood, we are calling System.gc(), which (disclaimer) no one should do. However, it’s kind of a way to say, “Hey garbage collector, now is a good time to clear all the references,” and then we check again. If it’s still not cleared, we might have a problem, so we trigger the heap dump.
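The check-again step can be sketched in plain Java with a ReferenceQueue. This is a simplified stand-in for illustration only, not LeakCanary’s actual ensureGone: the `MiniRefWatcher` and `KeyedRef` names, the fixed 100 ms wait, and the omission of the heap dump are all assumptions of the sketch.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.CopyOnWriteArraySet;

class MiniRefWatcher {
  /** Weak reference tagged with a unique key, in the spirit of KeyedWeakReference. */
  static final class KeyedRef extends WeakReference<Object> {
    final String key;
    KeyedRef(Object referent, String key, ReferenceQueue<Object> queue) {
      super(referent, queue);
      this.key = key;
    }
  }

  private final Set<String> retainedKeys = new CopyOnWriteArraySet<>();
  private final ReferenceQueue<Object> queue = new ReferenceQueue<>();

  /** Starts watching an object that is expected to become garbage soon. */
  KeyedRef watch(Object watched) {
    String key = UUID.randomUUID().toString();
    retainedKeys.add(key);
    return new KeyedRef(watched, key, queue);
  }

  /** Drops the keys of references the GC has already cleared and enqueued. */
  private void removeWeaklyReachable() {
    KeyedRef ref;
    while ((ref = (KeyedRef) queue.poll()) != null) {
      retainedKeys.remove(ref.key);
    }
  }

  /** Returns true if the watched object is still retained after a GC hint. */
  boolean isStillRetained(KeyedRef reference) {
    removeWeaklyReachable();
    if (!retainedKeys.contains(reference.key)) {
      return false; // already collected, nothing to report
    }
    System.gc(); // hint only
    try {
      Thread.sleep(100); // give the cleared reference time to be enqueued
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    removeWeaklyReachable();
    // A real tool would trigger a heap dump here when the key is still present.
    return retainedKeys.contains(reference.key);
  }

  public static void main(String[] args) {
    MiniRefWatcher watcher = new MiniRefWatcher();
    Object leaked = new Object();
    KeyedRef ref = watcher.watch(leaked);
    System.out.println("retained: " + watcher.isStillRetained(ref));
    System.out.println("still the same object: " + (ref.get() == leaked));
  }
}
```

In `main` the local variable `leaked` plays the role of the leaking reference, so the key stays in the retained set and the watcher reports the object as retained.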

HAHA! (16:55)

It’s amazing what we can do with a heap dump! When I was originally dealing with these, it took a lot of time and effort. I was doing the same thing every time: download the heap dump file, open it in Memory Analyzer, find the instance, and compute the shortest path. But I’m lazy, and I didn’t want to do that every time. (We’re all lazy, right? We’re developers!)

I could have written an Eclipse plugin for memory analyzer, but no one should ever have to write Eclipse plugins – they’re terrible! Fortunately, I had an “aha!” moment. I could take an Eclipse plugin, remove the UI, and use the rest of the code.

HAHA, or Headless Android Heap Analyzer (I’m pretty proud of the name), is basically a repackaging of code written by other people to do just that. It started as a fork of Memory Analyzer with the UI removed. Two years later, it was forked again by someone else who added Android support, and another two years after that, I finally discovered it. I repackaged it to put it in Maven Central.

I recently changed the implementation to be based on the new code in Android Studio (Perflib). The code from that is actually decent, and it’s going to be maintained.

LeakCanary Implementation (19:19)

We have our library to parse heap dumps, and implementing it is fortunately very easy. We open the heap dump, load it, parse it, and then find our reference based on the key that we had. We obtain that instance, and then we just need to parse the graph of objects, work backwards, and find the leaking reference.

All that work is actually happening on the Android device. When LeakCanary detects that an activity has been destroyed but not yet garbage collected, it forces a heap dump that gets put on the file system. It then starts a service in a separate process, which will analyze the heap dump and publish the results. If you were in the same process, you might run out of memory trying to analyze the heap dump. It’s kind of a weird recursive situation.

At the end, you get a notification which can be clicked to display a detailed chain of references for your leaks. It also displays the size of the memory leak, so you know how much memory you could reclaim if you fixed the leak.

The API is also extensible, so you can have hooks and callbacks, meaning you could upload all that information to a server. For Square, we just used the Slack API to upload it to a Slack channel since this is only used during development and testing.

@Override protected void onLeakDetected(HeapDump heapDump, AnalysisResult result) {
  String name = classSimpleName(result.className);
  String title = name + " has leaked";
  slackUploader.uploadHeapDumpBlocking(heapDump.heapDumpFile, title, result.leakTrace.toString(),
    MEMORY_LEAK_CHANNEL);
}

Using the API is really easy, as you can see here. I used Retrofit to create an interface and add a bunch of annotations. In our Slack channel, we get all that information about the memory leak.

At Square, this process has enabled us to lower our OutOfMemoryError crash rate by 94%! It works really well.

Debugging A Real World Example (22:12)

Here is an example of a memory leak we found in AOSP, the Android source code. Suppose we have an app with an undo bar. We have an activity, and at some point you click on a button and you want to remove the undo bar.

public class MyActivity extends Activity {
  
  @Override protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_main);

    findViewById(R.id.button).setOnClickListener(new View.OnClickListener() {
      @Override public void onClick(View v) {
        removeUndoBar();
      }
    });
  }

  private void removeUndoBar() {...}

  private void checkUndoBarGCed(ViewGroup undoBar) {...}
}

We get all the views, and, just for fun, we set a layout transition (1). We also add a view to the undo bar, mostly because we can (2). Then, we remove the undo bar from its parent layouts (3).

public class MyActivity extends Activity {
  
  @Override protected void onCreate(Bundle savedInstanceState) {...}

  private void removeUndoBar() {
    ViewGroup rootLayout = (ViewGroup) findViewById(R.id.root);
    ViewGroup undoBar = (ViewGroup) findViewById(R.id.undo_bar);

    undoBar.setLayoutTransition(new LayoutTransition()); // (1)

    // (2)
    View someView = new View(this);
    undoBar.addView(someView);

    rootLayout.removeView(undoBar); // (3)

    checkUndoBarGCed(undoBar);
  }

  private void checkUndoBarGCed(ViewGroup undoBar) {...}
}

At that point, nothing is referencing the undo bar anymore, so it should be garbage collected soon. If not, then we have a problem.

We used the LeakCanary API to say, “Hey, this undo bar should be garbage collected soon, please check that in a few seconds (1).” So we do that, we run the app:

public class MyActivity extends Activity {
  
  @Override protected void onCreate(Bundle savedInstanceState) {...}

  private void removeUndoBar() {...}

  private void checkUndoBarGCed(ViewGroup undoBar) {
    RefWatcher watcher = MyApplication.from(this).getWatcher();
    watcher.watch(undoBar); // (1)
  }
}

We remove the undo bar, and… uh oh, LeakCanary isn’t happy. It found a memory leak, and reports back to us:

static InputMethodManager.sInstance
references InputMethodManager.mCurRootView
references PhoneWindow$DecorView.mAttachInfo
references View$AttachInfo.mTreeObserver
references ViewTreeObserver.mOnPreDrawListeners
references ViewTreeObserver$CopyOnWriteArray.mData
references LayoutTransition$1.val$parent
leaks FrameLayout instance

You can see that InputMethodManager has a static instance that references the current root view, mCurRootView, which is currently active and holds everything. However, the root view has a thing called a ViewTreeObserver, which is an object, unique per view hierarchy, to which you can add listeners so you can react to changes in the view hierarchy. So, the ViewTreeObserver has a bunch of PreDrawListeners. This means that you add a PreDrawListener, and it will be called back right before the view hierarchy is drawn.

That seems fine so far, but one of those PreDrawListeners is a LayoutTransition$1, which is the Java way to write the names of anonymous classes. This means this is the first anonymous class defined in the LayoutTransition class. Then you see that it has a field called val$parent that references our leaking undo bar. val$parent is basically a final local variable called parent, declared outside of the anonymous class and captured by it. That’s how the compiler changes the syntactic sugar into something that’s compatible with Java 1.
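The naming can be observed with a few lines of plain Java and reflection. Treat this as a sketch: the `$1` suffix follows the binary-name convention for anonymous classes, while the `val$` prefix is how javac happens to name the field for a captured local variable, so other compilers may differ. The `AnonymousNaming` and `Listener` names are invented for the example.

```java
import java.lang.reflect.Field;

class AnonymousNaming {
  interface Listener {
    void onEvent();
  }

  /** Returns the first anonymous class declared in AnonymousNaming; it captures 'parent'. */
  static Listener register(final StringBuilder parent) {
    return new Listener() {
      @Override public void onEvent() {
        parent.append("drawn");
      }
    };
  }

  /** Lists the compiler-generated (synthetic) fields of the object's class. */
  static String syntheticFieldNames(Object o) {
    StringBuilder names = new StringBuilder();
    for (Field field : o.getClass().getDeclaredFields()) {
      if (field.isSynthetic()) {
        names.append(field.getName()).append(' ');
      }
    }
    return names.toString().trim();
  }

  public static void main(String[] args) {
    Listener listener = register(new StringBuilder());
    System.out.println(listener.getClass().getName());
    System.out.println(syntheticFieldNames(listener));
  }
}
```

The listener keeps its captured `parent` alive for as long as the listener itself is reachable, which is why a listener that is never unregistered can leak a whole view.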

Let’s look at what’s going on:

android.animation.LayoutTransition#runChangeTransition

// This is the cleanup step. When we get this rendering event, we know that all of
// the appropriate animations have been set up and run. Now we can clear out the
// layout listeners.
observer.addOnPreDrawListener(new ViewTreeObserver.OnPreDrawListener() {
  public boolean onPreDraw () {
    parent.getViewTreeObserver().removeOnPreDrawListener(this);
    // ... More code
    return true;
  }
});

This is Android code: LayoutTransition is an Android class. The observer here is a ViewTreeObserver. Everything looks pretty good; it registers a listener on the ViewTreeObserver, and the listener immediately unregisters itself on the first callback. This shouldn’t be leaking, so what’s going on? Let’s look at getViewTreeObserver.

public ViewTreeObserver getViewTreeObserver() {
  if (mAttachInfo != null) {
    return mAttachInfo.mTreeObserver; // (1)
  }
  if (mFloatingTreeObserver == null) {
    mFloatingTreeObserver = new ViewTreeObserver();
  }
  return mFloatingTreeObserver; // (2)
}

undoBar.setLayoutTransition(new LayoutTransition()); // (3)

View someView = new View(this);
undoBar.addView(someView); // (4)

rootLayout.removeView(undoBar); // (5)

getViewTreeObserver returns the view hierarchy’s ViewTreeObserver if the view is attached (1). If the view is not attached, it returns a temporary ViewTreeObserver that will, if and when the view reattaches, merge everything in there into the live ViewTreeObserver (2).

The problem with this is that if my view is detached, I’m going to get a fake ViewTreeObserver, not a live ViewTreeObserver. Notice that we set the LayoutTransition (3), and then we add a view (4). This triggers the addOnPreDrawListener in the layout transition. Now, we have a PreDrawListener registered on our live ViewTreeObserver. However, we then remove the undo bar (5), meaning that the undo bar is detached and no longer has access to the live ViewTreeObserver.

final ViewTreeObserver observer = parent.getViewTreeObserver(); // used for later cleanup
if (!observer.isAlive()) {
  // If the observer’s not in a good state, skip the transition
  return;

}

public void onAnimationEnd(Animator animator) {
...
// layout listeners.

observer.addOnPreDrawListener(new ViewTreeObserver.OnPreDrawListener() {
  public boolean onPreDraw() {
    observer.removeOnPreDrawListener(this);
    parent.getViewTreeObserver().removeOnPreDrawListener(this);

If you look at the current state here, it’s asking the view, which in this case is detached, for the ViewTreeObserver. It gets a fake ViewTreeObserver, and it can’t remove itself from it because it’s not in that fake ViewTreeObserver; it’s in the real one.
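The mechanics of this bug can be reproduced without Android. The sketch below is a deliberately tiny model, not the framework code: `Observer` and `FakeView` are made-up stand-ins for ViewTreeObserver and View, and the “fixed” branch only mirrors the idea of unregistering from the observer captured at registration time.

```java
import java.util.ArrayList;
import java.util.List;

class FloatingObserverSketch {
  /** Stand-in for ViewTreeObserver: just a list of listeners. */
  static final class Observer {
    final List<Runnable> listeners = new ArrayList<>();
  }

  /** Stand-in for View: returns the window's observer when attached, a floating one otherwise. */
  static final class FakeView {
    Observer attachedObserver; // non-null while the view is attached
    private Observer floatingObserver;

    Observer getObserver() {
      if (attachedObserver != null) {
        return attachedObserver;
      }
      if (floatingObserver == null) {
        floatingObserver = new Observer();
      }
      return floatingObserver;
    }
  }

  /**
   * Registers a self-removing listener, detaches the view, then fires the callback.
   * Returns how many listeners are left on the real (window) observer.
   */
  static int listenersLeftBehind(final boolean removeFromCapturedObserver) {
    final Observer windowObserver = new Observer();
    final FakeView parent = new FakeView();
    parent.attachedObserver = windowObserver;

    final Observer observer = parent.getObserver(); // the real one, captured at registration
    observer.listeners.add(new Runnable() {
      @Override public void run() {
        if (removeFromCapturedObserver) {
          observer.listeners.remove(this); // removes from where it was registered
        } else {
          parent.getObserver().listeners.remove(this); // buggy: hits the floating observer
        }
      }
    });

    parent.attachedObserver = null; // the view is removed from its parent

    // The window draws: every registered listener is called once.
    for (Runnable listener : new ArrayList<>(windowObserver.listeners)) {
      listener.run();
    }
    return windowObserver.listeners.size();
  }

  public static void main(String[] args) {
    System.out.println("buggy removal leaves: " + listenersLeftBehind(false));
    System.out.println("captured-observer removal leaves: " + listenersLeftBehind(true));
  }
}
```

In the buggy variant the listener stays registered on the window’s observer, and because the listener captured `parent`, the detached view stays reachable with it.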

This change was made four years ago to prevent a different bug, but it’s caused an inescapable memory leak. It has since been fixed in Android, but we don’t know when it’s going to be available. I thought really hard about how to get around this memory leak, but there is no way around it. It’s just there.

Ignoring Android SDK Crashes (28:10)

In general, there are some memory leaks we cannot fix. For our purposes, we don’t want to see those. In LeakCanary, we built in a way to ignore some of the leaks so we can focus instead on our own problems.

LAYOUT_TRANSITION(SDK_INT >= ICE_CREAM_SANDWICH && SDK_INT <= LOLLIPOP_MR1) {
  @Override void add (ExcludedRefs.Builder excluded) {
    // LayoutTransition leaks parent ViewGroup through ViewTreeObserver.OnPreDrawListener
    // When triggered, this leak stays until the window is destroyed.
    // Tracked here: https://code.google.com/p/android/issues/detail?id=171830
    excluded.instanceField("android.animation.LayoutTransition$1", "val$parent");
  }
}
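To show the idea behind such an exclusion list, here is a simplified sketch in plain Java. It is not LeakCanary’s ExcludedRefs implementation: it simply drops any trace that passes through an excluded field, and the `ExcludedRefsSketch` and `FieldRef` types are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ExcludedRefsSketch {
  /** One hop in a leak trace: the holder class and the field it references through. */
  static final class FieldRef {
    final String className;
    final String fieldName;
    FieldRef(String className, String fieldName) {
      this.className = className;
      this.fieldName = fieldName;
    }
  }

  private final Set<String> excludedFields = new HashSet<>();

  /** Marks className.fieldName as a known leak that should not be reported. */
  ExcludedRefsSketch instanceField(String className, String fieldName) {
    excludedFields.add(className + "#" + fieldName);
    return this;
  }

  /** A trace is ignored if any hop in it goes through an excluded field. */
  boolean shouldIgnore(List<FieldRef> leakTrace) {
    for (FieldRef ref : leakTrace) {
      if (excludedFields.contains(ref.className + "#" + ref.fieldName)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    ExcludedRefsSketch excluded = new ExcludedRefsSketch()
        .instanceField("android.animation.LayoutTransition$1", "val$parent");

    List<FieldRef> trace = new ArrayList<>();
    trace.add(new FieldRef("android.view.ViewTreeObserver", "mOnPreDrawListeners"));
    trace.add(new FieldRef("android.animation.LayoutTransition$1", "val$parent"));
    System.out.println("ignore known SDK leak: " + excluded.shouldIgnore(trace));
  }
}
```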

I want to reiterate that this is a dev tool only - you don’t want to ship this in production. It shows this big picture of a canary, and then it shows a notification. Your users don’t want to see that. What LeakCanary is great at is detecting memory leaks early.

We do still get “out of memory” errors. Even with this, I can tell you that the number of leaks we have is still greater than zero. Is there something else we could do to change this?

The Future of LeakCanary (29:14)

What if, instead of doing all of that leak-hunting during development, we were to release an app first? Then, when there is a crash, we could look at it and analyze it using the same process: take a heap dump and start analyzing it in a separate process. At that point, we’re not looking for one single leak anymore. We’re actually looking for clues about what’s going on.

public class OomExceptionHandler implements Thread.UncaughtExceptionHandler {
  private final Thread.UncaughtExceptionHandler defaultHandler;
  private final Context context;

  public OomExceptionHandler(Thread.UncaughtExceptionHandler defaultHandler, Context context) {...}

  @Override public void uncaughtException(Thread thread, Throwable ex) {
    if (containsOom(ex)) {
      File heapDumpFile = new File(context.getFilesDir(), "out-of-memory.hprof");
      try {
        Debug.dumpHprofData(heapDumpFile.getAbsolutePath());
      } catch (Throwable ignored) {
      }
    }
    defaultHandler.uncaughtException(thread, ex);
  }

  private boolean containsOom(Throwable ex) {...}
}

So this is a Thread.UncaughtExceptionHandler. You can set that, delegate to the default one, which is going to crash the app. But before that, you can take a heap dump, and then start another process.
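The body of containsOom is elided on the slide. One plausible way to write such a check, offered purely as an assumption and not as the talk’s implementation, is to walk the exception’s cause chain; the `OomCheckSketch` class and the depth limit of 100 are choices made for this sketch.

```java
class OomCheckSketch {
  /** Walks the cause chain looking for an OutOfMemoryError. */
  static boolean containsOom(Throwable ex) {
    int depth = 0;
    // The depth bound guards against cyclic cause chains.
    while (ex != null && depth++ < 100) {
      if (ex instanceof OutOfMemoryError) {
        return true;
      }
      ex = ex.getCause();
    }
    return false;
  }

  public static void main(String[] args) {
    Throwable wrapped = new RuntimeException("inflate failed", new OutOfMemoryError("simulated"));
    System.out.println(containsOom(wrapped));
    System.out.println(containsOom(new IllegalStateException("unrelated")));
  }
}
```

Checking the causes matters because an OutOfMemoryError is often wrapped in another exception, for example when it happens while inflating a layout.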

With this, we can do things like list all the activities that are destroyed and then find why they are still in memory, then list all the views that are detached. Because we know the size of everything that’s kept in memory, we could use this to prioritize and say, “Hey, this one’s more important than that one.” You could imagine a tool like Crashlytics having an extension for this kind of thing.

I actually have a prototype of that very thing, which I wrote while I was on a plane recently. It’s not released yet, and honestly, it’s not really reusable. One of the biggest problems is that if you’re encountering an out of memory error, you don’t have enough memory to parse through a heap dump and find where leaks are… To fix this, we would need to change the way we do things, for instance taking a streaming approach instead of just loading everything in memory.

Q&A (31:50)

Q: Any idea whether LeakCanary works with Kotlin-based apps?

PY: I don’t know, but I don’t see why not. In the end, it’s all byte code, and Kotlin has references too. You should be able to make it work with LeakCanary.

Q: Do you have LeakCanary activated all the time on debug builds, or rather do you enable it for some builds just to test it?

PY: Different people take different approaches. What we do is enable it all the time. The problem with taking a heap dump is that it freezes the VM. When you’re trying to QA the app and a leak is detected in the middle of it, the freeze gets in the way: sometimes you want to know about leaks, and sometimes you want to be left alone. It’s very easy to disable it, and in fact we do that in some builds. Generally though, as soon as you start providing the ability to disable it, people will disable it. Soon, nobody will look at it anymore, and then it becomes useless. I’d say it’s a balance. We try to do it as much as we can, and fix the important leaks.

About the content

This talk was delivered live in August 2015 at Droidcon NYC. The video was transcribed by Realm and is published here with the permission of the conference organizers.

Pierre-Yves Ricau

Pierre-Yves Ricau is an Android Baker at Square, working on the Square Register. He formerly worked at Siine in Barcelona. He also enjoys good wine and low entropy code.
