Social coding has revolutionized how we share code with others. Tools like GitHub, CocoaPods, and Carthage make publishing and consuming code so convenient that our dependencies have become smaller and more numerous. Nowadays, most projects quickly resemble a Jenga tower, with layer upon layer of poorly understood single points of failure. Despite our progress, Justin Searls urges us to pause and reflect on our relationship with open source. Convenience and ego drive most open source adoption, he says, but these shortsighted motivations raise long-term problems we will need to clearly identify if we ever hope to solve them.
Open Source is Good (0:00)
Not every new start-up invents everything on their own. Instead, they stand on this mountain of free stuff that they take, and then they just write a little, itty bitty bit of app code on top. Some individual can now work in their basement, write a little library that gets everywhere, and changes the world along with how other developers work; it’s exciting stuff.
Is Open Source Good, Really? (1:45)
Many companies happily consume Open Source, but if you want to even share a patch, much less open source an entire library or application, the stingy legal department gets involved. Furthermore, many start-ups don’t really understand the stuff that they’re standing on top of. They build ticking time bombs of maintainability nightmares. In addition, most of the maintainers I know become burnt-out, frustrated by the fact that the entire world expects free customer support from them for work that might have started as a hobby.
In this talk, our goal is to look at a handful of issues that are facing Open Source on a systemic basis in the hope that we all get in the habit of doing that more often. And maybe we can develop a collective sense of anguish about these things, and only then will people start having creative ideas for how to tackle those problems with new solutions. We might finally reach a collective state of relief in Open Source in general and be in a more healthy place. However, my goals in this talk are much more modest; I’m just going to talk about problems, as it’s a lot more fun to tear stuff down and complain.
Issues with Open Source (2:12)
This talk will discuss four primary issues:
- Dependencies: all of the dependencies that our applications stand on top of that lead to a feeling of a teetering Jenga tower.
- Open Source Maintenance - The People: the feeling maintainers get when working on these projects.
- Trust issues in Open Source
- Communication within the community of Open Source projects, as well as where I think things are going to go in the future.
The Real Life Analogy (3:11)
Ideology: “They do not know it, but they are doing it.”
- Karl Marx
The ideology is like the negative space that drives our action. It’s not something we actively sign up for. It’s apt for this talk because that came from Marx’s book, Capital, which sat at the intersection of Philosophy and Economics, similarly to Open Source.
People have this “share, share alike, let’s all work together” attitude, but meanwhile, lots of companies and people are making money hand over fist, and only because of Open Source technology. Perhaps Marx’s perception of economic progress in very broad strokes resembles our technological progress today. In the beginning, everything was awful. We all had to survive by the labor of our own hands. But then humans started forming groups and tribes, and specialization emerged, so you could go to one person in your village for vegetables, another one for meats, and through economies of scale, that opened up the door for recreation, art, and culture.
Industrialization may be viewed as a hyper-optimization of this. You can go to the supermarket, and get all the commodities you need in one shot, and have lots more time in the rest of your week for other activities. The internet has sort of blown the doors on this. Now from bed, on my iPad I can order everything I want, and then it’s going to be on my doorstep in two days. This is clearly progress, right?
However, we must ask, where does that progress lead? Over the last couple of years, people have been asking questions about things like the rumors of Amazon shipping stuff to local distribution centers in anticipation of you ordering it, so that they can accomplish same-day deliveries. That fundamentally changes the dynamic of commerce from being something you opt into. Instead, it is oddly pushed on us. That’s an example of an unintended consequence. Another interesting unintended consequence of industrialization is exposed by all those documentaries about the industrialization of our food, such as Food, Inc. Its audacious headline claims “You’ll Never Look at Dinner the Same Way.” Although that documentary may have had a big impact on you, a more honest headline would add, “For like, at Least a Month”. It’s really unlikely that you’re just going to stop eating food. It’s a much bigger systemic issue. If you view progress as a line that just grows and increases, it’s easy to forget that as we march in one direction, awfulness accretes as if by side effect.
Thinking about Food, Inc., eventually we all reach this point where we freak out collectively. You’d like to just start fixing everything right there at the moment that you freak out, but the system is too big. It can’t turn on a dime. “Nothing stops this train”, and instead, the best we can hope for is to bend the curve of awfulness after stuff gets even worse.
Similar Issues With Dependency Management in Open Source (5:46)
With dependencies, Open Source has a similar kind of negative progress over time. In the beginning, stuff was really rough. It was just source code out there, and we were exchanging it by pedestrian means. If I had an application with some code, and I wanted somebody else’s code, I had to go download it, put it in my code base, and then depend on it. And from that moment on, it lived as part of my application. It was my job to maintain and deal with it in the future. Common build systems like Make were really influential because now we could logically depend on dependencies. We’re writing our own application, but if I want to link against Libxml2, for instance, it lives somewhere else in my system. It can be updated separately, and I can just recompile and go, and it doesn’t necessarily mean that I own that forever.
Java and JAR files also became interesting because of the “write once, run anywhere, universal binary” type of concept, where I can write a bunch of Java code, and then go out to a website and find a couple of JAR files, and then drop them on my class path. That convenience led to smaller, more focused dependencies, and it also had an interesting side effect, where those dependencies and their websites would say, “Oh, and by the way, we also depend on this other JAR”. As a result, dependencies began having dependencies, and it’s my job as the developer to manage that stuff, which is really cool and really liberating. We call those dependencies of our dependencies transitive dependencies (an important term).
Nowadays, too, we have cool new tools: Gemfiles for Bundler, Podfiles for CocoaPods, and Cartfiles if you use Carthage as an Apple ecosystem developer. These tools allow us to have our code and explicitly state in a file, “Here are all the dependencies I want at this specific set of version ranges,” and the tool does all that hard constraint Boolean logic for us; it finds out which versions of our transitive dependencies it needs to resolve to, and their transitive dependencies. This results in a big stack, this inverted set of dependencies that we’re standing on top of.
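To make the “hard constraint Boolean logic” concrete, here is a minimal sketch of the core question a resolver answers for one package: which available version satisfies every version range placed on it? The package name, version list, and the functions `parse`, `satisfies`, and `resolve` are all hypothetical, invented for illustration. Real tools such as Bundler or CocoaPods do far more, including searching across many packages at once.

```python
# Hypothetical registry: package name -> versions available (illustrative only).
AVAILABLE = {
    "http-client": ["1.0.0", "1.2.0", "2.0.0"],
}

# Two-character operators come first so ">=" is not mistaken for ">".
OPERATORS = (">=", "<=", "==", ">", "<")


def parse(version):
    """Turn '1.2.0' into (1, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, constraint):
    """Check a single constraint such as '>=1.0.0' or '<2.0.0'."""
    for op in OPERATORS:
        if constraint.startswith(op):
            bound = parse(constraint[len(op):])
            v = parse(version)
            return {
                ">=": v >= bound,
                "<=": v <= bound,
                "==": v == bound,
                ">": v > bound,
                "<": v < bound,
            }[op]
    raise ValueError(f"unsupported constraint: {constraint}")


def resolve(package, constraints):
    """Return the highest available version meeting every constraint, or None."""
    candidates = [
        v for v in AVAILABLE[package]
        if all(satisfies(v, c) for c in constraints)
    ]
    return max(candidates, key=parse) if candidates else None


# Two dependents ask for compatible ranges: a version can be chosen.
print(resolve("http-client", [">=1.0.0", "<2.0.0"]))  # 1.2.0
# Two dependents ask for incompatible ranges: nothing satisfies both.
print(resolve("http-client", [">=2.0.0", "<2.0.0"]))  # None
```

The second call is the interesting one: no single version satisfies both ranges, so a tool that allows only one version of each package has to report a conflict.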
Node.js and the NPM package manager have taken things a step further. Your code on top of the system has dependencies just like before, but a key difference is that the Node.js runtime allows you to load the same library at multiple different versions in a single process. That makes it trivial to implement a dependency tool that recursively sucks up all of the dependencies of all of their dependencies, all the way down. This leads to very deep dependency trees; in fact, for our poor friends developing on Windows, it’s not uncommon at all for Node.js applications to hit the Windows maximum file path limit (MAX_PATH, 260 characters). I get that issue opened about once a month on one of my projects.
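As a rough illustration of how quickly nesting eats into a path limit, the sketch below builds the path of a package nested under a chain of parent packages, each inside its own node_modules folder. The project root, the package names, and the helpers `nested_path` and `exceeds_limit` are made up for the example; the default limit is the classic Windows MAX_PATH value of 260 characters.

```python
WINDOWS_MAX_PATH = 260  # classic MAX_PATH limit, in characters


def nested_path(project_root, chain):
    """Build the path of a package nested under each package in `chain`."""
    path = project_root
    for name in chain:
        path += "\\node_modules\\" + name
    return path


def exceeds_limit(path, limit=WINDOWS_MAX_PATH):
    """Report whether a path is longer than the given limit."""
    return len(path) > limit


# Hypothetical chain: twelve levels of dependencies, each with a 20-character name.
chain = ["some-package-name-%02d" % i for i in range(12)]
deep = nested_path("C:\\proj", chain)
print(len(deep), exceeds_limit(deep))
```

Every level costs the length of the package name plus the 14 characters of the `\node_modules\` segment, so even modest names run out of room after a handful of levels.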
Countdown to Sorrow: The Open Source Story (8:22)
All of this is short-term progress, and it’s available to us for the low, low price of long-term fragility. Louis C.K. jokes about how everything that makes you happy comes to an end - bringing a puppy home begins a “countdown to sorrow”. Welcome to Open Source, everybody - this is a state of the union of the non-Apple world.
“Build a small, non-trivial Rails app. An empty app has ~50 Gem dependencies; yours will have 75-100. Go away for six months. Come back, update all your dependencies. Your app no longer works.”
- @garybernhardt, not a Rubyist
It’s easy to start a new thing: for example, creating a new Jekyll blog, installing SASS, making a new Rails app… It’s always easy right now, but no one’s thinking about a year from now. It’s all optimized for initial adoption and not long-term sustainability of the systems we’re building. Our apps don’t include just the code we write. They’re everything we’re shipping to our customers, everything that we’re running on our servers. It’s never been easier to build a new thing. You can download a new sample project from Apple and be up and running and just tweak it, but the stuff that we’ve been building, when you look at the whole net of it, has never been more complex.
I’m guilty of this all the time. If I’m building a Rails app in Ruby and begin to think of its details, it might depend on this one specific gem at this specific version along with 50 others, and I don’t realize the implications of that. I don’t think about that, because it’s invisible to me. That one gem at that one version range means that in the Ruby ecosystem there are 272 gems that cannot be installed at any version because they have a conflicting version resolution specifier for that specific gem.
In fact, our tooling could do a much better job, in Ruby and everywhere, of saying, “Hey, by the way, you have 10 direct dependencies? Those sucked in 43 transitive dependencies”. That’s actually been implemented since I first did this talk. It could also tell you that your gems’ version specifiers preclude the installation of, say, 1,300 out of the 48,000 gems in the world. Or: “Bundle Update: if you were to try to update your dependencies, you would be literally unable to update five gems to the latest version because of conflicting specifiers”.
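The kind of report described here comes down to walking the dependency graph. The sketch below separates direct dependencies from the transitive ones they pull in. The graph, the package names, and the function `dependency_report` are hypothetical and only illustrate the idea; this is not how any particular package manager implements it.

```python
from collections import deque

# Hypothetical dependency graph: package -> packages it depends on directly.
GRAPH = {
    "my-app": ["web-framework", "json-parser"],
    "web-framework": ["router", "template-engine", "json-parser"],
    "router": ["path-matcher"],
    "template-engine": [],
    "json-parser": [],
    "path-matcher": [],
}


def dependency_report(graph, root):
    """Return (direct, transitive) dependency names for `root`, each sorted."""
    direct = set(graph[root])
    seen = set()
    queue = deque(graph[root])
    while queue:
        name = queue.popleft()
        if name in seen:
            continue  # already visited via another dependent
        seen.add(name)
        queue.extend(graph.get(name, []))
    # Transitive means reachable, but not declared by the root itself.
    transitive = seen - direct - {root}
    return sorted(direct), sorted(transitive)


direct, transitive = dependency_report(GRAPH, "my-app")
print(f"{len(direct)} direct dependencies pulled in {len(transitive)} transitive ones")
# 2 direct dependencies pulled in 3 transitive ones
```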
In Node.js land, another place where I spend a lot of my time, version resolution conflicts don’t affect NPM. Nevertheless, it has other problems. For instance, if I have several dependencies that all depend upon different versions of one dependency, it creates issues with handling my own model object. It’s just data, a non-versioned object. I might foolishly pass that into my dependency, which passes it back into a different version of that dependency, creating unknown behaviors. What now? Does that blow up? Does it work? Odds are the maintainers are not testing for people using the same library at multiple versions in one process, but it happens frequently, creating tons of very mysterious, hard-to-debug issues.
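The failure mode is easier to see in a small simulation. The sketch below is Python, not Node.js, and it only imitates the situation: `load_geometry` stands in for loading two separate copies of a hypothetical “geometry” library, each with its own `Point` type. An object created by one copy is not recognized by the other, even though the two types look identical.

```python
def load_geometry(version):
    """Stand-in for loading a separate copy of a hypothetical 'geometry' library."""

    class Point:
        def __init__(self, x, y):
            self.x, self.y = x, y

    def describe(point):
        # Each copy only recognizes Points created by that same copy.
        if not isinstance(point, Point):
            raise TypeError(f"geometry {version}: not one of my Points")
        return f"({point.x}, {point.y})"

    return Point, describe


# Two copies in the same process, as if two dependencies each brought their own.
PointV1, describe_v1 = load_geometry("1.0")
PointV2, describe_v2 = load_geometry("2.0")


def crosses_versions_safely(point):
    """Return True if the version 2.0 copy accepts the given point."""
    try:
        describe_v2(point)
        return True
    except TypeError:
        return False


p = PointV1(1, 2)
print(describe_v1(p))              # (1, 2)
print(crosses_versions_safely(p))  # False
```

Here the mismatch fails loudly. In practice the two versions may silently disagree about an object’s shape instead, which is what makes these bugs so hard to trace.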
Another problematic occasion is when one of these dependencies updates with a breaking change that one dependency requires but others don’t. This occurred in our Lineman tool and had prevented users from installing it since March 13th, 2013. To solve this, we had to fork the breaking dependency on GitHub and set the version to what it needed to be. Then we pushed it to NPM solely for the purpose of resolving this conflict. And now we own this other thing that we don’t understand, just for this one version specifier. If there are any security problems, or anything else, that’s our problem now. Of course, as a responsible Open Source developer, I do nothing to solve this issue, unfamiliar as I am with the details of the dependency itself.
On the top level we can all understand that the code we write is the code that we need to get our job done, and the stuff that we depend on is there for convenience to help us do our job. But the more and more you think about it, the stuff underneath that feels like complexity, and eventually, as you start thinking about really deep dependencies, it just feels risky. And how often do we talk about our transitive dependencies’ transitive dependencies? At that point it just starts to feel mysterious. If anyone in the room has ever advertised, “Hey, I’m a full stack developer,” there are other people in the room that will laugh at you, because there’s no such thing as a full stack developer. Nobody understands everything that they stand on top of.
Although working with Makefiles and C in college was really painful, I could still build all of that work 30 years later. In contrast, I’m not confident at all that I will be able to run npm install on my current projects and have them work five years from now, much less 30.
Behind the Curtain: The Work and Mentality of Open Source Maintainers (13:56)
It’s critical to understand that in this community, whether you’re a maintainer or not, not all maintainers are rock stars - they’re just people. In fact, I like to think of maintainers as just extra-early adopters. For instance, a maintainer might go and search for a tool that they wish existed and find that it doesn’t exist, so they turn around, build it themselves and share it. Early adopters, on the other hand, operate similarly to fans of a new indie-rock band. The first thing that they’ll do is search for it, find it, get excited, then turn around on Hacker News or Reddit and share it with everybody else. The maintainer, from this perspective, gets stars in their eyes, excited that people are using their stuff.
Early adopters can also point out problems or send non-constructive complaints. However, because early adopters tend to be just as competent as maintainers (as opposed to late adopters), they often send pull requests to fix those issues.
This makes the maintainer happy again. They’re finding that their emotional state is dependent on these randos on the internet. Remember, every library that exists is mostly there to scratch the itch of that one person. They create a problem-solving tool for themselves that still has issues and is not prepared for mass consumption. The early adopters are great in that they send patches to round it out and make it more mass-consumable, but still minimal. This is maybe a good time to cut a 1.0 release. If you’re like me and you’re afraid of semantic versioning zealots, this might be your 0.84.0 release.
Maintainers not Sharing Control Early (15:24)
After all of this, it’s a great time for a maintainer to say, “Hey, let’s own this together” with some of the early contributors. I’m sure any early adopter would be excited and invested. I’d even settle for, “Hey, let’s make you a committer on the project, and you can help triage issues,” and I’m sure they’d be really happy. However, since those conversations don’t take place, they may as well say, “Hey, let’s never communicate again”. The early adopter responds, “Okay, bye forever,” and they’re on to the next thing, because they’re always after the new shiny bubble.
Why is it that maintainers just don’t share control early? Too often they mis-predict how much happiness being a maintainer is going to bring them. They built a product, and that made them happy. They got attention, and that made them happy. They cut a 1.0, and that was really cool. So they just draw a line and extrapolate, and they can’t wait to see how happy they’re going to be at version 2.
Late Adopters Lash Back (16:05)
Don’t worry, because late adopters tend to cure them of this happiness. Remember, a maintainer, being an extra-early adopter, solves their own problem, so maybe they go a week or a month without any commits. The project probably does what the maintainer needs it to do. Because they adopt early, they get excited and distracted by something new, and go away. It might be a long time between updates. Sometimes stuff just feels done as a maintainer. A late adopter then arrives at the scene and sees no recent commits, a lot of stars, and seemingly a stable, safe bet. Of course, it’s also free and Open Source, so they view all those attributes positively.
If you recall, initially the project just solves the maintainer’s needs, and then eventually it gets well-rounded out by other early adopters. Newcomers then negotiate about what should become of the product and try to solve issues with it. Late adopters are much different than early adopters in that they tend to be more demanding and carry a sense of entitlement. It’s not uncommon at all for somebody who’s really happy with me when they first adopt a project to say, “Wait, what, this doesn’t do my enterprise thing at all, what the hell? And how could you miss something so obvious and important?” Because from their perspective it is.
This is the source of entitled GitHub issue threads named “missing obvious important feature”. You try to respond kindly but hardly ever get constructive replies, and suddenly the message to me as a maintainer is that I am bad and I should feel bad for not doing this free work for them. In a likely scenario, I ignore my family for my free weekend and hack on the project for that random person, only to get absolutely no response in return. I made the project that I have worse for someone who did not appreciate it and now I need to dance around that extra branch in the code forever as I maintain this thing.
This serves as a great recipe for falling out of love with your projects. If you want one of my repos you can find me on GitHub, and say, “I want that repo”, and then you can have it because I don’t care anymore.
Maintainers Must Handle Trolls (18:40)
Trolls are an interesting category that tends to show up particularly against people with a large following. And that’s probably okay in itself, but becomes problematic once entangled with Twitter’s asymmetric nature of communication. The project maintainers might see trolls as the vast majority of the communication they receive, which creates a huge sense of negativity. This often leads the maintainer to cry for help to no avail, preceding the death of their project. Many of the dependencies that you use today are now unmaintained for reasons like this.
Open Source projects tend to survive more frequently when communities grow alongside them. From a company’s perspective, they could see contributing back to Open Source as a great risk-mitigation strategy; a way to help steer the project to align with your interests, as well as to catch security and bug issues before they become big, public exploits.
Project Adoption and Trust Building Through Marketing (19:50)
Open Source requires us to adopt it, and adoption requires trust. There’s the explicit trust of stuff that we depend on, and there’s also the implicit trust of everything that our dependencies depend on. The tool frequently employed to build trust in a project is marketing.
Linus Torvalds’s announcement of Linux in 1991 likely would not make Hacker News today; its marketing was awful back in the day - a small, self-deprecating post to the Minix newsgroup without even a catchy phrase. Nevertheless, Linux runs all the servers in the world.
10 years later, Ant comes along: a build tool for Java, trying to convince businesses to use it. It had a logo, it had a bunch of website stuff, even a mission statement and foundational affiliation to confer trust. Nowadays, we have so many more dependencies, we don’t even have time to read your mission statement; we expect to see a Markdown README with a quick intro, easy-to-follow steps, and mostly green badges along the top. And then we can consume your dependency. Furthermore, many of the big companies such as Facebook, Google or Twitter invest a lot in marketing budget.
For example, this is a project called Yeoman. Yeoman is a project where a group of diverse engineers are building a rocket ship, but it has gradients, an authoritative tagline, a cool one-liner, and a wizard that walks you through. It can generate a thousand different types of projects and is popular, but should not be used without question. It’s important for us to recognize when we’re being marketed to. I don’t claim that marketing’s good or bad, it’s just that if you do not realize that you’re being marketed to, it’s very easy to be led astray. Hacker News also contributes to this issue of bad commentary on useful or non-useful projects and tools. Besides, nobody takes time to vet their transitive dependencies.
Online Communication and Liability Behind a Screen (21:55)
The stick figures I used in my slides were kind of a lie. Open Source is not an occurrence between humans in a room talking and being humans with each other. Open Source is something that happens over email, GitHub and Twitter - asynchronous text. We become no more than an avatar, a username, and text on a screen.
“It’s fucked up that a lot of modern discourse is optimized for whoever has the fewest feelings and the most free time.”
That truly resonated with me, and I’m really bummed out that it did. If there’s uncertainty or ambiguity between two people, disagreement, or if it grows to simmering disdain, the answer is probably not an email. To solve this, you could try to bump up the communication fidelity: maybe a real-time chat will cover more angles, maybe a phone call will help you start to empathize. If you make somebody cry you can see it on a video chat. Lastly, of course, there’s something magical about when two humans get in a room if you actually make the effort, where even if we vastly disagree, we’ll find some way to compromise.
This strategy of increasing the fidelity of our communication is also a great troll repellent, as trolls feed on their perceived lack of consequences and anonymity. So if somebody’s being a bad actor you can be like, “All right, cool, let’s have a video chat,” and then they go away.
Proposed Improvements to Open Source Communication (23:15)
I’d really like to see our tooling improve to foster better, simple synchronous communication. In our collaboration tools, in addition to just commenting, I want to be able to start a chat, dynamically, to build a live example in place - show you what I’m working on, reproduce my issue. I’d like to schedule a pairing session with tools like Screenhero, demonstrating my issue in 30 seconds as opposed to writing a stressful, four-month-long issue.
The Cautious Dependent (23:48)
Things will get worse with Open Source until the fever breaks and things get better.
Initially I reacted negatively, but in the grand scheme of things, innovation on the lower level stopped taking place for a while until recently. Go, Rust, Swift, and other new programming languages demand new ecosystems, which demand new tooling.
The issue is that when real-time, embedded systems fail, the consequences are much graver.
For instance, when healthcare.gov crashed, not much happened other than 60 news cycles about how Obama was destroying America; people just had to wait a little longer to sign up for a plan that they actually had four more months to sign up for. In contrast, if a Da Vinci Machine breaks on the table during open-heart surgery, the consequences are bigger by orders of magnitude; embedded machines have a much lower fault tolerance.
Whenever I talk to my systems engineering friends, they carry that sense of responsibility with them every day at work. They talk about adopting a dependency as if they’re outsourcing their understanding of how to do something. They perceive their application as layered over that dependency, understanding only what job that dependency is doing for them, but knowing that they don’t know how that dependency operates. I like to call that understanding debt. As a web developer I often disregard this fact, as I used to assume understanding debt is just naturally paid down by iterating: as I iterate I learn about that dependency, and if I run into issues with it I can easily yank it out. But if iterative releases aren’t practical, like in iOS when we ship apps out to customers and can’t update automatically, pushing stuff down all the time, we shouldn’t outsource nearly as much understanding.
Hosted Systems vs. Embedded Systems: Direct Comparison (25:59)
If you compare a hosted system with an embedded one, hosted systems usually have short lifespans; embedded ones might be rated for a long time. Hosted systems have engineering teams for the life of the project; embedded systems might only have an RMA process. Hosted systems normally have tons of CPU and RAM overhead to spare for inefficient dependencies; embedded systems might have to calculate exactly how much they’re going to use. Hosted systems are connected, so it’s easy to update them, but embedded systems might have little or no connectivity. These differences make it easier to iterate on hosted systems, and difficult to iterate on embedded systems. Therefore, it’s important for us to have a deeper, upfront understanding of what we’re building in that particular domain.
When you think about the depth of understanding you need to build an application, for a web app, it’s not very much. You need to know how browsers work and how HTTP works. In contrast, when making a piece of landing gear, you need to know a lot about all the different components on the plane, because your risk tolerance is substantially lower. If I just start with a traditional web app where I’ve got all my models and a mountain of Open Source underneath me that I don’t understand, I can’t just cargo cult that, because my understanding is not deep enough. There’s this whole danger zone of problems there. A lot of my friends in systems engineering either write more code and use fewer dependencies, or they go through a process of qualifying those tools and rigorously proving that they do their job, how they do it, and under what constraints.
Summary and Closing Remarks (27:12)
Most Open Source tools are a product of web development. My hope is that we become more cautious and create more system-level, less dependent code, and build this cautiousness and understanding back up. Then we could form a broader perspective about Open Source tooling.
Open Source is good.
However, companies should look for bugs and strengthen the security of their products, and start-ups should take time to understand what they’re building, or else understand that they might have to re-write everything in a couple of years. I like to call that the “Slow Code Movement”. Maintainers should aggressively pull in other collaborators and make it easy for others to offer them help at an early stage.