The Two Models

One of the first things that hit me after my “aha” moment around AI and software development was: How late am I? After all, by June 2022, more than a million people were already using GitHub’s Copilot (and I assume multiple millions are as I type this). Clearly, I was late to my realization. Hopelessly behind, and bound to race to catch up.

And then I read a post by Andrej Karpathy.

On November 11, 2017, Andrej Karpathy published a post on Medium entitled “Software 2.0.” In it he argues that neural networks are a fundamental shift in how software is developed. Software 2.0, as he describes it, is about assembling the dataset so as to define the “desirable behavior” (or problem to be solved), and then setting the “skeleton code” of the neural net’s architecture, before setting it loose to optimize for the solution. Accordingly, “software development” is about “curating, growing, massaging, and cleaning” datasets. And developers are “data labelers.” AI, says Karpathy, “is eating software.”
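To make that contrast concrete, here is a deliberately tiny sketch of my own (not from Karpathy's post). It assumes a toy problem, "is the number greater than 5?", and uses a single-weight perceptron as a stand-in for a real neural net. The names `handwritten_rule`, `DATASET`, `train`, and `learned_rule` are hypothetical, chosen only for illustration.

```python
# Software 1.0: a person writes the rule explicitly.
def handwritten_rule(x: int) -> bool:
    return x > 5  # threshold picked by the programmer (toy assumption)


# Software 2.0: a person curates labeled examples of the desired behavior.
# In real life the labels come from people or logs; here they are generated
# from the toy rule only to keep the sketch self-contained.
DATASET = [(x, 1 if x > 5 else 0) for x in range(11)]


# The "skeleton": one weight and one bias whose values nobody writes by hand.
def train(dataset, max_epochs=20_000):
    w, b = 0, 0
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in dataset:
            pred = 1 if w * x + b > 0 else 0
            error = label - pred
            if error != 0:
                # Classic perceptron update: nudge parameters toward the label.
                w += error * x
                b += error
                mistakes += 1
        if mistakes == 0:  # every example is classified correctly
            break
    return w, b


W, B = train(DATASET)


def learned_rule(x: int) -> bool:
    return W * x + B > 0


if __name__ == "__main__":
    print("learned parameters:", W, B)
    print(all(learned_rule(x) == handwritten_rule(x) for x, _ in DATASET))
```

In this toy case the learned parameters are still easy to read. The point of the sketch is only the workflow: the programmer's effort moves from writing the `x > 5` line to curating `DATASET`, and with millions of weights instead of two, the resulting "program" stops being something a person can inspect.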

Now, I’m quite sure that Andrej has forgotten more about AI and neural networks than I’ll ever know, but two things struck me as I read his post:

  1. He wrote this in 2017! That’s like (*Eric does math*) SIX YEARS AGO. Joy swept over me, as I realized that it’s impossible for me to be late in my “aha” moment, if Andrej was writing this that long ago. Hell, I remember when Netscape Navigator came out, and in a lot of ways, everyone suddenly felt late then. Guess what we weren’t? Late.

  2. While I don’t doubt that he’s right about things like computer vision and problem sets that involve insanely large datasets, I was quite skeptical that developers would (only) become data labelers who spend their days and nights shepherding neural networks.

Obviously, and in all fairness to Andrej, I’m being a bit hyperbolic. In fact, Andrej goes on to essentially detail my skepticism later in his piece, when he writes that the choice between how we currently develop software and his vision of how software 2.0 will work is the choice between “using a 90% accurate model we understand, or 99% accurate model we don’t.” And therein lies the rub.

If the future of software is neural networks building software, then the future of software is a future where we no longer truly understand how the model works (we only know that it does).

Now, there will absolutely be problem spaces where we are fine with models we don’t understand because they are more accurate (and better problem solvers), but surely there will also be plenty of problem spaces where we really need to understand how the model works. Insert all of your dystopian fantasies here.

What is clear is that nearly six years after Andrej wrote his post, we are now living in this somewhat bifurcated world of two software models, even if some people or organizations don’t yet know it. What’s also clear is that in the “model understood” mode of more traditional software development, the processes and tools are rapidly becoming massively AI-assisted (see everything from Copilot to Sourcegraph to Tabnine to Scale to Grit).

The bridging of the two models seems like a wildly interesting problem to solve as we move forward, and one that I haven’t found a lot of people talking about (if you have, please send that my way). However, the ease with which some LLMs are churning out code may actually point to what that bridge could look like: a world where we can understand and constrain the model, and it is still writing software for us.

It seems almost bland to say, but the unavoidable truth of the coming years is that moving between and through this two-model world of software will mean a whole new class of “dev tool” companies is about to be built.

[One ending sidenote: The title of Andrej’s post and the name of this newsletter are, of course, funny in their similarity. For transparency’s sake, let me quickly detail how I came to “SW2.ai” as a name. One, I was convinced this was a “second great phase of software” (you can argue with me if you think it’s the 3rd or 4th, that’s fine). Two, I did not want to use the O’Reilly “2.0” nomenclature. And three, I, perhaps lazily, just wanted something that developers would see and instantly know what I was going after. Software. Next great transition point and phase. AI. Hence, “SW2.ai.”]