In Search of the History of the Vector Database

Pinecone, Weaviate, Chroma, Milvus: these are a few of the new spate of vector databases. They are, for the most part, very well funded and gaining massive adoption. Vector databases, which store data as high-dimensional mathematical representations called vector embeddings, are one of the cornerstones of developing anything in the realm of AI.

All of that got me wondering about the history of vector databases. More specifically: where was the first vector database developed, and who was the person (or team) that built it? I’ve now spent a couple of hours poking around, trying to answer those questions. It has by no means been a comprehensive research project but, in my poking, I discovered a few things.

As it turns out (and I have to give a hat-tip to my investing partner, Paul, for finding this direction via a conversation with a friend), the origins of vector databases lie deep within the fields of biotechnology and genetic research. Now, things are a bit hazy and I’m lacking some exact details (more on that in a bit), but bear with me.

By the late 1970s, DNA sequencing was springing forth as a broad new area of research. Storing those vast amounts of DNA chain data called for a new method, one focused on high-dimensional vectors. That need appears to have spurred database innovation (though this is where the fog of history is thick) through the 1980s and into the mid-1990s. Then, by the late 1990s to early 2000s, there was clear usage of vector databases at both the NIH and Stanford. As genetic research deepened and accelerated from 2005 to 2015, vector databases seem to have grown in parallel, with things like the UniVec database in use by 2017. And then, between 2017 and 2019, came the explosion in vector databases that brings us to where we are today.

[Sidenote: it seems to be widely reported that the “first big use case” for vector databases was next-gen search for product recommendations. That seems, at least from my initial look, to be wholly incorrect. The first big use case appears to be genetic research.]

You’ll notice in the above that I have a lot of holes left to fill in trying to suss out the history of the vector database. For one thing, I have yet to find the place where it first happened, or the person (or team) who was responsible. Beyond that, there is a large stretch of time (from the early 1980s to the late 1990s) that lacks significant detail. And so, I turn to you, dear reader. Surely, someone out there knows the true account. Surely, someone somewhere can bring forth that which currently seems lost to history. Hence, today’s send. As always, let me know if you can shed light on this apparent mystery. I will report back in this space with the findings that you bring me.

Lastly, I’d leave you with this: is it not a bit strange that the foundational data store for building artificial intelligence finds its origins in the sequencing of biological life?
