Images of Memory
Issue #73: A Memory Research Group update
In this issue: Kei Kreutler, convenor of our Memory Research SIG, discusses metaphors for computational memory in LLMs and before. The group meets every two weeks on Discord to discuss a text related to memory across disciplines, and you’re welcome to join the next session.
If you’re reading this newsletter, then there’s a good chance you have a memory of the scene from 2001: A Space Odyssey in which astronaut Dave Bowman, alone aboard the spacecraft Discovery One, decides to dismantle HAL 9000, the computer system controlling the ship. He goes to the memory terminal and, one by one, proceeds to remove its glowing memory modules. Meanwhile, HAL sings a simple song that falters and fades out, like an animatronic toy with dying batteries.
This is one image of memory we have in pop culture. Other shared images of memory related to digital technologies might include a USB drive on your keychain or, for some of us, the translucent, colored memory cards of early game consoles. There might be a faint archival image that comes to mind: people standing in large rooms among machines spinning reels of magnetic tape or, in other words, an image of an early computer system.
Given how strongly metaphors from digital technologies shape our understanding of memory, it’s surprising that these images are so abstracted from how humans actually experience memory. While we might conceptualize our memory as a hard drive, for many of us the actual experience of recall can feel much more muddily intuitive than browsing a digital filing cabinet of the mind.1 These images often suggest that memory is a discrete object, unlike the well-worn path around a mountain that remembers, through its form, those who walked on it.
To explore these nuances in our images of memory, the Memory Research Group has been surveying different digital technologies and our metaphoric relationships to them.
We naturally looked at one of the hardest problems first: how memory works in LLMs. We read the blog post ‘What is KV Cache in LLMs and How Does It Help?’ Most LLMs use KV caching: the technique of storing previously computed data, in the form of key-value pairs, over the course of a conversation. When a person sends a new prompt, the LLM can reuse the data it has already computed for the earlier parts of the conversation instead of computing it all again. This makes the per-token cost of an ongoing conversation scale roughly linearly with its length rather than quadratically.
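To make that concrete, here is a minimal sketch of the idea in Python: a toy single-head attention step with made-up dimensions, not how any production LLM actually implements it. The point is just that each new token adds one key and one value to the cache, rather than recomputing them for the whole conversation.

```python
# A toy sketch of KV caching in single-head attention (illustrative only).
# Without the cache, generating a new token means recomputing keys and values
# for every prior token; with it, each step computes K and V only for the
# newest token and appends them.
import numpy as np

d = 16  # embedding size (arbitrary for the sketch)
rng = np.random.default_rng(0)
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

k_cache, v_cache = [], []  # grows by one entry per generated token

def attend_with_cache(x_new):
    """Attention output for the newest token, reusing cached K/V."""
    k_cache.append(x_new @ W_k)   # compute K, V only for the new token
    v_cache.append(x_new @ W_v)
    q = x_new @ W_q
    K = np.stack(k_cache)         # (t, d): all keys seen so far
    V = np.stack(v_cache)
    weights = softmax(K @ q / np.sqrt(d))
    return weights @ V            # attention output for this step

# Feed a few "tokens" one at a time, as in autoregressive decoding.
for _ in range(5):
    out = attend_with_cache(rng.standard_normal(d))
print(out.shape, len(k_cache))   # (16,) 5
```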
Without KV caching, LLMs would be far too computationally expensive to work the way they do today. To apply the analogy to a human conversation, it would be as though, every time you spoke a new word, you first had to silently rehearse every word you’d said so far. It would take so long, and be so redundant, that I’m not sure anyone would have the patience for conversation.
But humans aren’t LLMs, and LLMs aren’t humans. Their memory problems are far from solved.2 In one example, after about 200 conversational exchanges between an LLM and a person, the KV cache can take up more gigabytes of memory than the transformer’s model weights themselves. It’s still a bottleneck, and the blog post we read summarized current approaches to solving it.
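A back-of-the-envelope calculation shows how that happens. The model dimensions and tokens-per-exchange below are assumptions I’ve chosen for illustration, not figures from the blog post, and real systems vary (grouped-query attention and quantized caches shrink the numbers), but the shape of the arithmetic is the point.

```python
# Back-of-the-envelope KV cache sizing (illustrative numbers, not from the
# post we read). Per token, the cache stores one key and one value vector
# per layer and per attention head.
num_layers   = 32     # assumed transformer depth
num_heads    = 32     # assumed attention heads
head_dim     = 128    # assumed dimension per head
bytes_per_el = 2      # fp16

bytes_per_token = 2 * num_layers * num_heads * head_dim * bytes_per_el
tokens = 200 * 150    # ~200 exchanges, assuming ~150 tokens per exchange

cache_gb = bytes_per_token * tokens / 1e9
print(f"{bytes_per_token / 1e6:.2f} MB per token, ~{cache_gb:.0f} GB total")
# -> 0.52 MB per token, ~16 GB total: on the order of the fp16 weights of a
#    7B-parameter model (~14 GB), which is the bottleneck described above.
```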
While memory storage in LLMs clearly has quantity issues, it also has quality issues. In my experience, an LLM’s memory often feels less like persistent contextual awareness and more like it has been primed to answer in superficially relevant ways. (How many times can it paste my gardening interests over my questions about digital memory architectures?) There are multiple emerging strategies beyond KV caching that try to address these issues across conversation windows. Yet the memory problems of LLMs are new computational terrain, and they are easier to understand against the history of computer architecture.
So, in our next Memory Research Group session, we explored the concept of the cache in early computer architectures. We read Berkeley computer scientist Alan Jay Smith’s ‘Cache Memory Design: An Evolving Art,’ published in 1987. Written at a moment when caching was maturing from an engineering experiment to standard architectural practice, the paper highlights how cache design remains a balance between cost, complexity, and performance.
Caches are small, fast memories that store recently used data so it can be accessed again quickly. Researchers invented the concept in the late 1960s, based on two simple observations about how programs actually behave: they tend to access the same data over and over (temporal locality), and they tend to need data stored near data they’ve just used (spatial locality).
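A toy simulation makes the two kinds of locality easy to see. Everything below (the block size, the number of slots, the access patterns) is made up for illustration, but the effect is the one real caches rely on: touch data sequentially or repeatedly and the cache almost always has what you need; touch it at random and it almost never does.

```python
# A toy direct-mapped cache simulator (illustrative only). Data is fetched in
# fixed-size blocks, so touching one address also pulls in its neighbours.
import random

BLOCK = 8    # addresses per cache block
SLOTS = 16   # number of blocks the cache can hold

def hit_rate(addresses):
    cache = {}   # slot index -> block tag currently stored there
    hits = 0
    for addr in addresses:
        tag, slot = divmod(addr // BLOCK, SLOTS)
        if cache.get(slot) == tag:
            hits += 1
        else:
            cache[slot] = tag    # miss: fetch the whole block
    return hits / len(addresses)

sequential = list(range(1000))                         # spatial locality
repeated   = [7, 8, 9] * 333                           # temporal locality
scattered  = [random.randrange(100_000) for _ in range(1000)]

print(f"sequential: {hit_rate(sequential):.2f}")   # ~0.88
print(f"repeated:   {hit_rate(repeated):.2f}")     # ~1.00
print(f"scattered:  {hit_rate(scattered):.2f}")    # ~0.00
```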
A library is the metaphor commonly used to explain caches. Imagine you’re working on a research paper. Rather than making a round trip to the library every single time you need a piece of information, you check out the books you think you’ll need and keep them on your desk at home. Your desk becomes the cache, letting you quickly browse the most relevant books. You can only keep a limited number of books on your desk, though, because it has far less space than the library’s stacks, which illustrates why a cache is fast but limited.
A library’s layout mirrors the same principle. Libraries keep frequently checked-out books in a small cabinet near the front desk and move older, rarely needed books to off-site storage. Using past borrowing data to predict demand, they give popular books both temporal priority (kept ready because they’ll likely be wanted again soon) and spatial priority (shelved together, close at hand).3
In the time since Smith wrote that paper, caches have evolved into sophisticated hierarchies, and research increasingly explores using machine learning to choose which data to prefetch. The concept has also become useful well beyond a computer’s central processing unit. Web browsers cache data from websites you visit frequently, content delivery networks cache popular content geographically closer to users, and databases cache the results of frequently run queries. Now LLMs depend on caching too. Anywhere there’s a speed-versus-storage trade-off, there’s likely some form of caching happening.
In our Memory Research Group discussion on cache memories, we found ourselves immediately translating the concept back into the spaces we actually inhabit. One person described their kitchen: the counter holds ingredients within arm’s reach like the cache, while the pantry requires slowing down to retrieve less frequently used items, a kind of “main memory,” analogous to the slower RAM and disks that sit behind a computer’s cache. Another person reflected on their studio practice, noting how different zones of the room hold different “heat” depending on when work was last touched. Their projects flow between spaces according to temporal patterns, like cache locality principles.
The insight about hot and cool zones in a studio made me think about dust. Dust reflects physical entropy: the more dust on something, the less recently it’s been used. This acts as a natural “least recently used” signal, the same heuristic many caches use to decide which data should move to slower storage. It’s how we know, without any formal tracking system, what hasn’t been touched in our personal archives, even our public ones.
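The dust heuristic maps almost directly onto how a least-recently-used cache decides what to evict. Here is a minimal sketch in Python (illustrative, not any particular system’s implementation), borrowing the kitchen from the discussion above as its data:

```python
# A minimal least-recently-used (LRU) cache: the entry that has gathered the
# most "dust" (gone longest without being touched) is evicted when space runs
# out. Illustrative sketch, not a production implementation.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()   # ordered from least to most recently used

    def get(self, key):
        if key not in self.items:
            return None              # a miss: fetch from slower storage instead
        self.items.move_to_end(key)  # touching an item wipes off its dust
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the dustiest entry

cache = LRUCache(capacity=2)
cache.put("counter", "olive oil")
cache.put("pantry", "dried beans")
cache.get("counter")                 # recently touched, so it survives...
cache.put("fridge", "leftovers")     # ...and "pantry" is evicted instead
print(list(cache.items))             # ['counter', 'fridge']
```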
As soon as you start looking at images of memory through digital technologies, especially computer architectures, you see both the connections and the discrepancies in these metaphors. Caches optimize for speed, but the salience of a memory doesn’t always reflect its deeper helpfulness to us. Dust tells us what we haven’t touched, but it can’t tell us why we’ve avoided touching it. In 2001: A Space Odyssey, the spacecraft world of HAL appears dustlessly sterile. Muddier images of memory, often found through software rather than hardware metaphors, give us richer language for the human experience of remembering. Environmental metaphors, like the well-worn path around a mountain, come into view again.
As the synthetic memory maker in Blade Runner 2049 explains, “They all think it’s about more detail. But that’s not how memory works. We recall with our feelings. Anything real should be a mess.”
The 1987 paper we read captured caching at a moment of transition, and we’re in another such moment now, one in which LLMs may greatly reshape our images of memory. It’s a change we don’t really have an image for yet.
For the next Memory Research Group sessions, we’ll continue looking at technological transitions, from magnetic core memory to the advent of virtual memory, before closing the year with a session focused on the open question: What metaphors for memory await us next?
1. That is, unless you’re a memory champion, which I wrote about in my last Protocolized post Reflections from Memoria.
2. Though many people would say our memory problems aren’t solved either. Or maybe that they’re not a problem to solve. Who knows.
3. Architect and Summer of Protocols alum Chenoe Hart joined us for the session and shared some of her insightful research on addressable space, including the history of library systems.