On Bandwidth and Bottlenecks
AI tools help us go faster. But speed is not all you need.
Thank you for being here. Please consider supporting my work by sharing my writing with a friend or taking out a paid subscription.
It may seem quaint compared to all of the hullabaloo over the latest agentic AI workflow or foundation model, but one of the most useful outcomes of the generative AI advances of the last several years is that voice transcription is now essentially a solved problem. This has enabled a whole new set of workflows, which come with a set of questions that we’ve implicitly answered. It’s worth looking a bit more closely.
Speaking over typing, reading over listening
For a long time, we tried to solve transcription by cobbling together sophisticated signal processing and statistical models to transcribe streams of audio. The results were, on the whole, unsatisfying. These tools were serviceable, but they often had no way to use the broader context clues that would enable, for example, accurate punctuation in the transcribed text. Enter Whisper.
When OpenAI released Whisper in 2022, they added one more feather to the hat of the bitter lesson: more data almost always wins. Built on a transformer architecture and trained on 680,000 hours of multilingual data, Whisper represented a step change in the way we were able to go from voice to text.
Fast forward to 2025, and voice interfaces are becoming more and more popular, powered by their much improved accuracy. Except in situations where it would be impolite to do so, I most often find myself speaking my prompts to Claude on my phone. I laughed when I saw this tweet from a few weeks ago showing a new setup from a software engineer who is using voice as a primary way to interface with AI coding tools. I don’t have the cool nerdy microphone setup yet, but I have likewise found this workflow pretty powerful. Speak to the AI agent, read the output, click a few buttons, rinse and repeat.
The reason these new voice-powered interfaces are becoming more and more popular is that they increase throughput. Said another way: speaking is faster than typing. This same desire for speed is the reason that people generally prefer to read the responses rather than listen to them. While input is faster via voice, output is faster by sight.
This is all well and good if we’re trying to optimize for speed. But in our giddiness about moving more quickly, we seem to be ignoring the fact that all this speed may not, in the end, serve us well.
Is more speed really worth chasing?
Our modern world is obsessed with speed. Harder, better, faster, stronger, as Daft Punk would say. But is all this speed actually making us better or stronger?
If we stop to consider for a moment, we can find examples all around us that argue against this assertion. One of my favorites is from optics. The motivating question? Why don’t we have gigapixel smartphone cameras?
If you thought about it for a minute, I’m sure you could come up with a number of reasons why gigapixel smartphone cameras don’t exist.
Smartphones are small and might not fit that many pixels.
More pixels equal more data, and smartphones have limited storage capacity.
Bigger sensors take more energy, and battery life is one of the most important metrics on a mobile device.
All of these reasons are true, but there is an even more fundamental reason that gigapixel smartphone cameras aren’t a thing. While it is understandable to think that more pixels means sharper images, this turns out to be a necessary but insufficient condition. Why? Because physics.
A crash course in pixels, diffraction, and resolution
I’ll try to give you the short version, but the fundamental reason we don’t have gigapixel cameras is that the resolution of the images is limited not by the number of pixels but by the optical resolution provided by the lens, subject to the limits of diffraction.
Diffraction, a property of the wave nature of light, means (among other things) that the smallest point that you can focus light to is approximately half the wavelength. This means that for visible light, you cannot focus light to a spot size of less than about 200 nm. Our predicament is further exacerbated by the fact that this 200 nm spot size assumes that we have a numerical aperture of 1. This numerical aperture is only found on very expensive lens systems that are carefully optimized using the combination of many different lenses in tandem. Practically, the limitation on getting high numerical apertures on lenses is that you need to solve a complicated optimization problem to minimize the impact of optical aberrations. These imperfections that are part of the design or manufacture of the lens prevent it from achieving the theoretical performance limit.
Together, all of this means that making the pixel size smaller than the diffraction limit doesn’t actually add any resolution or sharpness to the image. The incoming light field doesn’t support that level of resolution because of the limited numerical aperture of the lens system. You can shrink the pixel size as much as you’d like and throw as many pixels into your sensor as you can, but you are just going to get more pixels per blurry spot in your image. Meanwhile, all the costs that scale with pixel count still apply: storage, energy, heat, etc.
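To put rough numbers on this, here is a back-of-the-envelope sketch in Python. The wavelength, numerical aperture, and sensor dimensions are assumed, illustrative values (green light, a bright phone lens, a large phone sensor), not the specs of any real device.

```python
# Back-of-the-envelope estimate using the spot-size approximation from
# the text: diffraction-limited spot diameter ~ wavelength / (2 * NA).

def diffraction_spot(wavelength_m, na):
    """Approximate diffraction-limited spot diameter, in meters."""
    return wavelength_m / (2 * na)

def max_useful_pixels(sensor_w_m, sensor_h_m, wavelength_m, na):
    """Beyond ~2 samples per spot (Nyquist), extra pixels add no detail."""
    pitch = diffraction_spot(wavelength_m, na) / 2
    return (sensor_w_m / pitch) * (sensor_h_m / pitch)

# Assumed values: 550 nm green light, NA ~ 0.25 (roughly an f/2 lens),
# and a generous ~9.8 mm x 7.3 mm smartphone sensor.
spot = diffraction_spot(550e-9, 0.25)
useful = max_useful_pixels(9.8e-3, 7.3e-3, 550e-9, 0.25)
print(f"spot: {spot * 1e6:.1f} um, useful pixel budget: {useful / 1e6:.0f} MP")
```

Even with these generous assumptions, the useful pixel budget lands in the hundreds of megapixels, well short of a gigapixel, and real lenses with their aberrations fall further behind the ideal.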
Consider whether you have the bandwidth to support the speed
Ok, with that slight aside out of the way, let me land the plane. I promise the example is relevant.
While you can think about the resolution of an image in the spatial domain by thinking about the size of the pixel, another equally valid (and often more useful) way to understand resolution is in the spatial frequency domain. In this representation, higher bandwidths correspond to smaller pixels and higher (potential) resolution, provided the lens has the numerical aperture to support it.
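The frequency-domain view can be made concrete with a small sketch. The cutoff formula is the standard incoherent-imaging one, and the wavelength, NA, and pixel pitches are assumed, illustrative numbers.

```python
# A lens acts as a low-pass filter on spatial frequencies: it passes
# nothing above the incoherent cutoff f_c = 2 * NA / wavelength.
# A pixel grid of pitch p samples frequencies up to its Nyquist limit
# 1 / (2 * p); shrinking p past the point where that limit exceeds f_c
# adds no information, because the lens never delivers those frequencies.

def optical_cutoff(wavelength_m, na):
    """Incoherent diffraction cutoff frequency, in cycles per meter."""
    return 2 * na / wavelength_m

def pixel_nyquist(pitch_m):
    """Highest spatial frequency a pixel grid of this pitch can sample."""
    return 1 / (2 * pitch_m)

f_c = optical_cutoff(550e-9, 0.25)  # 550 nm green light, NA ~ 0.25 (assumed)
for pitch in (1.4e-6, 0.7e-6, 0.35e-6):  # candidate pixel pitches in meters
    status = "wasted" if pixel_nyquist(pitch) > f_c else "still useful"
    print(f"{pitch * 1e6:.2f} um pixels: extra sampling is {status}")
```

The crossover pitch sits at wavelength / (4 * NA): below that, more bandwidth on the sensor side cannot recover detail the optics never passed, which is the whole analogy in miniature.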
Our obsession with speed often maps onto this analogy of superfluous resolution. We want to go faster for the same reasons we want more pixels. I mean, c’mon, who wouldn’t want a gigapixel image? Doesn’t that sound pretty fancy?
While the gigapixel camera might be good material for the marketing team to sell more phones in a crowded market, it is not directly connected to the quality of the images you’ll get. In fact, you will often get better images from a sensor with bigger pixels because those bigger pixels can be better optimized on other measures such as noise performance.
My take-home message is this: the speed might be valuable, but only if we have the bandwidth to support it. And we as humans have limited bandwidth.
Contrary to our hopes and expectations, going faster may be doing the exact opposite of making us better and stronger.
Got a thought? Leave a comment below.
Reading Recommendations
I’ve been very impressed with the thinking on AI that has been coming from the Vatican in recent months. It’s clear that Pope Leo sees AI as a very important issue. His remarks from late last week were particularly well stated.
Human beings are called to be co-workers in the work of creation, not merely passive consumers of content generated by artificial technology. Our dignity lies in our ability to reflect, choose freely, love unconditionally and enter into authentic relationships with others. Artificial intelligence has certainly opened up new horizons for creativity, but it also raises serious concerns about its possible repercussions on humanity’s openness to truth and beauty, and capacity for wonder and contemplation. Recognizing and safeguarding what characterizes the human person and guarantees his or her balanced growth is essential for establishing an adequate framework for managing the consequences of artificial intelligence.
I enjoyed this piece from Jillian Lederman on her experience growing up in a big family.
The Book Nook
I only got a few minutes to flip through slide:ology by Nancy Duarte, but I can already tell it’s going to be worth my time to sit with it a bit longer. I quickly resonated with some of her high-level comments about how to design slide decks, and particularly enjoyed some of the examples of different ways to represent data and draw diagrams.
The Professor Is In
Fellowship applications for the IPAI lab were due a little over a week ago, and I’ve been enjoying reviewing the submissions. As part of the application process, I asked students to build a quick prototype to give me a taste of their imagination in this space. I also asked them to critique their idea.
It’s been fun to see the creativity that folks are bringing and how thoughtfully they’re thinking through what they’ve built. I’m excited to get to work with a few of these students next semester.
Leisure Line
Randy’s Christmas donut season!
Still Life
Always fun to see this fountain on our way through LaGuardia.

Another great essay.
One idea to toss into your mix: I've been thinking about inner worlds in people's minds. When someone has a really rich inner world, they can internalize a much more complex system, then think through it faster and more effectively than someone who has just a surface-level grasp of what's going on.
Speed is great but it can come at the cost of cultivating the inner world about the specific thing a person is working on.
I've been pondering this because in our house we accidentally created a "character-IP-free zone," meaning my 4yo can't tell you who Spiderman is, or Elsa, etc. But he can tell you mountains and mountains about who would win, a Cryodrakon or a Quetzalcoatlus, in a battle of pterosaurs. (The "Who Would Win?" book series is an incredible viral hit with young boys.) He makes his own narratives and his own characters and his own situations ... which feels like the early form of making up his own machines with his own parts and his own purposes. Speed to entertainment -- being fed perfect narratives by the best adults in an industry -- is not that different from being fed the immediate next code file to write.
Once the inner world is set for a given context, I'm all about cranking up speed to 11.
Brilliant use of the diffraction limit analogy! The point about numerical aperture being the real constraint is kinda underappreciated in conversations about optimization. What's clever here is how it maps onto cognitive load theory, where working memory acts like that optical bottleneck. The system can't actually resolve more detail no matter how much 'input bandwidth' you throw at it. Makes me think a lot about why speed reading never really worked either.