The 3-minute postdoc interview
Hiring postdocs at the chemistry-AI interface
I recently ran a round of postdoc interviews for a position that sits right at the chemistry / molecular simulation / AI interface. The ad was explicit: I was looking for someone with demonstrated experience in AI, both through publications and through code they had written themselves to solve practical problems, with actual algorithmic thinking implemented by the applicant.
On paper, many applicants seemed to match what I had asked for. The CVs looked right. But once the interviews started, it became clear that a lot of people could not explain the basics of what they claimed to have done. The feeling was familiar: in molecular simulation, it’s like saying you have done umbrella sampling without knowing how WHAM works.
Over the years, the WHAM question (and similar ones; I have a bunch of them about statistical mechanics, thermodynamics, simulations, and sampling) had become a staple of my interview process. More importantly, I had learned how to ask these questions naturally in conversation and get a clear sense of possible red flags.
AI, on the other hand, was new territory for interviewing. In the past, I would give candidates a coding exercise to submit. But of course now, with ChatGPT-style assistants and vibe-coding tools, the traditional coding exercise has become close to redundant.
I also found that code repositories had become harder to use as a primary screen. Repos are often multi-author, and with the volume of applicants and the limited time available for initial screening, it was difficult to reliably infer which parts were genuinely driven by the applicant versus inherited, adapted, or contributed in smaller pieces. I don’t think this is anyone’s fault; it’s just the reality of collaborative code and fast-moving tooling. But it meant I needed an interview mechanism that could scale and that could get at “conceptual ownership” quickly.
So I did something simple. I asked my group members to help me draft a short list of “absolute minimum” questions: things that any candidate for this specific position should be able to answer. I told all applicants ahead of time that I would start the interview with a few basic questions, and that if they couldn’t answer them satisfactorily, I would stop the interview early. I gave them a day or so of warning: enough time to refresh the basics, but not enough for it to become a memorization game.
I ran these screening questions over Zoom as a rapid-fire, game-show-style round, with essentially no time to look anything up. If there was more than a few seconds of delay, I would give a hint. If there was another pause after that, I would move on to the next question.
The outcome surprised me: I ended up stopping about 70% of interviews after roughly three minutes. And while I was merely surprised, my students who had helped come up with the questions were shocked.
At first, I defined “satisfactory” as getting all of them right (and maybe even the bonus question). Later I relaxed it to “around 4 out of 7,” but the real criterion was not the score. The criterion was whether the answers reflected the level of understanding implied by the CV. If someone lists a model or method as something they developed or used deeply for a paper, then explaining the core ideas should not require a long warm-up.
Now, I should say something that bothers me about this format. I fully understand that a rapid-fire interview can introduce bias: someone who is shy, anxious, or simply not optimized for interviews may be at a disadvantage, even if they would thrive in the actual work environment. I also realize that any hard filter like this risks excluding applicants who might have turned out to be excellent in the role.
But given the realities of screening at scale, I found this approach useful for one specific reason: it helped me quickly identify cases where the match between “what is claimed” and “what is understood” was unclear, and it helped me avoid spending interview time in directions that were unlikely to be productive in a fast-moving research setting. I don’t think I have a clean solution yet for making this both scalable and maximally fair. The only honest thing I can say is that I’m aware of the tradeoff, that it bothers me, and that I’m going to keep iterating on the process so it becomes more robust to different interviewing styles. I’m only human, after all.
In the end, I hired two fantastic people who were creative and honest. And the process—while a little sad at times—made the whole experience cleaner and more efficient. I’ll be running another round soon, and I have a longer list of questions I’m looking forward to using.
Here are the questions I asked (seven total, plus two bonuses), with a quick refresher on the KL divergence ones after the list:
What is the difference between an autoencoder and a variational autoencoder?
What is the “T” in ChatGPT?
What are the similarities between ChatGPT and AlphaFold (high level)?
What is positional encoding?
What is the difference between L1 regularization and L2 regularization?
What is KL divergence?
What is the difference between a normalizing flow and a diffusion model?
Bonus 1: What is the difference between forward KL divergence and reverse KL divergence, and when might one be more relevant than the other?
Bonus 2: Explain geometrically how the performance difference between L1 regularization and L2 regularization arises.
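For anyone who wants to check themselves, here is a quick refresher on the KL divergence question and Bonus 1; this is a sketch of the standard definitions, not a full answer key. For distributions p and q,

\[
D_{\mathrm{KL}}(p \,\|\, q) \;=\; \mathbb{E}_{x \sim p}\!\left[\log \frac{p(x)}{q(x)}\right] \;\ge\; 0,
\qquad
D_{\mathrm{KL}}(p \,\|\, q) \neq D_{\mathrm{KL}}(q \,\|\, p) \ \text{in general}.
\]

Forward KL, KL(data || model), is what maximum likelihood training minimizes up to a constant; it requires samples from the data distribution and tends to be mass-covering, penalizing the model wherever it misses data. Reverse KL, KL(model || data), is the form that appears in variational inference (for example, in the ELBO of a variational autoencoder); it needs the target density only up to a normalizing constant plus samples from the model, and it tends to be mode-seeking.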
One last point. I actually think that, for my group, the deeper long-term need is still thermodynamics / statistical mechanics / molecular simulation method literacy. I ask those questions too, and, as I said above, I have a deeper grasp of how to screen for them: through my own experience doing molecular simulations for more than two decades and, unfortunately, through having made mistakes early on in the hiring process as a new Principal Investigator. But what stood out in this search was how often “AI experience” on paper did not translate into basic conceptual clarity during a conversation.
I’m about to start the process of interviewing new postdocs again in the coming few months (keep your eye out for the advertisement!), and I have a whole suite of similar questions ready for the next round of three-minute screens. It will be fun.


Couldn't agree more. Like faking form in Pilates.
Those questions were reasonably easy (unless you make them do diffusion math :p).
Have you tried asking them to explain one of your own papers (for ex. learning MD with LSTM, etc.)?
Also see this beautifully illustrated post on ridge regularisation - https://thomas-tanay.github.io/post--L2-regularization/