Notes

Shorter, lower-overhead pieces: paper reading notes, conference reflections, and project-status updates from QMI Lab and AstroLLM. Long-form essays live at /writing/.

Subscribe via the notes feed.

The deeper pool that recovered nothing

June 1, 2026 · 5 min read · AstroLLM · Retrieval · Evaluation

When the corpus grew, the relevant papers stayed in the candidate pool but the fused ranking stopped surfacing them. I deepened the pool from 50 to 500 candidates per arm; the candidate union reached a perfect 1.000 and the fused top-10 did not change by a single query.

The widening that lowered every score

June 1, 2026 · 4 min read · AstroLLM · Retrieval · Evaluation

I made the corpus five times bigger expecting better coverage. Every score on the same queries got worse, and the gap I had already decided to build on grew wider.

The headline that was within noise

May 31, 2026 · 4 min read · AstroLLM · Retrieval · Evaluation

I wrote down three predictions about which retrieval arm would win, then watched the aggregate Recall@10 ranking dissolve into noise at twenty-nine queries. The findings that held up were single queries, not averages.

The label review that lowered my score

May 31, 2026 · 3 min read · AstroLLM · Retrieval · Evaluation

A retrieval pilot over 500 real exoplanet papers scored Recall@10 in the low 0.8s; reviewing my own relevance labels pulled it down to 0.69. The drop is the part worth trusting.
