26th June | 陈's Microblog


Building Things in the Quiet

The post-finals void I wrote about earlier this month has been slowly filling itself in.

Not with obligations, but with projects. Four of them, in fact, all self-initiated, all ESD-adjacent, and all completed without a grade attached to any of them. That last part matters more than it sounds. When there is no rubric waiting at the end, you are forced to make your own judgement calls about what counts as done, what counts as rigorous, and what you are actually trying to learn.

The answer I kept landing on: build things that would tell you something true.

Here is a look at what I have been working on.

NLP Earnings Intelligence

The first project started as a question that had been nagging at me since the SVB collapse in 2023. The bank’s management appeared publicly confident right up until the end. But what were they actually writing in their SEC filings?

The answer, it turns out, was quietly alarming. I built an NLP pipeline that processes 10-K and 10-Q filings for eight U.S. regional banks across 2020 to 2024, using FinBERT for sentiment scoring and BERTopic for unsupervised topic discovery. The most revealing signal was not the overall sentiment score, but a divergence metric: the gap between how management discussed operations versus how the risk factors section was written. When those two tones drift apart, management is, in effect, downplaying something the lawyers felt obligated to document.

That divergence flagged SVB and First Republic one quarter before their respective 2023 failures.
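A minimal sketch of the divergence idea, not the pipeline itself. It assumes each sentence has already been scored by a sentiment model and collapsed to one number in [-1, 1] (for example, positive probability minus negative probability). The function name and the toy scores are hypothetical.

```python
from statistics import mean

def tone_divergence(mdna_scores, risk_scores):
    """Mean management-discussion tone minus mean risk-factor tone.

    Both inputs are lists of per-sentence sentiment scores in [-1, 1].
    A large positive value means management sounds more upbeat than
    the risk factors section of the same filing.
    """
    if not mdna_scores or not risk_scores:
        raise ValueError("both sections need at least one scored sentence")
    return mean(mdna_scores) - mean(risk_scores)

# Toy scores for one imaginary filing, not taken from any real bank.
mdna = [0.6, 0.4, 0.5]
risk = [-0.3, -0.5, -0.1]
gap = tone_divergence(mdna, risk)
print(f"tone divergence: {gap:.2f}")
```

Tracking this gap per bank per quarter, rather than the raw sentiment level, is what makes the widening visible.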

Getting that result validated something I had suspected from the Statistical and Machine Learning module: what you measure matters more than how precisely you measure it. A well-constructed feature will usually outperform a more sophisticated model applied to a naive feature.

Geospatial Urban Analytics

The second project came directly out of my HASS module on digital geographies. All that time thinking about uneven infrastructure access made me want to actually quantify it.

The project applies spatial econometrics to California census tract data, asking which areas should be prioritised for federal broadband infrastructure grants under the BEAD programme. Using LISA clustering and spatial lag regression, the analysis surfaced a result that felt almost obvious in retrospect but required the modelling to make concrete: a pure spatial-targeting rule based on rural cold spots reaches only 22.9% of underserved households. The 77.1% miss rate comes from urban and peri-urban households that are geographically dispersed but economically unable to afford existing connections.
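The coverage and miss-rate figures are two sides of one ratio: underserved households inside targeted tracts, divided by all underserved households. A small sketch of that arithmetic follows. The field names and tract counts are made up for illustration and are not the project's data.

```python
def targeting_coverage(tracts, is_targeted):
    """Share of underserved households that a targeting rule reaches."""
    total = sum(t["underserved"] for t in tracts)
    if total == 0:
        raise ValueError("no underserved households in the data")
    reached = sum(t["underserved"] for t in tracts if is_targeted(t))
    return reached / total

# Hypothetical tracts: two rural cold spots, two urban or peri-urban tracts.
tracts = [
    {"tract": "A", "rural_cold_spot": True, "underserved": 200},
    {"tract": "B", "rural_cold_spot": True, "underserved": 100},
    {"tract": "C", "rural_cold_spot": False, "underserved": 500},
    {"tract": "D", "rural_cold_spot": False, "underserved": 200},
]

coverage = targeting_coverage(tracts, lambda t: t["rural_cold_spot"])
miss_rate = 1 - coverage
print(f"coverage {coverage:.1%}, miss rate {miss_rate:.1%}")
```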

The spatial autoregressive coefficient of 0.678 in the spatial lag model was a satisfying confirmation that broadband access behaves like a neighbourhood phenomenon. Your neighbours’ connectivity predicts yours. That is not an argument for geographic targeting alone; it is an argument for pairing infrastructure investment with affordability policy.
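For context, a spatial lag model has the standard form y = rho * W y + X beta + error, where W y is the average outcome among each tract's neighbours and rho is the spatial autoregressive coefficient. Below is a rough sketch of how that W y term is built with row-standardised weights. The neighbour lists and access rates are hypothetical, and estimating rho itself needs a proper spatial econometrics library.

```python
def spatial_lag(neighbours, y):
    """Average of y over each tract's neighbours (row-standardised W y).

    neighbours: dict mapping tract id -> list of neighbouring tract ids
    y:          dict mapping tract id -> observed value
    Tracts with no neighbours get a lag of 0.0.
    """
    lag = {}
    for tract, nbrs in neighbours.items():
        lag[tract] = sum(y[n] for n in nbrs) / len(nbrs) if nbrs else 0.0
    return lag

# Three imaginary tracts in a row: A - B - C.
neighbours = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
access_rate = {"A": 0.9, "B": 0.5, "C": 0.1}

lagged = spatial_lag(neighbours, access_rate)
print(lagged)
```

A positive rho means a tract's own access rate moves with this neighbour average, which is the "your neighbours' connectivity predicts yours" reading.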

It also gave me an excuse to spend a lot of time with folium and interactive choropleth maps, which I have decided is one of the more enjoyable parts of doing this kind of work.

Customer Lifetime Value

The third project started from a practical annoyance. Most CLV tutorials online show you how to rank customers by historical spending, then call it a day. That is RFM analysis dressed up in fancier language, and it has a specific failure mode that anyone who has worked with retail data recognises: deal-seekers look valuable until they stop buying.

The Thornbury & Co project applies BG/NBD and Gamma-Gamma models to a synthetic but realistic retail dataset, and the central result is blunt. Deal-seeker channel customers have a month-1 retention rate of 4.5% and a campaign ROI of negative 0.49. Enthusiast-channel customers, despite lower historical spend, have 31% month-1 retention and a 10x campaign ROI.

The BG/NBD model works because it explicitly models the probability that a customer has churned, rather than assuming everyone is still alive and just buying infrequently. Once you account for churn risk, the customer rankings flip, and the marketing budget allocation changes completely.
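To make the churn point concrete, here is a sketch of the standard BG/NBD probability-alive formula for a single customer. The parameter values are placeholders chosen for illustration, not the fitted values from this project. In practice they come from fitting the model to the whole customer base.

```python
def p_alive(x, t_x, T, r, alpha, a, b):
    """BG/NBD probability that a customer is still active.

    x:   number of repeat purchases in the observation window
    t_x: time of the last repeat purchase
    T:   length of the observation window for this customer
    r, alpha, a, b: fitted BG/NBD parameters
    """
    if x == 0:
        # With no repeat purchases the model has no dropout opportunity yet.
        return 1.0
    ratio = (a / (b + x - 1)) * ((alpha + T) / (alpha + t_x)) ** (r + x)
    return 1.0 / (1.0 + ratio)

# Hypothetical parameters, for illustration only.
params = dict(r=0.25, alpha=4.0, a=0.8, b=2.5)

# Same purchase count, very different recency.
recent_buyer = p_alive(x=5, t_x=38, T=39, **params)
lapsed_buyer = p_alive(x=5, t_x=10, T=39, **params)
print(f"recent: {recent_buyer:.3f}, lapsed: {lapsed_buyer:.3f}")
```

Two customers with identical purchase counts get very different scores once recency enters, which is exactly what a ranking on historical spend cannot see.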

I also built a Streamlit dashboard for this one, which forced me to think about how to communicate these numbers to someone who is not going to read a methodology section. That translation exercise, from model output to a decision a business can actually act on, is something I want to get much better at.

A/B Test Framework

The final project was the most technically dense, and also the most humbling.

The scenario: a B2B SaaS company runs an experiment testing whether a guided 5-step setup wizard improves week-1 activation. The headline result is positive: +22% relative lift, p = 0.001, 99.9% Bayesian posterior probability of superiority. On the surface, an easy ship decision.
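A minimal sketch of how a relative lift and a Bayesian probability of superiority can be computed from conversion counts. It assumes a uniform Beta(1, 1) prior and Monte Carlo sampling, and the counts below are invented to give a 22% relative lift. They are not the experiment's actual data.

```python
import random

def relative_lift(conv_c, n_c, conv_t, n_t):
    """Relative change in conversion rate, treatment versus control."""
    rate_c = conv_c / n_c
    rate_t = conv_t / n_t
    return (rate_t - rate_c) / rate_c

def prob_superiority(conv_c, n_c, conv_t, n_t, draws=20000, seed=0):
    """Estimate P(treatment rate > control rate) from Beta posteriors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Beta(1, 1) prior updated with observed successes and failures.
        p_c = rng.betavariate(1 + conv_c, 1 + n_c - conv_c)
        p_t = rng.betavariate(1 + conv_t, 1 + n_t - conv_t)
        wins += p_t > p_c
    return wins / draws

# Hypothetical counts: 200/1000 activated in control, 244/1000 with the wizard.
lift = relative_lift(200, 1000, 244, 1000)
p_better = prob_superiority(200, 1000, 244, 1000)
print(f"relative lift {lift:.0%}, P(treatment better) {p_better:.3f}")
```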

But the guardrail metric, day-1 drop-off, increased by 5.3 percentage points with high statistical confidence. The wizard improves activation for users who get through onboarding, but it appears to be filtering out users who bounce early. That is not a clean result. It is a segment-specific result, and the decision to ship should depend on whether the company cares more about activation depth or top-of-funnel reach.

Building the full analysis, including power calculations, sequential O’Brien-Fleming boundaries, CUPED variance reduction, Bayesian beta-binomial inference, and the segment-level forest plots, was a useful reminder that the technical machinery of experimentation is only as good as the decision framework that surrounds it. The framework I included moves away from a binary ship/no-ship output and toward a recommendation matrix by segment.
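Of the techniques listed, CUPED is the easiest to show in a few lines. The sketch below assumes one pre-experiment covariate per user (for example, activity before the test started) and uses toy numbers. In a real experiment the coefficient is usually estimated on the pooled data from both arms.

```python
from statistics import mean, pvariance

def cuped_adjust(y, x):
    """Return CUPED-adjusted outcomes: y - theta * (x - mean(x)).

    y: in-experiment metric per user
    x: pre-experiment covariate for the same users
    theta = cov(x, y) / var(x), which minimises the adjusted variance.
    """
    x_bar, y_bar = mean(x), mean(y)
    var_x = sum((xi - x_bar) ** 2 for xi in x)
    if var_x == 0:
        return list(y)  # covariate carries no information
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    theta = cov_xy / var_x
    return [yi - theta * (xi - x_bar) for xi, yi in zip(x, y)]

# Toy data: pre-period activity and in-experiment activity for five users.
pre = [1, 2, 3, 4, 5]
post = [2, 4, 5, 4, 6]

adjusted = cuped_adjust(post, pre)
print(f"mean before {mean(post):.2f}, after {mean(adjusted):.2f}")
print(f"variance before {pvariance(post):.2f}, after {pvariance(adjusted):.2f}")
```

The mean is unchanged while the variance drops, so the same sample size yields a tighter estimate of the treatment effect.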

I think that is more honest about what an A/B test can actually tell you.

What This Sprint Was Really About

Looking back at these four, there is a loose thread running through all of them. Each one is a case where the naive approach (bag-of-words sentiment, spatial-only targeting, RFM ranking, binary test outcomes) gives you a result that feels like an answer but obscures something important. The more careful method surfaces the thing that the simpler one misses.

That is, I suppose, the actual skill I am trying to develop. Not just knowing the models, but knowing when a model’s output should prompt more questions rather than close the loop.

The self-study format turns out to be a genuinely good way to test that. There is no rubric. You decide when it is rigorous. And that forces you to confront the parts you would otherwise skip.

I am glad I did not skip them.

💡

“The whole problem with the world is that fools and fanatics are always so certain of themselves, and wiser people so full of doubts.”

Bertrand Russell
