corpus christi

Lemme spoil the lede: I've been running a language model over years of chat logs with my wife. No, not to generate fake apologies this time (that was the Markov boyfriend incident, and we don't talk about it). To see what a machine could tell us about us that we couldn't see ourselves.

It started, as most of my bad ideas do, with a question I couldn't let go of.

the data we didn't know we were making

Back in 2018 I cobbled together a self-hosted chat app because my wife and I needed a better way to coordinate groceries and childcare. It was the DIY Trinity in action: easy, frugal, better. What I didn't appreciate at the time was that every "pick up milk" and "you forgot the thing" was also a data point.

Seven years of messages later, I had something I never set out to build: a longitudinal corpus of two humans communicating under the conditions of domestic partnership. Arguments about thermostat settings. Flirty check-ins at 2pm. The slow accretion of inside jokes. The occasional nuclear exchange at 11pm on a Tuesday.

It's not a dataset you can download. It's not the kind of thing you share on HuggingFace. But it is the kind of thing a local LLM can read, thoroughly, privately, and without sending a single byte to a cloud server that will later try to sell me ads for couples counseling.

what the machine saw

Here's the pitch: point a local model at years of chat logs and ask it to do what a therapist does: look for patterns. Not the content (nobody needs an AI auditing whether you actually bought the oat milk) but the shape of the communication.

Things like:

  • Initiation ratios: who starts conversations, who ends them, and how that shifts week to week
  • Tone drift: are messages getting warmer or colder over time? Is there a seasonal pattern? (Spoiler: there is. December is rough. You don't say.)
  • Repair attempts: after a tense exchange, how long until someone breaks the ice? Who does it first? The term is Gottman's. The corpus doesn't lie.
  • Lexical fingerprinting: each person has a vocabulary, a cadence, a set of tells. The model maps these without being asked.
  • Emotional valence arcs: not sentiment analysis in the crude "positive/negative" sense, but the slow gradient of affection, frustration, humor, and withdrawal across months and years
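
The first bullet is the easiest one to make concrete, and it doesn't even need a model. A minimal sketch, assuming each message is a (timestamp, sender) pair and that more than four hours of silence means a new conversation has started. Both the message format and the threshold are assumptions for illustration, not anything the chat app dictates.

```python
from collections import Counter
from datetime import datetime, timedelta

# Assumed threshold: silence longer than this starts a new conversation.
GAP = timedelta(hours=4)

def initiation_counts(messages, gap=GAP):
    """Count who starts each conversation, grouped by ISO (year, week).

    messages: iterable of (datetime, sender) pairs (hypothetical format).
    """
    counts = {}
    prev_ts = None
    for ts, sender in sorted(messages):
        # The first message ever, or the first after a long silence, is an initiation.
        if prev_ts is None or ts - prev_ts > gap:
            year, week, _ = ts.isocalendar()
            counts.setdefault((year, week), Counter())[sender] += 1
        prev_ts = ts
    return counts

def initiation_ratio(counter, sender):
    """Share of conversations in one week that `sender` started."""
    total = sum(counter.values())
    return counter[sender] / total if total else 0.0
```

Plot initiation_ratio week over week and you have the first line on the weather map. The gap threshold is the judgment call: too short and every reply looks like a new conversation, too long and a whole day collapses into one.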

The output isn't a grade. It's a weather map.

why local, why not cloud

This is where the self-hosting religion kicks in, and I make no apologies.

Relationship data is maybe the most sensitive data a person has. The idea of uploading seven years of marital chat to OpenAI or Anthropic, even with their "enterprise privacy" promises, is a non-starter. Not because I distrust their engineers, but because I distrust incentives. A company that offers a free API for your marriage logs is a company that has already monetized your marriage logs.

So the whole pipeline runs locally. A quantized model on my own hardware. No network egress. No telemetry. The corpus never leaves the house, which is kind of the point. The chat app was self-hosted because I wanted ownership of my data. The analysis layer is self-hosted for the same reason. Consistency is a virtue when the stakes are privacy.

the personality profile problem

Here's where it gets interesting (or at least interesting enough that I wrote a blog post about it).

When you have enough text from a person, and I mean enough, like hundreds of thousands of messages across multiple years and contexts, a local LLM can generate a personality profile that's uncomfortably accurate. Not horoscope-accurate. Therapist-accurate.

It picks up on things like:

  • Conflict style: do you escalate, withdraw, or intellectualize? My corpus says I intellectualize. (The wife says I stonewall. The corpus, annoyingly, says we're both right.)
  • Affection language: not the Love Languages pop-psych version, but the actual behavioral patterns. When does affection show up? In what form? As humor? As logistics? As silence that means something specific?
  • Stress markers: vocabulary shifts that correlate with external pressure. Sentence length changes. Emoji frequency drops. The machine sees the canary before you do.
  • Conversational dominance: who talks more, who interrupts (or, in text, who responds first after a fight), who controls topic transitions
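
The stress markers are the most mechanical of the four, so they make a decent sanity check on whatever the model claims. A rough sketch, assuming messages arrive as (timestamp, text) pairs for one person. The emoji pattern is a deliberate approximation covering a few common Unicode blocks, not a complete emoji definition.

```python
import re
from collections import defaultdict
from statistics import mean

# Approximate emoji matcher: a few common Unicode blocks, not the full emoji spec.
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

def monthly_markers(messages):
    """Per-month message length and emoji rate.

    messages: iterable of (datetime, text) pairs (hypothetical format).
    Returns {(year, month): {"mean_words": ..., "emoji_per_msg": ...}}.
    """
    buckets = defaultdict(list)
    for ts, text in messages:
        buckets[(ts.year, ts.month)].append(text)
    return {
        month: {
            # Whitespace-separated tokens as a crude stand-in for sentence length.
            "mean_words": mean(len(t.split()) for t in texts),
            "emoji_per_msg": mean(len(EMOJI.findall(t)) for t in texts),
        }
        for month, texts in buckets.items()
    }
```

A month where mean_words shrinks and emoji_per_msg falls at the same time is worth a second look. That's a correlation, not a diagnosis.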

None of this is new science. Computational linguists have been doing discourse analysis for decades. What's new is that a sufficiently capable local model can do it on a personal corpus without requiring a PhD or a research grant. You just need the data, the hardware, and the willingness to look at the output.

relationship temperature

I've started thinking of it as "relationship temperature": a rolling, multidimensional read on the state of a partnership, derived not from self-report (which is unreliable) or observation (which is biased) but from the actual communication record.

Think of it like a fitness tracker, but for the thing that actually matters.

Your Oura ring tells you your HRV is down. Your corpus analysis tells you your repair-to-conflict ratio has dropped 15% over the last month. One of these is probably more actionable.
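
For the record, here's the arithmetic behind a number like that. A sketch under one big assumption: that each tense exchange has already been labeled "conflict" or "repair", by the model or by hand. The labels and the counts in the usage note are invented for illustration.

```python
def repair_to_conflict(labels):
    """Repairs per conflict for one period; None if there were no conflicts."""
    conflicts = labels.count("conflict")
    repairs = labels.count("repair")
    return repairs / conflicts if conflicts else None

def percent_change(previous, current):
    """Relative change from one period's ratio to the next, in percent."""
    if not previous or current is None:
        return None  # nothing meaningful to compare against
    return (current - previous) / previous * 100
```

With 16 repairs against 20 conflicts last month (ratio 0.8) and 17 against 25 this month (ratio 0.68), percent_change comes out at about -15. The labeling is the hard part. The division is not.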

The model doesn't judge. It doesn't say "you should text more" or "your wife is unhappy." It produces data. Patterns. Trends. The interpretation is still human work, and it should be, because the stakes are too high to outsource the meaning-making.

But having the data? That's the unlock.

the uncomfortable part

I'd be lying if I said this was all warm fuzzies and data-driven date nights.

Some of what the model surfaced was hard to look at. Patterns I'd rather not have named. Asymmetries I'd rationalized away. The gap between how I thought I showed up and how the text said I actually showed up.

But that's the whole point of self-hosting this kind of analysis, isn't it? Not to perform for a therapist or feed an algorithm that'll reduce my marriage to an engagement metric. To look at the raw signal, privately, honestly, and decide for myself what to do with it.

The DIY Trinity applies here too. Easy: it runs on my hardware, on my schedule. Frugal: no subscription, no per-query cost, no venture-backed relationship startup inserting itself between me and the truth. Better: the alternative is either expensive professional analysis or willful ignorance, and neither of those is great.

so what now

The chat app started as a grocery list with a login screen. The Markov boyfriend was a dumb joke that turned out to have legs. And now this: a local AI pipeline that reads years of marital chat and produces something that looks a lot like self-knowledge.

Corpus Christi: the feast of the Body. Two becoming one, made tangible. Turns out seven years of "did you take out the recycling" is a kind of communion too. Just less wine and more resentment.

All of it built on the same instinct: can I?

The answer, increasingly, is yes. Whether you should is a question for you and your corpus to work out between yourselves.

Trust me, I'm a software engineer.