Game of Thrones Deaths are Helping Everyday Americans

Serena Gupta
5 min readMay 14, 2019

“There are only two means of ascertaining death. One is by no means infallible and involves a brain scan and cardiogram. That requires expensive machinery. The other, and the only one that is certain, is putrefaction. And that requires time.”

There’s a surprising amount of truth in this line from The Serpent and the Rainbow — biologically speaking, indicators of life are fallible. Slow, shallow breaths can be hard to observe, the human pulse can be affected by drugs, and someone who just fell off the Titanic isn’t going to have a normal body temperature, even if they’re still alive. There are a lot of cases a medical professional has to take into account, and a lot that can’t be anticipated — there’s a need for adaptability.

Identifying dead individuals in a dataset is also harder than you might think, and there’s still a need to take into account a number of cases. An individual may stop appearing in a claims dataset because they’ve died. Then again, this could also be because they’ve switched jobs and their insurance changed. In that case, say, Walgreens data would no longer have the customer, and instead they’d be in an Aetna dataset. And as of 2011, a state can prohibit the Social Security Administration from publishing (even for research purposes) who has died in that state — the result is that there aren’t even stable government data sets that can be used for research.

A comprehensive dataset of deceased individuals is vital in order to understand the efficacy of new therapies, or to be able to predict life expectancy from a procedure. At Datavant, we’ve aggregated and deduplicated data from the Social Security Administration’s death master file and third party data from public and private obituaries, to allow death data to be connected to the rest of the health ecosystem.

From a technical perspective, some of the biggest challenges when building data pipelines are bad data (like an accidently inserted comma), and unintuitive edge cases (like the redaction of an incorrectly marked death). To ensure we had as robust a pipeline as possible, we needed to create a robust test data set. Even though our tests would be automated, I felt there was high value in creating a readable test dataset. When future developers need to make a change to the pipeline, or are trying to understand how it works, it’s important that they’d be able to quickly and easily understand what’s going on. For an easily accessible pipeline that runs every Sunday night with an increasingly high body count, there was no other choice for test data than Game of Thrones.

But really Game of Thrones provided the PERFECT test data, as shown here by a smattering of applicable cases (**Spoiler Alert**):

Lanced, knifed, shot, axed, hung & stabbed all in one, cleaved, stabbed
  1. One of our data sources unmarks someone as dead if they realize later that they are actually still alive. Beric Dondarrion, who died seven times, and was brought back to life six times by the Lord of Light to save Arya was a natural choice to test this case. Repeatedly.
Death by incineration (i.e. dragon)

2. Our data source also reports when someone they’ve previously reported as dead is still dead, but has updated data. For example, if date of death was previously reported wrong and now they have the correct date, this amendment would be reported. In the land of Game of Thrones where characters never seem to get his name right, Dickon Tarly would probably be initially named as Rickon Tarly.

Mortally wounded by a wild boar after drinking an excessive amount of wine given to him by Lancel Lannister, on the orders of Cersei Lannister
Burned alive by molten gold (crown)

3. Both our data source report fields like “Title”. In a show where everyone seems to have badass titles and is constantly trying to claim more of them, this made for very funny data. My favorite example applies to both Robert Baratheon and Viserys Targaryen — whoever is King gets to be called “King of the Andals and the First Men, Lord of the Seven Kingdoms and Protector of the Realm” in a very long-winded column of the test data set.

Poison
Poison
Suicide

4. If the same person appears as dead in both of our source data sets but some of their data (such as the day of death) is different, we have to have a system to resolve these two sources of truth. In Game of Thrones, if the Lannisters reported the death of Myrcella, Tommen, and Joffrey, their last name would be reported as Lannister: however, an official at King’s Landing would have reported their last name to be Baratheon.

Game of Thrones made for a rich test data set that was very fun to create. But the importance of testing our pipeline has impacts far beyond the realm of Westeros. Working as an engineer, especially with fake data, gives you a too-easy opportunity to step back. You can easily forget the context of what you work on, and what it means for real people. The truth is that I worked for weeks with a data set of almost everyone who has ever passed away in the US. Every week when I ran our pipeline and received new data, each row was more than just data entries. It represented one person’s life coming to a close, and the lives of so many others forever changing as a consequence. And so I do my best to acknowledge the truth of the data I’m working with. Every week, I take time to honor the lives of individuals whose living life impacted so many people and whose life will continue to impact the future of life-saving research.

--

--