The Story Seeds of Data Science


A loving father accused of murdering his own son causes a washed-up lawyer to question the purpose of fatherhood.

When his best friend forces him to house a no-nonsense witness, a lonesome art curator resurrects his investigation into the murder of his parents.

Wait. I thought this was a blog about data science.

Well, it is.

As we all know, one of the most common first questions people ask at a networking event is, “What do you do?” And whenever I bring up the fact that I am an engineer (yes, I could literally call myself a rocket scientist at one point) and a screenwriter, many eyes bulge with a “Whaaa…” But what they fail to realize is just how similar screenwriting actually is to engineering and, in turn, data science, from how one comes up with the idea to start a project to how a project is managed from beginning to end.

First, the basics:

Generation of a Story Idea: Character First or Plot First (aka Chicken or the Egg)
Many aspiring writers ask themselves whether they should start a story by creating the character first or by building the plot first. This really is a futile question because, as they will soon discover, either method works.

One can ruminate about a certain character, the protagonist, forever, conjuring up his strengths, his weaknesses, his history, his buttons. One can ruminate about this fictional person until the cows come home. But at some point, if the writer intends to get a project going at all, he must build a plot around the protagonist that asks tough, introspective questions, that puts the protagonist through a series of increasingly troublesome events, that prompts him to make possibly life-altering decisions, and that thus encourages growth. These events (ideally instigated by decisions made by the protagonist) continuously stretch his comfort zone, pushing him past his envelope until his bubble is about to burst (aka the plot’s climax). This is what’s called a character-driven story.

At the other end of the spectrum, one can also start a story idea with a plot all played out. Boy meets girl. Girl hates boy. Boy somehow gains girl’s trust. Boy breaks girl’s trust. Boy does the ultimate act for the girl. Boy wins girl. Boy and girl live happily ever after. That’s the plot. But who is the boy and who is the girl? The writer needs to create protagonists who will feel tortured by the plot already established for them. After all, how fun would it be to watch two peas in a pod go about their love life? Sure, in real life, it gives peace of mind to teach a child who is not afraid of the water how to swim. But in the movies, as disturbing as it seems, teaching a child who is afraid of the water how to swim is much more entertaining, and it produces a much more meaningful result when he conquers his fear. This is a plot-driven story.

Birth of a Data Science Project: Another Chicken or the Egg
That’s all great, but what does that have to do with data science?

To answer this question, let’s think about why a discipline called data science exists in the first place. For years, perhaps even decades, companies have painstakingly gathered data from every source possible. Yet countless businesses still do not know how to take advantage of these assets. This is similar to the character-driven story. The data is the protagonist, and the business must ask what the data can reveal that would produce value and growth for the company and its customers. Granted, coming up with those questions may be the hardest part of a data science project, and for many companies it is easier said than done. It is also the most important part. Once those questions are generated, the analyst can put the data to work through a series of plot events (in this case, analytics techniques) that challenge the data to answer those pressing questions and push the business toward innovation.

Of course, the reverse is just as likely. The company already knows the questions that will create business value. It is just not sure which data will answer those questions. Perhaps the data does not even exist! So, much like a plot-driven story, the questions are there, and the analyst likely already knows the type of analysis technique needed or can look it up in the millions of resources at his fingertips. The problem now revolves around data collection, or the data’s character traits. What attributes would need to be collected so that the data would best answer the questions and create value for the business? For many, this can be the most agonizing, time-consuming step of all. Plenty of ground-breaking data science projects get delayed because of it, but it needs to be done, just as a plot is nothing without characters.

But I think we can all agree that whether you are a numbers person or a words person, and whichever method you choose, the end result is a beautiful thing.

And That’s Just the Beginning
Still unconvinced about the similarities between screenwriting and data science? Well, stay tuned.

Originally published at
