Story Structure: Data Science Process

Photo by Pedro Velasco on Unsplash


Well, except in creating a story for the silver screen. And in dealing with spreadsheets, that’s inevitable.

The emotional elastic power of structure!

At its best, a project that has good structure provides moments of captivating excitement and moments of serene peace. At its worst, a badly structured project could drown in boredom or stagger in crippling chaos.

Scratching your head? Allow me to explain.

Screenwriting: The Basic Three-Act Story Structure

For you creative writers out there, this might be a review. For you data scientists and project managers, let me present a concept that is perhaps new to you or long forgotten. As you read, I encourage you to think about how this structure coincides with your projects.

Below is a graphical representation of the basic three-act story structure:

[Figure: the three-act story structure, drawn as a mountain of emotional tension]

The mountain illustrates the level of emotional tension as the story progresses. Let’s break it down.

In the beginning, the story introduces the current situation.

Plotpoint: The Ordinary World

The writer likely already has the protagonist and the physical setting figured out in his mind. Who is in the center of the story universe? What does this universe look and feel like?

Plotpoint: The Inciting Incident (aka The Hook)

What major problem or obstacle must this protagonist confront throughout the entire story? This is introduced before the end of Act One. This rare event turns his routine world upside-down. He has the opportunity to be wishy-washy for a little while, but he must eventually accept the challenge.

Otherwise, the story will cease to continue.

As the protagonist struggles to uncover the solution to his problem, his circumstances become more stimulating. He takes one step forward only to be pushed three steps back. The battles of the war launch the roller coaster of suspense. The audience stays glued to the edge of their seats for this.

The end of Act Two has its needle on the bubble…

…and that bubble is about to pop.

Plotpoint: The Climax

Suspense peaks at the climax with an all-out war. This is when the boy makes his final move to win the girl. When Mario fights Bowser. The outcome of this war ultimately decides whether the protagonist wins and solves his challenge or loses in agony.

Either way, the protagonist will return to his Ordinary World. He will return to his old routines. But he is never back to his old self. In feature films, for better or for worse, the protagonist experiences puberty at the writer’s discretion.

Data Science: Three-Act Process

Strike any epiphanies by now?

No? Allow me to throw some ideas out there.

Depending on the project, this could either be the easiest or most excruciating part of the process.

Plotpoint: Ordinary World

Raw data enters the scene. It comes from the most reliable sources the data scientist can find, no matter how reliable it may actually be. Similar to the protagonist in a story, raw data has not gone through any manipulation at this point in the process. It is in its current, as-is state.

Plotpoint: Inciting Incident

The Inciting Incident in data science projects is all about the challenge of getting the data to deliver business value. As mentioned in , there are two types of data science projects: projects that start with available data and projects that start with known business questions. In either case, it’s crucial to figure out how clean and fit the raw data actually is. It is also important to determine how much effort is needed to convert the data to proper formats so that the pieces can mesh in a coordinated and cohesive manner.
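To make the "how clean and fit is it?" question a little more concrete, here is a minimal sketch of a first-pass check. It is only an illustration under my own assumptions: the records, the field names (`order_date`, `amount`), and the helper functions are all hypothetical, and a real project would likely lean on a data library instead of plain Python.

```python
from datetime import datetime

# Hypothetical raw records; field names and values are made up for illustration.
raw_rows = [
    {"order_date": "2021-03-01", "amount": "19.99"},
    {"order_date": "03/02/2021", "amount": "5"},
    {"order_date": "", "amount": "n/a"},
    {"order_date": "2021-03-04", "amount": ""},
]


def parses_as_date(text):
    """Return True if text is an ISO date such as 2021-03-01."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def parses_as_number(text):
    """Return True if text can be read as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def profile(rows, field, parser):
    """Count missing, clean, and needs-conversion values for one field."""
    counts = {"missing": 0, "clean": 0, "needs_work": 0}
    for row in rows:
        value = (row.get(field) or "").strip()
        if not value:
            counts["missing"] += 1
        elif parser(value):
            counts["clean"] += 1
        else:
            counts["needs_work"] += 1
    return counts


date_report = profile(raw_rows, "order_date", parses_as_date)
amount_report = profile(raw_rows, "amount", parses_as_number)
print("order_date:", date_report)
print("amount:", amount_report)
```

The "needs_work" count is a rough stand-in for the conversion effort: the more values that exist but do not parse, the longer Act One is likely to run.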

With available data, the challenge is to uncover the data’s business value, and that challenge starts from the very beginning. Available data requires the data scientist to apply simple data exploration techniques, such as correlations and histograms, until hints of a trend can be identified. In some cases, having a hypothesis to measure the data against can do wonders. The hook takes place whenever a revelation is found in the data.
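As a rough illustration of that kind of exploration, the sketch below computes a correlation and a text histogram using only plain Python. The numbers (`ad_spend`, `sales`) are invented for the example, and the helper names are my own, not part of any library.

```python
from collections import Counter
from math import sqrt

# Hypothetical data: weekly ad spend and weekly sales, made up for illustration.
ad_spend = [10, 20, 30, 40, 50, 60]
sales = [12, 25, 31, 38, 55, 61]
BIN_WIDTH = 20


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def histogram(values, bin_width):
    """Count values per bin; each key is the lower edge of its bin."""
    return Counter((v // bin_width) * bin_width for v in values)


r = pearson(ad_spend, sales)
bins = histogram(sales, BIN_WIDTH)

print(f"correlation: {r:.2f}")
for lower in sorted(bins):
    # One '#' per observation that falls in the bin.
    print(f"{lower:>3}-{lower + BIN_WIDTH - 1:<3} {'#' * bins[lower]}")
```

A correlation close to 1 or a lopsided histogram does not prove anything by itself, but it is the sort of hint that can become the hook.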

For projects with known business questions, the hook arrives when the factors needed to answer them are recognized. This could come in a snap or take forever, but once they are recognized, wrangling the data into a training set will make data mining breezier and more accurate.

The confrontation. Statistical techniques must be researched and applied to the data to answer the questions posed. Several techniques will likely be used to get to the answer. Researching techniques could involve googling, reading, or community networking. It will involve lots of experimenting and trial and error. Some techniques may produce better results than others, but hopefully, with each step, the results will get closer and closer to a climactic answer. Thus, the tension created in this process mirrors that of the rising obstacles of Act Two in a story structure.
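The trial-and-error loop described above can be sketched as follows: fit a couple of candidate techniques on the same training data, score each on data held back from training, and keep the best one so far. Everything here is an assumption for illustration, including the data points, the two candidate techniques (a mean baseline and a least-squares line), and the choice of mean absolute error as the score.

```python
# Hypothetical data: (x, y) pairs made up for illustration.
train = [(1, 3.1), (2, 4.9), (3, 7.2), (4, 8.8), (5, 11.1)]
holdout = [(6, 13.0), (7, 15.1)]


def fit_mean(points):
    """Technique 1: always predict the average y seen in training."""
    mean_y = sum(y for _, y in points) / len(points)
    return lambda x: mean_y


def fit_line(points):
    """Technique 2: ordinary least-squares line through the points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mean_abs_error(model, points):
    """Average absolute gap between predictions and actual values."""
    return sum(abs(model(x) - y) for x, y in points) / len(points)


techniques = {"mean baseline": fit_mean, "linear fit": fit_line}
scores = {name: mean_abs_error(fit(train), holdout)
          for name, fit in techniques.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
print("best so far:", best)
```

Adding a third or fourth technique only means adding another entry to `techniques`, which is roughly how Act Two tends to go: each new attempt either beats the current best or gets pushed three steps back.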

Now comes the final hurdle: arriving at the answer that best addresses the question, and translating the results into a business story that can be easily understood by non-quantitative people and implemented in the business. For most data scientists, this will be the most exciting moment.

Or is it?

Like a writer who can decide on either a happily-ever-after or a tear-jerker, the data scientist can end up with good news or not-so-good news for business executives. But first and foremost, be honest. Convey the information using the proper data visuals. Within the big data world, many refer to this part of the process as the storytelling part.

The Rubberiness of the Three-Act Structure

After going through all that, it appears that the three-act structure is fixed, but we all know that is not true. interweaves a chronological timeline into a nonlinear one, but when examined closely, one can still see the three-act structure. Same with (Best Picture Oscar Winner) as five storylines run in parallel. Some films have longer Act Ones. Some have a lot of remaining tension in Act Three. At the end of the day, when it comes to the script, it is up to the writer’s discretion to decide what works to create the most enticing film.

Data science projects work the same way. Even though he may have to work with what is available, it is up to the data scientist to decide how he needs to proceed to answer the business questions. One project may take a long time in wrangling because the data needs a massive scrubbing, while the analysis itself is quick and easy. Another project may require complex, sophisticated visualizations to tell an understandable, comprehensive story despite clean data and smooth analysis.

In spite of the freedom given, the Three-Act Structure has remained a foundational concept since the time of Shakespeare, whether we are working with words or numbers. Many of us, writers and data scientists alike, want to break free from the status quo. If Paul Haggis, the writer of , can do it, I can do it too. Break the rules.

But how do we break the rules if we don’t know what the rules are? Roger Federer wouldn’t have gotten to where he is today if he didn’t know the basic forehand like the back of his hand.

So don’t just be familiar with the Three-Act Structure. Really understand what it’s about. Only then will we know how to break free from it.

Data Analyst. Screenwriter. Project Manager. Now, Resume Coach. A student of life and West Coast Swing. A promoter of self from within.
