Greetings, everyone. My name is Scott Hebert, Simulation Guru at SimWell, and one of my main functions in the company is answering questions about simulation theory and practice. While answering those questions, I discovered some recurring topics. This blog series is meant to provide those interested in simulation as performed by SimWell with some insights into these topics.
Today’s topic concerns the intersection of data science and simulation modeling in projects and how to “break through” the divide between these domains.
What Divide?
Many of you may be curious as to why we are starting with this topic. After all, aren’t we all doing the same thing? Is there even a divide? My experience says yes, there is one. Often it’s a difference in emphasis rather than a stark contrast, but I have personally witnessed discourse between data scientists and simulation modelers completely break down. And despite my status as a simulation guru, I’m sorry to say that the breakdown is on both sides.
Origins of the Divide
Where did the divide come from? The main issue is confusion about emphasis and sources of truth. Data science, as the name implies, sees data as ground truth. As a result, data scientists are reluctant to stray from the data. “What does the data say?” is a quintessential data science question. Simulation modeling sees data as a useful tool, but the ground truth for a simulation model is the system being modeled, regardless of data. The confusion is compounded because simulation methods vary in how heavily they rely on data, which can lead to the erroneous belief that all simulation models are data-driven and therefore must fall under data science. Other similarities between the disciplines (e.g., using advanced statistics and algorithms) further obscure the divide.
Effects of the Divide
The effects of the divide become apparent when stakeholders on a project represent each of these camps, and the divide is not acknowledged. This leads to communication breakdowns and frustration because members of each camp speak their own language and talk past each other.
Examples include simulation modelers wondering why data scientists insist on referring to data at every turn, or modelers’ requests for things like process maps being met with questions about why they are needed. On the other side, data scientists may not understand the methodological issues with using data directly, or why partitioning data for validation purposes (e.g., holding out a test set) is not always required for simulation models as it is for typical ML activities (e.g., training supervised ML algorithms).
How to Resolve the Divide
Because the core of the divide is poor communication, improving communication is a major step toward resolving issues. The other major component is realizing the divide exists.
As an example, here are two terms that data scientists and simulation modelers do not always use in the same way:
- Model: Something as simple as “model” is not generally used in quite the same way. Data scientists will likely use “model” as shorthand for a statistical model and are less likely to refer to the entire solution provided as a “model.” Conversely, simulation modelers will use “model” to discuss the simulation model in its entirety and are more likely to include elements that data scientists might see as separate, such as data visualization.
- Verification: I’ve seen few data scientists refer to “model verification” by that term when discussing the topic with simulation modelers. They refer to testing instead, and because of their stronger software engineering background, they are likely to have a more robust approach to testing the model design. Simulation modelers tend to reduce verification to “debugging” their model.
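To make the “verification as testing” idea concrete, here is a minimal sketch of what a data scientist might mean by testing a piece of simulation logic. The function `compute_waits` and the two test cases are hypothetical, written only for illustration; they assume a single-server, first-come-first-served queue and are not taken from any particular simulation package.

```python
def compute_waits(arrivals, services):
    """Hypothetical logic under test: waiting time per customer in a
    single-server, first-come-first-served queue."""
    waits = []
    server_free_at = 0.0
    for arrival, service in zip(arrivals, services):
        start = max(arrival, server_free_at)  # wait if the server is busy
        waits.append(start - arrival)
        server_free_at = start + service
    return waits


def test_no_wait_when_server_is_idle():
    # Customers arrive far apart, so nobody should ever queue.
    assert compute_waits([0.0, 5.0], [1.0, 1.0]) == [0.0, 0.0]


def test_waits_accumulate_when_overloaded():
    # Arrivals every 1 time unit, service takes 2: each customer waits
    # one unit longer than the previous one.
    assert compute_waits([0.0, 1.0, 2.0], [2.0, 2.0, 2.0]) == [0.0, 1.0, 2.0]


if __name__ == "__main__":
    test_no_wait_when_server_is_idle()
    test_waits_accumulate_when_overloaded()
    print("all verification checks passed")
```

The point of the sketch is that each test uses deterministic inputs whose correct answer can be worked out by hand, so a failure points at the model logic and not at random variation.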
What the Camps Can Learn From Each Other
Besides recognizing that the disciplines do not overlap as much as expected, there are many concepts and approaches that data scientists and simulation modelers can learn from each other. In a future post, we will expand on this, but here is one concept from each side that can highlight how we can help each other.
What Is Simulation?
Okay, data scientists. A simulation model is a data generation engine. It’s that simple. You can also see it as a sophisticated data transformer. As such, a simulation model has no prescriptive part: it shows what happens under a given scenario, not what you should do about it. Data science is often called upon to visualize and analyze simulation outputs (and sometimes inputs), and you should be prepared to lead in those project areas. However, remember that simulation modelers generally have an excellent statistical understanding and are usually closer to the system than you are.
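For readers who think in code, the “data generation engine” view can be sketched in a few lines. This is a toy example under stated assumptions, not how any specific simulation tool works: the function name `simulate_single_server_queue`, the exponential interarrival and service times, and all parameter values are hypothetical choices made for illustration.

```python
import random


def simulate_single_server_queue(n_customers, mean_interarrival,
                                 mean_service, seed=0):
    """Toy single-server FIFO queue (hypothetical). Inputs are scenario
    parameters; the output is a table-like list of records, one per customer."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    records = []
    arrival = 0.0
    server_free_at = 0.0
    for customer_id in range(n_customers):
        # Assumed exponential interarrival and service times.
        arrival += rng.expovariate(1.0 / mean_interarrival)
        start = max(arrival, server_free_at)
        service = rng.expovariate(1.0 / mean_service)
        server_free_at = start + service
        records.append({
            "customer": customer_id,
            "arrival": arrival,
            "wait": start - arrival,
            "service": service,
            "departure": server_free_at,
        })
    return records


if __name__ == "__main__":
    records = simulate_single_server_queue(
        1000, mean_interarrival=1.0, mean_service=0.8, seed=42)
    mean_wait = sum(r["wait"] for r in records) / len(records)
    print(f"{len(records)} records generated, mean wait {mean_wait:.2f}")
```

Notice that nothing in the function recommends an action. It turns scenario parameters into records, and the analysis of those records is where data science skills come in.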
Simulation’s Elephant in the Room
This is a different issue, but let’s be honest: Data scientists generally have a much better understanding of computer science and software engineering than simulation modelers. Modelers who use simulation software backed by a full programming language (such as AnyLogic) do better here, but the educational background of most simulation modelers includes very little in the way of good coding practices or topics such as version control and test-driven development. When in discussion with data scientists, simulation modelers should approach these topics with humility and be willing to learn.
Taking these examples together, data scientists would ideally lead the verification of a simulation model, whereas simulation modelers would spearhead the validation of that model.
Conclusion
I hope you took something from this discussion and can move forward in projects with better clarity about the roles of various disciplines. If you have any comments, feel free to contact us to discuss this further. I hope we can see where different disciplines can work together better. Until next time, this is your friendly neighborhood Simulation Guru!