Creating a Multidisciplinary Data Science Squad: Upsides and Challenges

Published in GetNinjas, Mar 6, 2018

Nearly every technology company has a Data Science area, but the field is still very young in the industry. The technical side is the most mature, but in terms of structure and processes we still don’t have a strong pattern to follow (some scientists even argue that agile, as it stands, doesn’t work very well for data science).

Another complexity arose when most of these technology companies started adopting a Multidisciplinary Squads structure. In this scenario, the natural structure would be to have data scientists inside each squad (alongside developers, PMs, designers, BI analysts, etc.). We tried this out at GetNinjas and it didn’t work for us, so our alternative was to create a Multidisciplinary Data Science Squad. But what drives the choice between the two models?

*Spread Data Science Area (left) vs. Multidisciplinary Data Science Squad (right)*

In my first article about Data Science, I talked about how we created our Data Science team at GetNinjas. In this one, I present the reasons that led us to create a Data Science Squad and explore the tradeoffs between this approach and the one where the Data Scientists are spread across the company’s other squads.

The problems of Data Scientists spread over Multidisciplinary Squads

At GetNinjas we have multidisciplinary squads, each responsible for one part of our product; they are all independent and operate as owners of that piece of the product. In this configuration, where the teams are a mix of different professionals such as designers, BI analysts, developers and a product manager, at first we treated Data Science just like every other area, putting one Scientist per team (when needed). After some time, we realized that the work the Data Scientists did in each squad was very different from the work of the rest of the squad members.

For example: I worked as a Data Scientist in the Professional Experience team, which was responsible for the Pros’ app, the platform that our Pros use to purchase and manage leads. The team usually develops new features for the app, with a Designer working with the PM on these features in advance and some mobile and back-end developers implementing them. As a Data Scientist, I worked on machine learning or statistics-based algorithms (such as predicting in real time whether a lead would be purchased, the lead pricing algorithm, the lead distribution algorithm and so on).

An app feature and a machine learning project differ in many ways: not just conceptually, but the technology itself is completely different too. I was working on microservices in Python or R scripts while the rest of the team worked with Java, Swift and Ruby. Another difference is the process. The normal process to develop a new feature was the following: the BI analyst produces an analysis showing opportunities, some ideas are designed with the PM, then the designer works out the usability and layout of the new feature, and after that it is passed to the developers. A Data Science project usually starts with an analysis too, but afterwards comes the modeling and experimenting phase, in which we try to find an algorithmic solution. Only after we arrive at the most satisfactory model do we implement it (and the cycle of a data science project tends to be longer than that of an app feature). Besides the processes that don’t fit, another problem we had was that the Data Scientists were expected to do all of these steps themselves, since they were isolated in each team.

Instead of using design flowcharts and layouts, Data Scientists used analyses and experiment results to drive the development of solutions. Because of all that, the Scientists ended up pairing more with each other than with the developers of their respective teams. After realizing all of these problems, we decided that we should experiment with changing the way we did Data Science at GetNinjas.

Data Science as a Multidisciplinary Product Squad in practice

We decided to experiment with a Multidisciplinary Data Science Squad. With this approach, the Scientists can pair with each other on a daily basis and we can hire people with complementary skills, instead of trying to find the unicorns who can do it all (which is an amazing advantage given how difficult it is to hire unicorns these days).

Another great advantage is that the data scientists are much more in touch with the business and the users. Machine Learning projects that make decisions autonomously directly affect the users’ experience, but in most companies the Scientists don’t have this perspective. With this approach, the team is responsible for working end-to-end on the projects: instead of being limited to creating algorithms and improving their performance, they solve business problems, taking ownership of all the variables and tradeoffs behind their algorithmic designs (and this is huge).

Finally, being an independent team means that we can use the processes that best match the development of Data Science projects. We have our own Data Science Pipeline, which consists of an “Analysis > Modeling > Implementing > Integrating > Follow-up” cycle, and we are constantly enhancing our processes to work around it.

Despite the advantages, this configuration introduces some challenges. The biggest one is determining the scope of the Data Science team versus the other teams: what parts of the product should the Data Science team be responsible for? At GetNinjas we use the following guidance: “If we need to make decisions at a large scale, Data Science should do it.” If something is solvable with a simple business rule, then the other teams more closely related to that scope should do it.

*GetNinjas App: Lead Store (left) and Detail of a Lead (right)*

In the screenshots of our Pros’ app, highlighted in green are the responsibilities of the Professional Experience team: they determine the information that is relevant to the Pros when they purchase leads (as well as the UX of the app). Highlighted in red is the price of a lead, which is the output of a Data Science statistical model that calculates the best price to achieve a target take rate, ensuring a fair return for the Pro and for GetNinjas. Given that we have more than 1,000 types of services in 5,000 cities, it would be infeasible to determine the right price for each lead manually, so the Data Science team created a pricing algorithm to decide the optimal price for each lead.
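To make the scale argument concrete, here is a minimal sketch of what pricing toward a target take rate could look like. It is not our actual pricing model: the `Segment` class, the `lead_price` function, the sample numbers, and the reading of "take rate" as the share of a Pro's expected revenue from a lead that the lead price represents are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One (service, city) pair with hypothetical, illustrative statistics."""
    service: str
    city: str
    avg_job_value: float  # assumed average value of a closed job
    close_rate: float     # assumed chance that a purchased lead becomes a job


def lead_price(segment: Segment, target_take_rate: float) -> float:
    """Price at which the lead costs the Pro `target_take_rate` of the
    revenue they can expect from it (an assumed definition of take rate)."""
    expected_revenue = segment.avg_job_value * segment.close_rate
    return round(target_take_rate * expected_revenue, 2)


# Made-up segments; a real system would estimate these values from data.
segments = [
    Segment("house cleaning", "Campinas", avg_job_value=200.0, close_rate=0.25),
    Segment("wedding photography", "Curitiba", avg_job_value=3000.0, close_rate=0.10),
]

prices = {
    (s.service, s.city): lead_price(s, target_take_rate=0.10) for s in segments
}

# Number of (service, city) combinations at the scale quoted above.
n_combinations = 1_000 * 5_000

for key, price in prices.items():
    print(key, price)
print("combinations to price:", n_combinations)
```

Even with a rule this simple, there are five million service and city combinations to cover, and the inputs would need to be estimated and kept up to date for each of them, which is why a decision like this falls to an algorithm rather than to a manually maintained table.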

Although we decided to experiment with the Data Science Squad, we still don’t have all the answers on how to structure the area. This is what has been working for us, but we’re continuously learning and evolving the team at every level. I hope this article gave you some tips for structuring your own Data Science area (especially if you’re using a multidisciplinary squads structure).

Summary: Multidisciplinary Data Science Squad

I would like to thank Bernardo Srulzon and Dominik Reller for revising this article, and Pedro Suguimoto, Bruno Soares and Alan Justino for their reviews of previous writings in this series!

Written by Lucas Fonseca Navarro