Creating a Data Science Team from scratch

Lucas Fonseca Navarro
Published in GetNinjas
Aug 15, 2017


In this article I describe how we started our Data Science Team here at GetNinjas, what skillsets the team members have, how we develop our projects and which processes we use. My intention is to share with other companies the experience we've gained along the way.

After all, what is a Data Science Team made of?

Well, of Data Scientists: people with knowledge of statistical/mathematical modeling, business expertise and the hacking skills to build a complete (machine learning) system… Even though that looks like the most obvious answer, in a wide range of companies (mostly startups) it is not true.


The ideal data scientist profile involves a wide skillset. In theory, the scientist should be able to take a business problem and a dataset, create a model (most of the time a statistical one) and put a system into production that solves the problem in real time. As many companies have come to realize, it is very rare to find this unicorn, a super employee who works like a silver bullet and solves all the problems of your company. If it's hard to find one, imagine a team full of them.

An alternative being adopted in the market is to create Data Science teams with complementary skills, which together form the ideal profile of the data scientist.

Here at GetNinjas we use a structure of multidisciplinary squads (as at Spotify), where each squad takes care of a specific part of the product. Initially the data scientists were scattered across all teams. After some time we realized that, on a day-to-day basis, the scientists worked on tasks parallel to those of the rest of their squad and interacted more with the scientists on other teams than with the developers in their own squad. This was due to the nature of Data Science projects, which require a completely different skillset from web and app development. We then decided to bring together the 3 data scientists we had in the company at the time, plus two developers from other teams, to create our Data Science squad. It would initially work on projects that require data-based models to make decisions, acting on all areas of the product. Usually we create micro-services that are consumed by the other teams.

How did we choose each of our team members?

To make up for the lack of unicorns, we assembled a team with complementary skillsets. We wanted the team to be capable of acting end to end: identifying business opportunities, creating models, deploying them to production and measuring and learning from the results. Here at GetNinjas we have the following structure:

Product Manager: When a new problem arises, the product manager breaks it into hypotheses and confirms or refutes them analytically through data. In the end this generates a set of insights that aid the modeling of Data Science solutions. The product manager also manages the team's development processes and acts as the bridge between the team and the business owners to align the projects’ road map.
Data Scientists: With the data and insights in hand, they create models (algorithms), usually with some mathematical or statistical function (we use machine learning only when really necessary). The scientists are also responsible for implementing their models (we have a Python template for that). They must also bring some business sense to the modeling tasks.
Developers: Responsible for supporting our Data Science template, mostly the web service part, along with the infrastructure of our projects (micro-services). They develop the integrations with other projects of the company. The developers also help educate the rest of the team on good development practices, automated testing, system architecture and micro-service issues.

One cool thing about our Data Science team members is that everyone has basic knowledge of all areas, and though there's no unicorn, no one gets isolated in their own circle. I act as product manager, but sometimes perform development or modeling tasks; on the other hand, sometimes a data scientist performs analytical tasks or helps with the integration or infrastructure of their system. This rotation of roles leads to very productive daily discussions, which in turn lead to high-quality deliveries and continuous improvements to the development flow and team processes.

I set up my team, now how do I solve my company’s problems?

After a few sprints with our Data Science Squad, we created a life cycle that practically every data science project goes through. We usually start from a business problem in a specific area of our product. We create hypotheses about the problem and test them analytically. With the results of the analyses in hand, we set out to create a model based on data, usually using statistics and mathematics. In these early stages the scientists interact a lot with each other, discussing several possible solutions before a final one is agreed on.

GetNinjas’ Data Science Project Life Cycle

Then we implement the model using our Python DS template. The vast majority of our projects are micro-services that are consumed by projects of other teams in the company. Because of this, once the model is implemented, we still need to build the integration and set up the infrastructure to deploy the micro-service into production. We agree with the team that will consume our service on what its inputs and outputs will be, often a simple JSON payload.
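To make the idea of an agreed JSON contract concrete, here is a minimal sketch of what such an input/output pair could look like. Everything in it is an assumption for illustration: the field names (`request_id`, `category`, `city`, `score`), the `handle` function and the placeholder scoring rule are hypothetical and not taken from our actual services.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical input fields, as they might be agreed with the consuming team.
REQUIRED_FIELDS = ("request_id", "category", "city")


@dataclass
class ScoreRequest:
    request_id: str
    category: str
    city: str


@dataclass
class ScoreResponse:
    request_id: str
    score: float


def parse_request(raw: str) -> ScoreRequest:
    """Parse an incoming JSON payload and check that the agreed fields exist."""
    data = json.loads(raw)
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return ScoreRequest(*(str(data[name]) for name in REQUIRED_FIELDS))


def handle(raw: str) -> str:
    """Turn a JSON request into a JSON response."""
    request = parse_request(raw)
    # Placeholder rule standing in for a real model (illustrative only).
    score = 1.0 if request.category == "plumbing" else 0.5
    return json.dumps(asdict(ScoreResponse(request.request_id, score)))
```

Writing the contract down as explicit types like this makes it easy for both teams to see when a field is added or renamed, whatever web framework ends up serving it.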

When everything is ready and integrated, we create a dashboard (using Metabase, a great open-source visualization tool) to measure the project results on a daily basis. Data Science projects are usually models based on real-world data, and the environment changes very fast. Models can become obsolete, so tracking results is key to continuously improving our projects.

Never create a complex model and then adapt your problem to it. Start from the problem, find out what is needed to solve it (usually following the rule of 80% of the value for 20% of the effort), and scale up the complexity of the solution (the model) only when necessary.

Increasing (a lot) agility in the implementation of our team projects

GetNinjas’ Data Science Template Stack

As our projects are similar in nature, early in the life of the team we decided to build a Python template that is used in the implementation of all our projects. It greatly speeds up implementation and also frees the Data Scientists from the more technical parts, leaving them more time to work on their specialty, which is creating complex models.

Our template has a training service and a prediction service, which are classes that the data scientist needs to edit, as well as a pre-processing step to process input data when necessary. Training is not mandatory, as not all of our services use machine learning.
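As a rough illustration of that structure, the sketch below shows how a pre-processing step, an optional training service and a prediction service could fit together. It is not our actual template: the class names, method names and the toy "above average price" rule are all assumptions made up for this example.

```python
from statistics import mean


class Preprocessor:
    """Hypothetical pre-processing step: normalizes raw input records."""

    def transform(self, record: dict) -> dict:
        # Lowercase the keys and drop empty values.
        return {key.lower(): value for key, value in record.items() if value is not None}


class TrainingService:
    """Optional step: only needed when the service learns parameters from data."""

    def train(self, records: list) -> dict:
        # Toy "model": the average price seen in the training records.
        return {"mean_price": mean(r["price"] for r in records)}


class PredictionService:
    """The class a data scientist would edit to implement the model logic."""

    def __init__(self, preprocessor: Preprocessor, params: dict = None):
        self.preprocessor = preprocessor
        self.params = params or {}  # Empty when no training step is used.

    def predict(self, record: dict) -> dict:
        clean = self.preprocessor.transform(record)
        baseline = self.params.get("mean_price", 0.0)
        return {"above_average": clean.get("price", 0.0) > baseline}


# Usage: train on pre-processed records, then serve predictions.
pre = Preprocessor()
params = TrainingService().train([pre.transform(r) for r in [{"Price": 10.0}, {"Price": 30.0}]])
service = PredictionService(pre, params)
print(service.predict({"Price": 25.0, "Note": None}))
```

Because the prediction service accepts its parameters as an optional argument, a purely rule-based service can skip the training class entirely, which matches the point that training is not mandatory.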

Currently we use Scrum with two-week sprints, and our tasks are classified according to the types described in the life cycle. We use planning poker, voting only on development tasks. Analytical and modeling tasks come in with variable points. We structured the process and the life cycle from the beginning, but we improve them every sprint, adapting them to the reality of our team and company. The agile framework you use should be the one that works for your team!

Although we have a well-structured team, we're continuously learning and evolving together, as a startup should at every level. This is what has been working for us, and I hope it has given you some useful tips for structuring your own data science team. In the coming months I intend to write an article about our Data Science template with technical details, and another one describing our project life cycle in practice, using one of our cases as an example.


Co-Founder & CEO at Já Vendeu, helping people sell their stuff without any effort