8 November, 2019

WHAT DO YOU NEED TO LEARN TO WORK IN FOOTBALL ANALYTICS?

The question I am asked most often is about the skills needed to become a data scientist at a football club. For many, analyzing football is a dream job: if you enjoy both the game itself and statistics, nothing could be better than combining the two in a career. So what skills do you need to develop in order to find a position at a club?

To answer that question it is best to start by looking at the data that is available.

Ten years ago, the data used by clubs was limited to statistics on goals, shots, corners, possession and so on. This data has limited value to coaching staff. While it might be worrying if your team is conceding too many shots or failing to gain possession, knowing this does not, on its own, provide coaching insights. The typical stats we see on TV do not, in themselves, help teams win games.

The second wave of football data came in the form of on-the-ball event data. The biggest supplier of this data, Opta, provides (x,y) coordinates for every pass, every defensive action and every shot. Opta is now one of several suppliers collecting this form of data, alongside StatsBomb, Wyscout and several betting companies.
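
To make the idea of event data concrete, here is a minimal sketch of how a single on-the-ball action might be stored and used in Python. The field names, the 105 × 68 metre pitch and the attacking direction (towards x = 105) are assumptions for illustration only; each supplier uses its own schema and coordinate system.

```python
import math
from dataclasses import dataclass
from typing import Optional

# Assumed pitch dimensions in metres; real suppliers use their own scales.
PITCH_LENGTH = 105.0
PITCH_WIDTH = 68.0
GOAL_CENTRE = (PITCH_LENGTH, PITCH_WIDTH / 2)  # assumed: attacking towards x = 105


@dataclass
class Event:
    """One on-the-ball action. Hypothetical field names, not a supplier schema."""
    player: str
    event_type: str                 # e.g. "pass", "shot", "interception"
    x: float                        # metres from own goal line
    y: float                        # metres from one touchline
    end_x: Optional[float] = None   # destination of a pass, if any
    end_y: Optional[float] = None


def distance_to_goal(x: float, y: float) -> float:
    """Straight-line distance from (x, y) to the centre of the opponents' goal."""
    return math.hypot(GOAL_CENTRE[0] - x, GOAL_CENTRE[1] - y)


# Two invented events for illustration.
events = [
    Event("Player A", "pass", 60.0, 30.0, end_x=85.0, end_y=40.0),
    Event("Player B", "shot", 93.0, 34.0),
]

for e in events:
    print(e.player, e.event_type, round(distance_to_goal(e.x, e.y), 1))
```

Because every event carries its own coordinates, simple geometric quantities such as distance to goal can be derived directly; what the record does not contain is where everyone else was standing.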

Event data has proved useful to many clubs, particularly in scouting players. The best-known statistic in this context is expected goals, which measures the quality of the chances players create. Other, more advanced metrics include expected assists; passing models, which assign a value to every pass based on how much it progresses the ball; and possession chains, which measure a player's involvement in attacking sequences. These stats, along with more traditional measures such as tallies of heading duels, interceptions and pass completion, are often presented in the form of a player radar. The radar shows how each player compares to others playing in the same league.
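
The league comparison behind a radar can be sketched as a percentile rank: for each metric, the share of players in the same league that the player outscores. The Python sketch below assumes per-90 values have already been computed; the metric names and numbers are invented for illustration, and real radars may also invert metrics where lower is better or scale axes differently.

```python
def percentile_rank(value: float, league_values: list[float]) -> float:
    """Percentage of league values below `value`, counting ties as half."""
    below = sum(1 for v in league_values if v < value)
    equal = sum(1 for v in league_values if v == value)
    return 100.0 * (below + 0.5 * equal) / len(league_values)


def radar_profile(player: dict[str, float],
                  league: list[dict[str, float]]) -> dict[str, float]:
    """Percentile of the player on every metric, relative to the whole league."""
    return {
        metric: percentile_rank(value, [p[metric] for p in league])
        for metric, value in player.items()
    }


# Invented per-90 values for a tiny five-player "league".
league = [
    {"xg": 0.10, "interceptions": 1.5},
    {"xg": 0.25, "interceptions": 0.8},
    {"xg": 0.40, "interceptions": 0.5},
    {"xg": 0.05, "interceptions": 2.1},
    {"xg": 0.30, "interceptions": 1.0},
]
print(radar_profile(league[2], league))  # {'xg': 90.0, 'interceptions': 10.0}
```

Each percentile then becomes the length of one spoke on the radar, which is why a radar is only as meaningful as the comparison group behind it.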

I know from first-hand experience that many club scouts love these diagrams. They give them, for better or worse, a way of confirming their beliefs about a player or of finding new talent to look at in more detail.

To work with and analyze event data, you need to be able to program, preferably in Python or R, and you also need to learn basic statistical modelling. Expected goals is a logistic regression model. Passing models use either logistic regression or basic neural networks. These topics come up in all good undergraduate statistics degrees and Master's courses in data science, and are also covered in online courses.
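
As an illustration of the kind of model involved, here is a minimal expected-goals sketch: a logistic regression on shot distance and on the angle that the goal mouth subtends from the shot location. The coefficients are made-up placeholders, not fitted values; in practice they would be estimated from many historical shots, for example with scikit-learn's `LogisticRegression` or statsmodels' `Logit`. The pitch geometry (goal at x = 105, centred at y = 34) is also an assumption; the 7.32 m goal width is the standard one.

```python
import math

GOAL_X = 105.0       # assumed pitch length in metres
GOAL_Y = 34.0        # assumed centre of the goal (pitch width 68 m)
GOAL_WIDTH = 7.32    # distance between the posts in metres

# Hypothetical coefficients for illustration only; real values come from fitting.
INTERCEPT = -1.0
COEF_DISTANCE = -0.1   # further from goal -> lower chance
COEF_ANGLE = 1.5       # wider view of the goal -> higher chance


def shot_features(x: float, y: float) -> tuple[float, float]:
    """Distance to the goal centre and the angle (radians) between the two posts."""
    dx = GOAL_X - x
    distance = math.hypot(dx, GOAL_Y - y)
    lower_post = math.atan2(GOAL_Y - GOAL_WIDTH / 2 - y, dx)
    upper_post = math.atan2(GOAL_Y + GOAL_WIDTH / 2 - y, dx)
    angle = abs(upper_post - lower_post)
    return distance, angle


def expected_goal(x: float, y: float) -> float:
    """Logistic model: xG = 1 / (1 + exp(-(b0 + b1 * distance + b2 * angle)))."""
    distance, angle = shot_features(x, y)
    z = INTERCEPT + COEF_DISTANCE * distance + COEF_ANGLE * angle
    return 1.0 / (1.0 + math.exp(-z))


for x, y in [(94.0, 34.0), (88.0, 20.0), (75.0, 34.0)]:
    print(f"shot at ({x}, {y}): xG = {expected_goal(x, y):.2f}")
```

The logistic function keeps every output between 0 and 1, so it can be read as the probability that a shot from that spot is scored; summing it over a player's shots gives their expected goals.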

While it is important to know about ‘on-the-ball’ data, the future of football analytics may well lie elsewhere. I caught up with Raúl Peláez Blanco, Head of Sports Technology Innovation Analysis at FC Barcelona, and asked him about the data the team currently uses.

He got straight to the point: “We do not rely on event data in player evaluation. We believe we need to understand how players act in different contexts. For example, if we are looking at a winger who dribbles very well in counterattacks, we ask how he also dribbles when the opposing defence is organized. Event data doesn’t tell us this.”

“Before we sign a player, we must examine how he solves problems in the contexts he will face at Barcelona,” Raul told me. “It has become popular to categorize players using data without taking these contexts into account, but this distorts reality.”

It would be wrong, however, to conclude that Raul is opposed to the use of data. On the contrary. For him, the question is about using the right data.

“The problem with event data is that they are decontextualized; we don’t know how the rest of the players are positioned when a pass is made, for example,” he told me. “Instead we use positional data of the 22 players and the ball. This helps us find tactical insights for the coach.”

The 22-player data, the third wave of data in football, is much richer than event data. As the name implies, it contains the coordinates on the pitch of all the players, as well as the position of the ball. This is essential for understanding context. During a typical match, Luis Suarez has the ball for less than 90 seconds of the 90-plus minutes of match time. What Suarez, or any other player, contributes to the play (pressing, runs to open up space, tactical positioning) can’t simply be measured in shot statistics.
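
To see why positional data adds context, consider a single tracking frame: the coordinates of every player plus the ball at one instant. The Python sketch below uses an invented frame format and made-up positions to answer a question event data cannot: how much space did the intended receiver of a pass have, measured as the distance to the nearest opponent? Real tracking feeds typically deliver many such frames per second, and a full frame would list all 22 players rather than the handful shown here.

```python
import math
from dataclasses import dataclass


@dataclass
class PlayerPosition:
    player_id: str
    team: str      # "home" or "away"
    x: float       # metres
    y: float       # metres


@dataclass
class Frame:
    """One instant of tracking data (hypothetical format); real frames hold all 22 players."""
    timestamp: float              # seconds since kick-off
    ball: tuple[float, float]
    players: list[PlayerPosition]


def nearest_opponent_distance(frame: Frame, player_id: str) -> float:
    """Distance from the given player to the closest player on the other team."""
    me = next(p for p in frame.players if p.player_id == player_id)
    return min(
        math.hypot(p.x - me.x, p.y - me.y)
        for p in frame.players
        if p.team != me.team
    )


# Invented positions for illustration.
frame = Frame(
    timestamp=1234.4,
    ball=(60.0, 30.0),
    players=[
        PlayerPosition("H7", "home", 60.0, 30.0),   # passer, on the ball
        PlayerPosition("H9", "home", 80.0, 40.0),   # intended receiver
        PlayerPosition("A4", "away", 83.0, 44.0),
        PlayerPosition("A5", "away", 70.0, 25.0),
    ],
)
print(nearest_opponent_distance(frame, "H9"))  # 5.0
```

The same pass recorded as an event would look identical whether the receiver had 5 metres of space or 25; only the positions of the other players reveal the difference.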

For Raul and his team, the first step towards using this data has been automating the work of video analysts. “A few years ago, video analysts spent most of their time recording games and labelling matches and workouts,” he told me. “Now the computers can do the labelling and the video analysts can concentrate on generating insight.”

Performing these tasks requires skills in machine learning and computer vision. Algorithms are needed to identify the players’ positions and body orientations correctly in real time, and to decide whether a situation is a counter-attack or an established possession. This problem still isn’t fully solved, and the algorithms make mistakes. Even in the top leagues, where multiple cameras film matches from different angles, tracking data still isn’t 100% reliable. A job for an ambitious young computer scientist, perhaps?
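
Clubs use machine learning for this kind of labelling, but the underlying question can be illustrated with a much simpler rule. The Python sketch below is a hypothetical heuristic, not a trained model: at the moment a team wins the ball, it counts how many opponents are goal-side of the ball. If few are, the defence is not yet organized and the possession is labelled a counter-attack. The threshold of four players and the attacking direction (towards x = 105) are assumptions chosen for illustration.

```python
# Hypothetical rule-of-thumb labeller; not a trained model.
ATTACKING_GOAL_X = 105.0  # assumed: the team in possession attacks towards x = 105


def defenders_goal_side(ball_x: float, opponent_xs: list[float]) -> int:
    """Count opponents (goalkeeper included) between the ball and the goal being attacked."""
    return sum(1 for x in opponent_xs if ball_x < x <= ATTACKING_GOAL_X)


def label_possession(ball_x: float, opponent_xs: list[float],
                     max_defenders_for_counter: int = 4) -> str:
    """Label a regain of possession from the opponents' x-coordinates at that moment.

    Few opponents goal-side of the ball suggests the defence is not yet organized.
    A real system would also use time since the turnover, attacking speed, etc.
    """
    if defenders_goal_side(ball_x, opponent_xs) <= max_defenders_for_counter:
        return "counter-attack"
    return "established possession"


# Ball won at x = 45 m. Opponents' x-coordinates (goalkeeper last), invented values.
caught_upfield = [50.0, 62.0, 30.0, 35.0, 40.0, 42.0, 28.0, 33.0, 38.0, 44.0, 100.0]
organized = [70.0, 72.0, 75.0, 78.0, 80.0, 60.0, 62.0, 65.0, 55.0, 50.0, 100.0]
print(label_possession(45.0, caught_upfield))   # counter-attack
print(label_possession(45.0, organized))        # established possession
```

A hand-written rule like this breaks down quickly in edge cases, which is one reason learned models are used instead; but it shows the kind of contextual signal that only positional data can supply.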

Despite these limitations, the 22-player tracking data is already reliable enough to start generating insights. For example, physicist William Spearman, now working at Liverpool FC, has developed a passing model which shows which passes are possible and which will be blocked. Last year, one of my Master’s students in computational science, Fran Peralta Alguacil, implemented a model similar to Spearman’s in order to look at player decision-making (see Figure 1). He was able to show how ‘disruptive runs’ by Barcelona players opened up space for their teammates. The project made heavy use of his physics skills to simulate both player movement and ball dynamics. Without proper scientific training, Fran wouldn’t have been able to simulate ball motion.

Figure 1: (a) Match situation. The left winger (Alcacer) runs to open up space for the left back (Alba). (b) Tracking data and passing model output. Dots show player positions and lines show the players' directions. The blue line indicates the pass that is made. Green areas show where a pass is possible; red areas show where passes would be blocked by the opposition.
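
The core idea of such passing models can be sketched with a simplified time-to-intercept check: a pass counts as blocked if some defender could reach a point on the ball's path before the ball gets there. This is a much cruder stand-in for the models described above. The ball is assumed to travel in a straight line at constant speed, and the ball speed (15 m/s), player speed (7 m/s) and reaction time (0.3 s) are assumed values, not figures from Spearman's work.

```python
import math

BALL_SPEED = 15.0      # m/s, assumed constant pass speed
PLAYER_SPEED = 7.0     # m/s, assumed maximum running speed
REACTION_TIME = 0.3    # s, assumed delay before a defender starts moving


def pass_is_blocked(start: tuple[float, float], end: tuple[float, float],
                    defenders: list[tuple[float, float]], steps: int = 50) -> bool:
    """True if any defender can reach a sampled point on the pass line before the ball."""
    length = math.dist(start, end)
    for i in range(1, steps + 1):
        frac = i / steps
        point = (start[0] + frac * (end[0] - start[0]),
                 start[1] + frac * (end[1] - start[1]))
        ball_time = frac * length / BALL_SPEED          # when the ball arrives here
        for d in defenders:
            defender_time = REACTION_TIME + math.dist(d, point) / PLAYER_SPEED
            if defender_time < ball_time:
                return True
    return False


# A 20 m pass along y = 34, with one defender close to the line or far from it.
print(pass_is_blocked((50.0, 34.0), (70.0, 34.0), [(60.0, 35.0)]))  # True
print(pass_is_blocked((50.0, 34.0), (70.0, 34.0), [(60.0, 50.0)]))  # False
```

Repeating this check for end points across a grid of pitch locations gives a map of reachable and blocked areas, roughly in the spirit of the green and red regions in Figure 1; the real models add uncertainty, ball physics and richer player motion.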

Another skill is implementing code on parallel computers, so that results can be presented immediately. “Professional teams’ staff will be looking to use data to make decisions in real time. Computers will offer second opinions to the coaches so that they can make changes during matches,” Raul told me.

For me, the takeaway from talking to Raul is that anyone wanting to get into football analytics should think broadly. Data science and statistics are important, but there are also opportunities for those with a good understanding of physics, computer vision or parallel computing. Coaches and sports scientists will also have to develop their skill sets in order to make the most of this new analytical approach. They will have to learn to understand what the mathematical models are telling them, and to know which results to trust and when to rely on their own intuition.

One last thing. It is important to be a team player. Raul referred back to Javier Fernandez, whose work we looked at in the previous article.

“Javier is a generous person who shares everything he learns,” Raul told me. “That is also the philosophy at Barça. This does not take away from competitive value, because the true value is in the learning. The final formulas are only the culmination of the work; the most beautiful part is the path taken to get there.”

So whatever path you take into data analytics, make sure you approach it openly. Talk to others, learn and share your knowledge. This will create the football analytics of the future.

David Sumpter
