FIFA World Cup Champion's Curse

 

source: Is the Champion's Curse real? (FIFA's YouTube channel, 2019)

This is perhaps my biggest data analysis project yet. It is split into two parts and uses multiple tools, including Python, R, and Excel.

The potential of sports for data collection and analysis is immense! That is why, once I found my passion for data science, I chose to focus on a sport I like, or at least one I can relate to. The FIFA World Cup is one of the few championships I care about. How could I not? It's one of the most prestigious championships in the world!

The obvious choice was to look at teams' performances and build a model to predict the next winner. But that felt too easy; everyone wants to predict the next winner. Instead, I decided to look at a more interesting phenomenon: the Champion's Curse!

I explain the curse, the analysis plan, and everything else in the project's GitHub repository. Here, I'll just briefly explain what the curse is and why it matters.

What is the Champion's Curse?

Since 2002, winners of the World Cup have almost always been eliminated in the group stage of the next World Cup. For example, after France won the 1998 World Cup, they were eliminated from the 2002 World Cup before reaching the knockout stage. The most recent example is Germany, who won the 2014 World Cup and were eliminated in the group stage in 2018. This "curse" affected multiple title defenders between 1998 and 2018. The only team that broke the curse was Brazil in 2006, after they won the 2002 World Cup: they performed better than the other defending champions by reaching the knockout stage.
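To make the pattern concrete, here is a minimal Python sketch that lists how each champion from 1998 to 2014 fared at the following World Cup. The `DEFENSES` list and the `is_cursed` helper are names made up for this illustration; they are not taken from the project's repository.

```python
# (year won, champion, stage reached at the following World Cup)
DEFENSES = [
    (1998, "France", "group stage"),
    (2002, "Brazil", "quarter-finals"),
    (2006, "Italy", "group stage"),
    (2010, "Spain", "group stage"),
    (2014, "Germany", "group stage"),
]


def is_cursed(stage: str) -> bool:
    """A title defense counts as 'cursed' if it ended in the group stage."""
    return stage == "group stage"


# Split the defending champions by whether the curse applied to them.
cursed = [team for _, team, stage in DEFENSES if is_cursed(stage)]
survivors = [team for _, team, stage in DEFENSES if not is_cursed(stage)]

print(f"Cursed defenders: {cursed}")
print(f"Broke the curse: {survivors}")
```

Four of the five defending champions in this window went out in the group stage, which is what makes the pattern worth testing.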

FIFA World Cup Champion's Curse Analysis (Part I)

In the first part of the analysis, I apply everything I learned as a data scientist. I start with the main stages of analyzing data:

  1. Data collection
  2. Data cleaning & matching
  3. Data visualization
  4. Data analysis

This project was extremely helpful for learning all of these stages, because it required every one of them, and more, to get the data to a usable state. It was excellent practice in Python and several of its libraries, including pandas, NumPy, Matplotlib, seaborn, and SciPy. The most exciting part of the project was dealing with a huge hole in the dataset: during the Cold War, Germany played in the championship under a different name. I had to find multiple data sources and match them together to produce the final, viable dataset.
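As a rough illustration of the matching step, the sketch below maps historical or source-specific team labels to one canonical name before records are combined. The alias labels ("West Germany", "Germany FR") and the `normalize_team` helper are assumptions for this example; the labels in the project's actual sources may differ.

```python
# Hypothetical aliases: the exact labels depend on the data sources used.
TEAM_ALIASES = {
    "West Germany": "Germany",
    "Germany FR": "Germany",
}


def normalize_team(name: str) -> str:
    """Map a historical or source-specific team label to one canonical name."""
    cleaned = name.strip()  # drop stray whitespace left over from scraping/export
    return TEAM_ALIASES.get(cleaned, cleaned)


# Illustrative records only, not rows from the project's dataset.
records = [
    {"year": 1974, "team": "Germany FR"},
    {"year": 1990, "team": "West Germany "},
    {"year": 2014, "team": "Germany"},
]

# After normalization, all three records refer to the same team.
teams = {normalize_team(r["team"]) for r in records}
print(teams)
```

Normalizing names first means that records from different sources can be joined on a single team key.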

I followed the analysis with a discussion of the phenomenon and a look at potential reasons. Specifically, I looked at each team's players to see whether they were replaced with new players between championships.
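One simple way to quantify squad changes is the share of a defending squad that also played in the title-winning squad. The sketch below assumes each squad is available as a set of player names; `retention_rate` and the placeholder players are hypothetical and are not the project's actual method or data.

```python
def retention_rate(previous_squad: set[str], next_squad: set[str]) -> float:
    """Share of the next squad that was also in the previous squad."""
    if not next_squad:
        return 0.0  # avoid dividing by zero for an empty squad
    return len(previous_squad & next_squad) / len(next_squad)


# Placeholder player labels, not real squad data.
winning_squad = {"Player A", "Player B", "Player C", "Player D"}
defending_squad = {"Player A", "Player B", "Player E", "Player F"}

rate = retention_rate(winning_squad, defending_squad)
print(f"Retained: {rate:.0%}")
```

Comparing this rate across champions would show whether the teams that crashed out changed their squads more or less than the ones that did not.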

I don't want to spoil the results, so I'll let you see for yourself!

FIFA World Cup Champion's Curse Analysis (Part II)

Next comes Part II of the project. Part I laid a good foundation for further analysis beyond looking at teams' players and how often they were switched around. The natural analytical potential of sports data also opens up a whole other suite of tools I could use to find the secret behind this curse, or to find out whether it's even real!

For this part, I used only Excel and R to analyze multiple aspects of the sport. I look at teams' performance again, from a different perspective and with a different method. I also look at teams' performance in other tournaments, their performance as host vs. away teams, and even the stadiums they play in!

This part allowed me to apply methods similar to the ones I used in Python, but with different tools. It also allowed me to explore and analyze more. I used multiple statistical tests to accomplish my goal and answer all my research questions, such as ANOVA, Tukey's post-hoc test, the Shapiro-Wilk test, and more!
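For readers unfamiliar with ANOVA, here is a minimal sketch of the one-way ANOVA F statistic, written in plain Python rather than the R used in this part. The three groups and their values are made up purely to show the arithmetic; they are not results from the project.

```python
from statistics import mean


def one_way_anova_f(groups: list[list[float]]) -> float:
    """Return the one-way ANOVA F statistic for two or more groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Between-group sum of squares: distance of each group mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # Ratio of the between-group mean square to the within-group mean square.
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Made-up goals-per-match figures for three hypothetical groups of teams.
champions = [2.0, 3.0, 4.0]
defenders = [1.0, 2.0, 3.0]
others = [0.0, 1.0, 2.0]

f_stat = one_way_anova_f([champions, defenders, others])
print(f"F = {f_stat:.2f}")
```

A large F value suggests the group means differ by more than the within-group spread would explain. A post-hoc test such as Tukey's then shows which pairs of groups differ, and the Shapiro-Wilk test checks the normality assumption behind ANOVA.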


Overall, this two-part project really helped me understand data analysis and the power behind it. The first part was for an introductory data science class, and that's when I realized I really love data science!