Gathering every Bundesliga game ever played in a single CSV file

I struggled to find a compact data set containing the outcome of every soccer game ever played in the Bundesliga, so I created one myself. It is available here on GitHub. The CSV file can also be downloaded directly here: fullBundesligaMatchHistory.

The above link is from 7/19/18; the GitHub repository might contain a newer version.

I crawled the Kicker website to obtain the HTML pages that summarize every matchday of a season. Then it was a matter of simple filtering, for which I used Python. Everything is stored in CSV format. The repository contains multiple files; resultsExtended.csv holds the data in its most comprehensive form. In addition to the readable values, it includes team names, weekdays, dates and times coded as integers. Because these columns are numeric, the file is convenient to use as input for machine learning algorithms. Its columns are:

"","SeasonFrom","SeasonTo","Matchday","Day","Date","Time","Home","Guest","Score90","Score45","Score90Home","Score90Guest","Score45Home","Score45Guest","PointsHome","PointsGuest","GoalsTotal90","GoalsTotal45","HomeID","GuestID","DayID","DateInt","TimeInt"

The first lines look like this:

"1",1963,1964,1,"Saturday",1963-08-24,17:00:00,"TSV 1860","Braunschweig","1:1","1:0",1,1,1,0,1,1,2,1,1,9,1,19630824,61200
"2",1963,1964,1,"Saturday",1963-08-24,17:00:00,"Münster","HSV","1:1","0:0",1,1,0,0,1,1,2,0,2,10,1,19630824,61200
"3",1963,1964,1,"Saturday",1963-08-24,17:00:00,"Saarbrücken","Köln","0:2","0:2",0,2,0,2,0,2,2,2,3,11,1,19630824,61200
"4",1963,1964,1,"Saturday",1963-08-24,17:00:00,"Karlsruhe","Meidericher SV","1:4","0:3",1,4,0,3,0,2,5,3,4,12,1,19630824,61200
"5",1963,1964,1,"Saturday",1963-08-24,17:00:00,"Frankfurt","K'lautern","1:1","1:1",1,1,1,1,1,1,2,2,5,13,1,19630824,61200
"6",1963,1964,1,"Saturday",1963-08-24,17:00:00,"Schalke","Stuttgart","2:0","2:0",2,0,2,0,2,0,2,2,6,14,1,19630824,61200
"7",1963,1964,1,"Saturday",1963-08-24,17:00:00,"Hertha","Nürnberg","1:1","0:1",1,1,0,1,1,1,2,1,7,15,1,19630824,61200
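
As a quick sanity check of this layout, here is a minimal Python sketch that reads rows in this format and compares the raw columns with the derived ones. It is only an illustration and not the code from the repository: the function names load_matches and check_row are made up for this example, and the reading of DateInt as the date written as YYYYMMDD and of TimeInt as seconds since midnight is an assumption inferred from the sample rows above. The sample string holds the header and two of the rows shown above; for the real file you would pass an open handle to resultsExtended.csv instead.

```python
import csv
import io
from datetime import date, time

# Header plus two rows copied from the sample above.
SAMPLE = '''"","SeasonFrom","SeasonTo","Matchday","Day","Date","Time","Home","Guest","Score90","Score45","Score90Home","Score90Guest","Score45Home","Score45Guest","PointsHome","PointsGuest","GoalsTotal90","GoalsTotal45","HomeID","GuestID","DayID","DateInt","TimeInt"
"1",1963,1964,1,"Saturday",1963-08-24,17:00:00,"TSV 1860","Braunschweig","1:1","1:0",1,1,1,0,1,1,2,1,1,9,1,19630824,61200
"4",1963,1964,1,"Saturday",1963-08-24,17:00:00,"Karlsruhe","Meidericher SV","1:4","0:3",1,4,0,3,0,2,5,3,4,12,1,19630824,61200
'''


def load_matches(handle):
    """Read a resultsExtended-style CSV and return its rows as dicts (hypothetical helper)."""
    return list(csv.DictReader(handle))


def check_row(row):
    """Return the names of derived columns that disagree with the raw columns."""
    problems = []

    # "1:4" -> (1, 4); compare with the split score columns and the goal total.
    home, guest = (int(part) for part in row["Score90"].split(":"))
    if (home, guest) != (int(row["Score90Home"]), int(row["Score90Guest"])):
        problems.append("Score90Home/Score90Guest")
    if home + guest != int(row["GoalsTotal90"]):
        problems.append("GoalsTotal90")

    # Assumption: DateInt is the date written as YYYYMMDD.
    match_date = date.fromisoformat(row["Date"])
    if int(match_date.strftime("%Y%m%d")) != int(row["DateInt"]):
        problems.append("DateInt")

    # Assumption: TimeInt is the kick-off time in seconds since midnight.
    kickoff = time.fromisoformat(row["Time"])
    seconds = kickoff.hour * 3600 + kickoff.minute * 60 + kickoff.second
    if seconds != int(row["TimeInt"]):
        problems.append("TimeInt")

    return problems


if __name__ == "__main__":
    for match in load_matches(io.StringIO(SAMPLE)):
        print(match["Home"], "-", match["Guest"], match["Score90"], check_row(match))
```

An empty list from check_row means the derived columns of that row agree with the raw ones under the assumptions above. The csv module returns every field as a string, so the integer columns have to be converted explicitly, as done here with int().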

For four games, the half-time score was not available. I set that value to 0:0, so these four entries are placeholders rather than real half-time results.

Two games were abandoned and replayed on another day. I removed the unfinished games.

The whole procedure, along with the source code, is uploaded to the repository, and I encourage everyone to double-check the results.

To the best of my knowledge, this data includes every match on every matchday of the German 1. Bundesliga from the 1963/64 season through 2017/18.
