Data sets

Click here to go to the GitHub repository. More information is given here. Here is a preview of the data.


1903,1,MAURICE GARIN,1,TDF 1903,94h 33m 14s,94,33,14

1903,2,LUCIEN POTHIER,37,TDF 1903,97h 32m 35s,97,32,35

1903,3,FERNAND AUGEREAU,39,TDF 1903,99h 02m 38s,99,2,38

1903,4,RODOLPHE MULLER,33,TDF 1903,99h 12m 44s,99,12,44

1903,5,JEAN-BAPTISTE FISCHER,12,TDF 1903,99h 41m 58s,99,41,58

1903,6,MARCEL KERFF,9,TDF 1903,101h 37m 38s,101,37,38
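The preview rows can be read with a standard CSV parser. The sketch below is only an illustration: the source does not name the columns, so treating the last three fields as hours, minutes and seconds is an assumption based on the preview, and `total_seconds` is a hypothetical helper name.

```python
import csv
import io

# Two rows copied from the preview above.
PREVIEW = """1903,1,MAURICE GARIN,1,TDF 1903,94h 33m 14s,94,33,14
1903,2,LUCIEN POTHIER,37,TDF 1903,97h 32m 35s,97,32,35
"""


def total_seconds(row):
    """Combine the last three fields (assumed: hours, minutes, seconds)."""
    hours, minutes, seconds = (int(value) for value in row[-3:])
    return hours * 3600 + minutes * 60 + seconds


rows = list(csv.reader(io.StringIO(PREVIEW)))

# Gap between the second and the first row, in seconds.
gap = total_seconds(rows[1]) - total_seconds(rows[0])
print(gap)  # 10761, i.e. 2h 59m 21s
```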

Click here to go to the GitHub repository. Summary from GitHub:

History of every Bundesliga game ever played. Data is obtained from Kicker. The website was crawled and the resulting HTML files were filtered down to a single CSV file. The three-point rule was introduced in the 1995/96 season: before that, a win yielded 2 points; since then, it yields 3 points. The loser always gets 0 points, and a draw always gives 1 point to each team. The summary is in the following format:

SeasonFrom | SeasonTo | Matchday | Day | Date | Time | Home | Guest | Score90 | Score45 | Score90Home | Score90Guest | Score45Home | Score45Guest | PointsHome | PointsGuest
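The points rule described above can be written as a small function. This is a minimal sketch, not the code used to build the data set. The function name `match_points` is hypothetical, and it assumes that SeasonFrom holds the starting year of a season, so that 1995/96 corresponds to SeasonFrom = 1995.

```python
def match_points(score90_home, score90_guest, season_from):
    """Return (PointsHome, PointsGuest) for one match.

    Assumption: season_from is the starting year of the season,
    so the three-point rule applies when season_from >= 1995.
    """
    win_points = 3 if season_from >= 1995 else 2
    if score90_home > score90_guest:
        return win_points, 0
    if score90_home < score90_guest:
        return 0, win_points
    return 1, 1  # a draw is always 1 point each


print(match_points(2, 1, 1994))  # (2, 0)
print(match_points(2, 1, 1995))  # (3, 0)
print(match_points(0, 0, 2010))  # (1, 1)
```

Recomputing the points this way and comparing them with the PointsHome and PointsGuest columns is one possible consistency check on the file.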

Click here to go to the GitHub repository. Summary from GitHub:

The men’s running data set. All data is taken from a website by Peter Larsson. Women’s data can be obtained here: This repository contains .txt files for the following events:

Men’s 100m, Men’s 200m, Men’s 400m, Men’s 800m, Men’s 1500m, Men’s 5000m, Men’s 10000m, Men’s Marathon

The .txt files contain in each row an entry similar to this example:

1 9.58 +0.9 Usain Bolt JAM 21.08.86 1 Berlin 16.08.2009

which encodes information such as the position in the list (first entry), the time (second entry), the name (here starting at the fourth entry), and others.
Each file contains several thousand lines of results, i.e. the best few thousand performances in the respective event.
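A row like the example above could be split into fields roughly as follows. This is a sketch under assumptions: it only covers rows laid out like the 100m example (with a wind reading in third position), it assumes that the first three-letter uppercase token after the name is the country code, and `parse_result` is a hypothetical helper. Files for other events may need a different layout.

```python
import re

COUNTRY_CODE = re.compile(r"^[A-Z]{3}$")


def parse_result(line):
    """Parse one row shaped like the 100m example into a dict."""
    tokens = line.split()
    # The name starts at the fourth token; it ends before the first
    # three-letter uppercase token, assumed to be the country code.
    end = next(
        (i for i in range(4, len(tokens)) if COUNTRY_CODE.match(tokens[i])),
        None,
    )
    if end is None:
        raise ValueError("no country code found in: " + line)
    return {
        "rank": int(tokens[0]),
        "time": tokens[1],  # kept as text; longer events use h:mm:ss style marks
        "wind": tokens[2],
        "name": " ".join(tokens[3:end]),
        "country": tokens[end],
        "date": tokens[-1],  # last token is the date of the performance
    }


example = "1 9.58 +0.9 Usain Bolt JAM 21.08.86 1 Berlin 16.08.2009"
result = parse_result(example)
print(result["rank"], result["time"], result["name"], result["country"])
```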

Click here to go to the GitHub repository. Summary from GitHub:

The following files include a compressed representation of the first 1 million users of the Friendster network in Matlab. All data is taken from, where a complete copy of the Friendster network is stored ( At, the available files contain all profile data in HTML format. However, it is often not necessary to work with the complete profile data. We do not claim that the presented data is correct, nor that it is complete. All profile information has been gathered by running grep on the profile.html files. Some user profiles are not public, so no information could be extracted from them. We did not differentiate between private, deleted, and missing profiles.

Click here to go to the GitHub repository. Summary from GitHub:

We took the 79 largest cities of Germany from Wikipedia ( Using the Google Maps API, we calculated the distance for a road trip from one city to all other cities, repeating this process for each of the 79 cities. This yields one matrix for the duration of a trip from one city to another and a second matrix for the distance. Additionally, we used the coordinates from the same Wikipedia article and the radius of the earth to compute the great-circle distance between each pair of cities.
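The great-circle distance mentioned above can be computed from two coordinate pairs and the earth's radius. The sketch below uses the haversine formula, which is one common choice; the source does not say which formula was actually used. The radius value and the city coordinates are approximate, illustrative assumptions and are not taken from the data set.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean earth radius


def great_circle_km(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Great-circle distance between two points given in degrees (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


# Approximate coordinates for illustration only.
berlin = (52.52, 13.405)
munich = (48.137, 11.575)
print(round(great_circle_km(*berlin, *munich)), "km")
```

Applying such a function to every pair of cities gives a symmetric matrix with zeros on the diagonal, which can be compared with the road distances.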

Click here to go to the GitHub repository. Summary from GitHub:

This README explains the content of FlightData.mat. FlightData.mat contains six different variables. It is a summary of “On-Flight Market Passengers Enplaned” arriving in or leaving the US in the year 2010. Note that the data is arranged so that the entries in Names are sorted according to the number of passengers leaving an airport, meaning that sum(Traffic) is decreasing.

Names: a cell array of three-letter airport identifiers, e.g. ‘ATL’ for Hartsfield-Jackson Atlanta International Airport.
Latitude: a double array that contains the latitude of the airports listed in Names.
Longitude: a double array that contains the longitude of the airports listed in Names.
Traffic: a matrix that contains in row i, column j the number of “On-Flight Market Passengers Enplaned” from airport Names(i) to airport Names(j).
Distance: a matrix that contains in row i, column j the angle (in degrees) spanned by the shortest connection between the airports Names(i) and Names(j). The distance in meters is then given by the expression 2 pi R Distance(i,j) / 360, where R is the radius of the earth in meters.
FullAirportDetails: a cell matrix that contains detailed information about the airports. The order is the same as in Names. The fields provided are: “Name”, “City”, “Country”, “IATA/FAA”, “ICAO”, “Latitude”, “Longitude”, “Altitude”, “Timezone”, “DST”.
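The conversion from an entry of Distance to meters, 2 pi R Distance(i,j) / 360, can be sketched as follows. The function name `angle_to_meters` and the radius value are assumptions for illustration; the README does not state which value of R to use.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean earth radius in meters


def angle_to_meters(distance_deg, radius_m=EARTH_RADIUS_M):
    """Convert a central angle in degrees to an arc length in meters."""
    return 2 * math.pi * radius_m * distance_deg / 360


# An angle of 1 degree corresponds to roughly 111 km on the surface.
print(round(angle_to_meters(1.0) / 1000, 1), "km")
```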

Sources: For flight data:
For airport data:
License: The AirportDataSet is made available under the Open Database License. Any rights in individual contents of the database are licensed under the Database Contents License. In short, these mean that you are welcome to use the data as you wish, if and only if you both acknowledge the source and license any derived works made available to the public with a free license as well.