Analyzing Telegram chats with R and Matlab

I use Telegram a lot for chatting. Since it is possible to export the chat history with everyone you have been chatting with, I wanted to create a word cloud featuring the most common words in the conversations with my chat partners.

I use several tools for this. First, telegram-history-dump, which itself uses telegram-cli, exports the chat history. Then I use Matlab for some postprocessing (or preprocessing, depending on how you view it) to bring the dumped output into a form that is manageable for me. Lastly, I use R to generate a word cloud like the one at the top of this post.
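The counting step that precedes a word cloud can be sketched in base R. This is only a minimal illustration: the two messages are made up, the tokenization rule is an assumption, and the actual scripts from the post may work quite differently.

```r
# Hypothetical messages standing in for an exported chat history
messages <- c("See you at the lake tomorrow",
              "The lake was great, see you again soon")

# Lowercase everything, split on any run of non-letters, drop empty strings
words <- unlist(strsplit(tolower(messages), "[^a-z]+"))
words <- words[nzchar(words)]

# Count each word and sort by descending frequency
freq <- sort(table(words), decreasing = TRUE)
print(head(freq))
```

The names and counts in `freq` are the kind of input a word cloud package such as `wordcloud` expects, although which package the post uses is not stated in this excerpt.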

Continue reading “Analyzing Telegram chats with R and Matlab”

Setting up shiny-server and creating a first app: “How long are 4 seconds”

I liked the idea of Shiny and of being able to deploy an app easily via Shiny Server, so I tried out the installation and app creation process myself.

Here is the final result. Try to estimate the time span of 4 seconds!

Continue reading “Setting up shiny-server and creating a first app: “How long are 4 seconds””

Every cyclist of the Tour de France in a single CSV file

Actually, it is just every finisher of the Tour de France between 1903 and 2017. Gathering all the data was quite a pain and had to be done manually to some extent, so it is entirely possible that some errors were made. I will give a detailed description of how the data set was generated. All scripts used and the final data can be found on Github. The picture above shows that the Tour is getting shorter and faster.

Continue reading “Every cyclist of the Tour de France in a single CSV file”

Running distance and pace distribution with R

Getting the data – again

Similar to this post, I again gathered my data. This time, however, I bulk exported everything from Polar instead of Garmin (there is an app called SyncMyTracks that synchronizes different services).

Continue reading “Running distance and pace distribution with R”

Visualizing the best male running performances with R

The data

With a bunch of data about the best performances in different running events, I wanted to learn how to produce meaningful plots and insights with R and ggplot. The data is originally taken from http://www.alltime-athletics.com/men.htm, a website by Peter Larsson. I did a lot of postprocessing to be able to handle the data more easily. I also wanted to learn how to work with R and data frames. Most of the source code was taken from StackOverflow or other sites. The ggplot documentation was also especially helpful. The data set, together with the source code that produces the figures below, can be downloaded from Github.

I consider data for the male running events: 100m, 200m, 400m, 800m, 1500m, 5000m, 10,000m and the marathon. For each event, there are hundreds to thousands of results, sorted by best performance. For example, this might be the 1000 best marathon performances. For each of these performances, we have the associated rank as well as data like the name, date of birth and nationality of the athlete. Some typical lines in the original data file look like this:

1, 2:02:57 , Dennis Kimetto , KEN , 22.04.84 ,1, Berlin , 28.09.2014
2, 2:03:02 , Geoffrey Mutai , KEN , 07.10.81 ,1, Boston , 18.04.2011
3, 2:03:03 , Kenenisa Bekele , ETH , 13.06.82 ,1, Berlin , 25.09.2016

This means that Dennis Kimetto from Kenya ran the marathon world record in Berlin in 2014 with a time of 2:02:57 h. The second best performance of all time was by Geoffrey Mutai and the third best by Kenenisa Bekele. The data was obtained in July 2018, so the lists may have changed since then.
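To show how lines in this format could be loaded into a data frame, here is a minimal sketch in base R using the three sample lines above. The column names and the helper `to_seconds` are my own assumptions for illustration (the original file has no header), and the postprocessing in the Github repository may be done differently.

```r
# Sample lines copied from the excerpt above
lines <- c(
  "1, 2:02:57 , Dennis Kimetto , KEN , 22.04.84 ,1, Berlin , 28.09.2014",
  "2, 2:03:02 , Geoffrey Mutai , KEN , 07.10.81 ,1, Boston , 18.04.2011",
  "3, 2:03:03 , Kenenisa Bekele , ETH , 13.06.82 ,1, Berlin , 25.09.2016"
)

# Assumed column names; the meaning of the sixth column (place in the race)
# is a guess based on the sample lines
col_names <- c("rank", "time", "name", "nation", "birth", "place", "city", "date")

# strip.white removes the padding around the comma-separated fields
perf <- read.csv(text = lines, header = FALSE, col.names = col_names,
                 strip.white = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: convert "h:mm:ss", "mm:ss" or plain seconds to seconds
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) {
    p <- as.numeric(p)
    sum(p * 60^rev(seq_along(p) - 1))
  }, numeric(1))
}

perf$seconds <- to_seconds(perf$time)

# Race dates use a four-digit year; birth dates use two digits and are
# left as strings here to avoid guessing the century
perf$date <- as.Date(perf$date, format = "%d.%m.%Y")

str(perf)
```

Storing the time as seconds makes it straightforward to plot and compare performances across events with ggplot.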

Continue reading “Visualizing the best male running performances with R”