Adding Skimr Spark Histograms in Dataframe Columns

A couple of weeks ago, I was looking for a package I had come across earlier that prints summary statistics with inline histograms. I checked all my bookmarks and liked tweets, but I couldn’t find it! So I asked on Twitter. Fortunately, Maëlle Salmon read the tweet and pointed me to skimr by ropenscilabs, who release many useful packages. In this post, I will focus on spark histograms in summary statistics and beyond. [Read More]

My First Steps into The World of Tidy Eval

Bang Bang!!

A couple of months ago, tidy eval was something that I had come across but didn’t have time to explore. As usual, one gets busy with daily work and puts some things aside to come back to later. However, I like to find ways that give me more flexibility and more control. In mid-June, I had a question about programming with dplyr. I wasn’t sure how to pass variable column names to purrr::map, so I opened an issue: purrr::map() support for SE/variable column names? [Read More]

Highlights from UseR! 2017

Teaching R to new UseRs, the journey of package development, and more!

In the first week of July, the 14th UseR! conference took place in Brussels, and it was the biggest UseR! so far. For me, it was the first UseR!, and I believe it was a good opportunity to get exposed to different approaches in the data world, see different applications, learn about new packages and meet people in the R community, all in one place. There were lots of interesting things to highlight. [Read More]

R Questions Tag Pairs on Stackoverflow

Months ago, I came across the R Questions from Stack Overflow dataset published on Kaggle. I was interested in tag pairs in particular, i.e. which tags appear together in R questions, so I worked on this simple kernel. This week, I had some time, so I thought about deploying a simple Shiny app to give more people a way to explore the tag pairs. So here is the app, where you can see the most frequent tags that appear with a certain tag. [Read More]

Prophet Explore: A Simple Shiny App to Get Introduced to Prophet

Last February, I read about the prophet package, which was released by Facebook’s Core Data Science team. I quickly skimmed the published paper, Forecasting at Scale, and got the main concept. I also liked how the creators, Sean Taylor and Ben Letham, were trying to empower analysts to produce high-quality forecasts by offering them a flexible and configurable model that requires a general understanding of, but not necessarily deep knowledge about, time series models. [Read More]

A Glimpse into The Daily Life of a Data Scientist

A couple of weeks ago, I had a discussion with a co-worker regarding a project I was involved in. I felt that there was no clear understanding of the daily challenges data scientists face. A few days later, I was at rstudio::conf 2017, where I met lots of data scientists from academia and industry. Later on, I described one of the conference’s positive side effects as “group therapy”, where one could see how others face the same challenges and struggle with similar issues. [Read More]

Yet Another Post on Logistic Regression

Every day, statisticians, analysts and data enthusiasts perform data analysis for different purposes. But when it comes to presenting analyses to a wider audience, good work is not complex work full of big words. It is work that highlights interesting relations, answers business questions or predicts outcomes, and explains all that in the simplest way through data visualization or simple concepts. So if one throws numbers, model coefficients and complex graphs at the audience to impress them, it might backfire if the audience is not familiar with a certain concept. [Read More]

The Power of (purrr, tidy, broom)-Exploring Climate Change Trends

A few days ago, I wanted to explore the Climate Change: Earth Surface Temperature Data dataset published on Kaggle and originally compiled by Berkeley Earth. The dataset is relatively large, as it contains entries from 1750-2014! This was shortly after watching Hadley Wickham’s talk about managing many models with R. So I thought about using the power of purrr, tidy and broom to handle the climate change dataset, and I decided to focus on the change in the average temperature in the 100 pre-selected major cities in the dataset. [Read More]

Lessons Learnt About Data Viz - Why a Boxplot Is Sometimes The Worst Choice?

Data visualization is a means of visual communication that should help people understand the significance of data easily and see interesting trends, patterns, distributions, etc. If your audience fails to grasp the message that the graph was intended to convey, they are not to blame. You are! Or, to be precise, your choice of graphical representation is. I knew all that, and I used to spend time thinking about the best chart to convey a certain message or to highlight an interesting behavior. [Read More]

R googleVis Line Motion Charts with Modified Options

Using googleVis via R provides lots of options to create nice Google visualizations. I was trying to create some charts while exploring the Annual Nominal Fish Catches data on Kaggle. I wanted to create a line motion chart and exclude the default bubble chart. So I played with the options to get the desired result. The following is a quick explanation of how to do that. Fish Catches Dataset: The dataset provides the annual TLW (tonnes live weight) catches of fish and shellfish in the Northeast Atlantic region. [Read More]