The Reporting/Dashboarding Dilemma!

Data scientists and dashboards: a complicated relationship

Posted on July 3, 2019 | 1358 words

A couple of weeks ago I read a discussion on Twitter started by a tweet from David Neuzerling, who observed that data science teams get pushed towards reporting and dashboarding inside organizations: "Observation: any data science team will always face pressure from within an organisation to become a reporting/dashboarding team." (David Neuzerling, @mdneuzerling, June 19, 2019) I had some reflections from previous experience and various discussions, so I decided to gather them in a blog post for future reference and further discussion. [Read More]

dashboards data science

Teaching Shiny Workshop at SatRday Johannesburg

Eating the cake first and playing games!

Posted on April 27, 2019 | 516 words

This month three SatRdays took place on the same day around the world. I was invited to one of them, SatRday Johannesburg, to give a training and a talk. I was excited about it because it was my first #rstats event in Africa and I wanted to contribute to events in a region where few conferences are held relative to Europe or the US. The workshop I gave was about Building Web Applications in Shiny. [Read More]

R Shiny SatRday

Collaborative Filtering Using Embeddings

goodreads recommendations

Posted on February 10, 2019 | 2646 words

Every day we deal with online platforms that use recommendation systems. There are different approaches to implementing such systems, and the right one depends on the product, the available data and more. This post will mainly focus on collaborative filtering using embeddings as a way to learn about latent factors. I learned about this approach months ago from the fast.ai Practical Deep Learning for Coders lessons. I also found out that Stitch Fix wrote about the same concept last year in a blog post, Understanding Latent Style. [Read More]

python Keras fastai

Intro to Text Classification with Keras (Part 3 - CNN and RNN Layers)

Posted on February 1, 2019 | 1538 words

In part 1 and part 2 of this series of posts on Text Classification in Keras we got a step-by-step intro to: processing text in Keras; embedding vectors as a way of representing words; and defining sequential models from scratch. Since we are working with a real dataset from the Toxic Comment Classification Challenge on Kaggle, we can always see how our models would score on the leaderboard if we competed with the final submissions. [Read More]

R Keras NLP

Divergent Bars in ggplot2

Step by step guide

Posted on January 25, 2019 | 1367 words

A couple of days ago, one of the participants I met at the workshop I gave at DataFest Tbilisi asked me for a simple tutorial on plotting divergent bars in ggplot2 and other bar chart stuff. I was going to write a gist with an explanation, but I decided to write this post instead so I can share it with others whenever they ask during or after a workshop, or on other occasions. [Read More]

R ggplot2

Intro to Text Classification with Keras (Part 2 - Multi-Label Classification)

Posted on January 24, 2019 | 1427 words

In the previous post, we had an overview of text pre-processing in keras. In this post we will use a real dataset from the Toxic Comment Classification Challenge on Kaggle, which poses a multi-label classification problem. The competition required building a model that’s “capable of detecting different types of toxicity like threats, obscenity, insults, and identity-based hate”. The dataset includes thousands of comments from Wikipedia’s talk page edits, and each comment can have more than one tag. [Read More]

R Keras NLP

Intro to Text Classification with Keras (Part 1)

pre-processing, embeddings and more

Posted on January 21, 2019 | 1747 words

Keras provides a simple and flexible API to build and experiment with neural networks. I have used it in both Python and R, but I decided to write this post in R since there are fewer examples and tutorials. This series of posts will focus on text classification using keras. The introductory post will show a minimal example to explain: text pre-processing in keras; how and why to use embeddings. [Read More]

R Keras NLP

R-Ladies at DataFest Tbilisi

Posted on January 4, 2019 | 563 words

In November 2018, I attended DataFest Tbilisi 2018, where R-Ladies Tbilisi had invited me to give a talk, teach a workshop and mentor participants in a Datathon. It was a great opportunity, and I would particularly highlight the second and third days, when we had an R-Ladies Room for R lovers with a series of workshops and a Datathon organized and led by R-Ladies, who were the main representatives of the R community there. [Read More]

rladies R teaching

Handling R errors the rlang way

Custom conditions, subclasses and more!

Posted on September 28, 2018 | 1700 words

Every day we deal with errors, warnings and messages while writing, debugging or reviewing code. All three are types of conditions in R. You might hope to see as few of them as possible, but they are actually very helpful when they describe the problem concisely and point to its source. So if you write functions or code for yourself or others, it is good practice to spend more time writing descriptive conditions. [Read More]

R rlang tidyverse

Tidy Eval Meets ggplot2

The Bang Bang Plots

Posted on July 6, 2018 | 1123 words

Almost a year ago I wrote about My First Steps into The World of Tidy Eval. At the end I tweeted asking Hadley Wickham and Lionel Henry whether ggplot2 was compatible with tidy eval. They said that it was on the to-do list. Finally, ggplot2 3.0.0 was released last week with support for tidy eval, so I thought it was time to write about it! ggplot2 3. [Read More]

R ggplot2 tidyeval tidyverse