The Joy and Sorrow of the Data Science World

Sharing stories of data practitioners!

Story #1: Jasmin is a data scientist who works at a tech company in a product-oriented team. She likes to work on designing metrics and urges everyone on her team to reason about what they optimize for from a product perspective, so that it can be linked to the models she builds. Everyone says that she asks good questions and that she is tenacious. One day, a manager (not data science oriented) came to her to ask for some charts showing the “improvement” in the model/application performance and relevant metrics, to use in a presentation at a company-wide event. [Read More]

Objective Function Engineering as an Interface Design Problem

Individual users' control over algorithms?

How many times have you been on a platform with an objective in mind that is a bit different from what the product optimizes for? I say “a bit different” because I mean the case where you want to use the original features of the platform, but with some tweaks that improve your experience. This idea frequently comes up in the form of wishes from users saying “I wish YouTube allowed me to... [Read More]

The teaching/learning experience

Or my path to instructor certification

In January 2017, I flew to Orlando to attend Rstudio::conf, my first R conference. I was lucky to have been granted a spot in the “Intermediate Shiny” workshop with Joe Cheng. Not only did I learn more about Shiny, but I also learned about the structure of such intensive workshops and the way they are taught. I was observing these details closely, and I had a clear thought then: I would teach one of these workshops one day, whether at this conference or in other contexts. [Read More]

A Journey into a Team's Workflows

Or how to navigate chaos and bring order to data projects!

“I thought my code was clear and organized, but I figured out it was not!” That’s what a data analyst told me after a couple of sessions I held to examine the workflows of different team members and discuss their practices. I like to help people reach this realization and see the value that improving their workflows brings to them, as well as to the others with whom they collaborate. And when I say workflows, I mean: how to organize a project? [Read More]

From H2O to POJO Models

Getting started with a minimal example

A while ago, I was experimenting with h2o and wanted to generate a Plain Old Java Object (POJO) model. I found the documentation useful, but I decided to write a post with a simple example for future reference. In this post, we will see how to build a simple h2o model in R, convert the model to POJO, create a main program in Java to use the POJO model, and compile and run the program. [Read More]

The Reporting/Dashboarding Dilemma!

Data scientists and dashboards: a complicated relationship

A couple of weeks ago I read a discussion on Twitter initiated by a tweet from David Neuzerling, who highlighted an observation about data science teams being pushed towards reporting/dashboarding inside organizations: “Observation: any data science team will always face pressure from within an organisation to become a reporting/dashboarding team.” (David Neuzerling, @mdneuzerling, June 19, 2019). I had some reflections from a previous experience and various discussions, so I thought about gathering them in a blog post for future reference and further discussions. [Read More]

Teaching Shiny Workshop at SatRday Johannesburg

Eating the cake first and playing games!

This month, three SatRdays took place on the same day around the world. I was invited to one of them, SatRday Johannesburg, to give a training and a talk. I was excited about it because it was my first #rstats event in Africa and I wanted to contribute to events in a region where few conferences are held relative to Europe or the US. The workshop I gave was about Building Web Applications in Shiny. [Read More]

Collaborative Filtering Using Embeddings

Goodreads recommendations

Every day we deal with online platforms that use recommendation systems. There are different approaches to implementing such systems, and the choice depends on the product, the available data, and more. This post will mainly focus on collaborative filtering using embeddings as a way to learn about latent factors. I learned about this approach months ago from the fast.ai Practical Deep Learning for Coders lessons. I also found out that Stitch Fix wrote about the same concept last year in a blog post, Understanding Latent Style. [Read More]

Intro to Text Classification with Keras (Part 3 - CNN and RNN Layers)

In part 1 and part 2 of this series of posts on Text Classification in Keras, we got a step-by-step intro to processing text in Keras, embedding vectors as a way of representing words, and defining sequential models from scratch. Since we are working with a real dataset from the Toxic Comment Classification Challenge on Kaggle, we can always see how our models would score on the leaderboard if we competed with the final submissions. [Read More]

Divergent Bars in ggplot2

Step-by-step guide

A couple of days ago, one of the participants I met at the workshop I gave at DataFest Tbilisi asked me for a simple tutorial on plotting divergent bars in ggplot2 and other bar chart topics. I was going to write a gist with an explanation, but I decided to write this post instead, so I can share it with others whenever they ask during or after a workshop, or on other occasions. In this post, I will give 2 step-by-step examples: [Read More]