<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title></title>
    <link>https://onceupondata.com/</link>
    <description>Recent content on </description>
    <generator>Hugo -- gohugo.io</generator>
    <lastBuildDate>Sun, 05 Jan 2025 00:00:00 +0000</lastBuildDate><atom:link href="https://onceupondata.com/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>A Winter Night in an AI Summer</title>
      <link>https://onceupondata.com/post/2025-01-05-philosophy-ai/</link>
      <pubDate>Sun, 05 Jan 2025 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/post/2025-01-05-philosophy-ai/</guid>
      <description>On a slightly rainy winter night in 2023, in a small Dutch city, I went for a post-“AI &amp; Philosophy Workshop” dinner at an Eritrean restaurant with a group of philosophers and scientists. We sat around the table to share food, thoughts, and some “food for thought”! Only two of the attendees, including me, were engineers from the industry and NOT from Europe/the US or whatever the “Western World” means. This aspect became more noticeable once the discussions started, and it stirred up many thoughts in my mind!</description>
    </item>
    
    <item>
      <title>About</title>
      <link>https://onceupondata.com/page/about/</link>
      <pubDate>Tue, 16 Apr 2024 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/page/about/</guid>
      <description>I am a Principal Data Scientist (Machine Learning) with an electronics engineering background. I like to engage with data, build data products and help others make data-informed decisions. I always ask the whys and the why nots, and I encourage others to question their assumptions. I care about algorithmic fairness and I try to get others to think about it. I am a certified instructor with different organizations, including The Carpentries, and I enjoy teaching coding and data-science related skills.</description>
    </item>
    
    <item>
      <title>On The Enshittification of Everything!</title>
      <link>https://onceupondata.com/post/2024-02-18-enshittification-of-everything/</link>
      <pubDate>Mon, 19 Feb 2024 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/post/2024-02-18-enshittification-of-everything/</guid>
      <description>&amp;ldquo;And each time we think we reached the peak of AI hype, the summit of bullshit mountain, we discover there&amp;rsquo;s worse to come&amp;rdquo; - Prof. Emily M. Bender
 A Day in The AI Summer It is one of those AI summer days, when almost every executive is hyped up about AI (regardless of what it means). I enter a hall full of people from the tech industry attending an event held by a mid-size tech company!</description>
    </item>
    
    <item>
      <title>I Am a Careless Person and AI Is Smarter Than Me!</title>
      <link>https://onceupondata.com/post/2023-06-17-i-am-a-careless-person-and-ai-is-smarter/</link>
      <pubDate>Sat, 17 Jun 2023 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/post/2023-06-17-i-am-a-careless-person-and-ai-is-smarter/</guid>
      <description>“I am a careless person,” he said. “AI will help me write better feedback.”
“Isn’t it better to try to be thoughtful and think before writing?” I said.
“But AI can be my teacher. AI is much smarter,” he said with absolute certainty.
“This could be his self-fulfilling prophecy!” I thought and shrugged, ending the conversation. ¯\_(ツ)_/¯
 The &amp;ldquo;All You Need is AI&amp;rdquo; Mindset! The previous lines were part of a conversation during a hackathon presentation I recently attended.</description>
    </item>
    
    <item>
      <title>AI Hype-Driven Development - Parallels in History</title>
      <link>https://onceupondata.com/post/2023-04-07-ai-hype-large-language-models-simulmatics/</link>
      <pubDate>Fri, 07 Apr 2023 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/post/2023-04-07-ai-hype-large-language-models-simulmatics/</guid>
      <description>In twenty-first-century Silicon Valley, the meaninglessness of the past and the uselessness of history became articles of faith, gleefully performed arrogance. “The only thing that matters is the future,” said the Google and Uber self-driving car designer Anthony Levandowski in 2018. “I don’t even know why we study history. It’s entertaining, I guess—the dinosaurs and the Neanderthals and the Industrial Revolution and stuff like that. But what already happened doesn’t really matter.</description>
    </item>
    
    <item>
      <title>NOT Just Tweaking Some Parameters</title>
      <link>https://onceupondata.com/post/2022-04-17-deep-learning-practical-experience/</link>
      <pubDate>Sun, 17 Apr 2022 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/post/2022-04-17-deep-learning-practical-experience/</guid>
      <description>In the world of data science and machine learning, people tend to talk about the shiny stories, publishable methods, state-of-the-art experiments, the rainbows and butterflies. In reality, practitioners struggle with lots of challenges while learning and applying new methods, but their practical lived experiences are rarely shared, for different reasons.
I was recently following one of the few examples, in which someone shared their journey to the world of deep learning, without a lot of filtering.</description>
    </item>
    
    <item>
      <title>The Seeds of Bad Data Products!</title>
      <link>https://onceupondata.com/post/how-do-harmful-algorithms-evolve/</link>
      <pubDate>Sun, 12 Sep 2021 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/post/how-do-harmful-algorithms-evolve/</guid>
      <description>How do harmful data products and algorithms get forced on us and impact our lives? This question is widely discussed nowadays, although I am not sure “widely” is the right description of the current level of attention it receives. But there is another question that I find crucial: How do these products emerge in the first place and gain credibility? How do crappy, senseless products reach the market, proliferate in our lives and put us in a reactive mode, left to analyze and prove their obvious worthlessness or harm?</description>
    </item>
    
    <item>
      <title>The Joy and Sorrow of the Data Science World</title>
      <link>https://onceupondata.com/post/joy-sorrow-data-science-world-stories/</link>
      <pubDate>Sat, 10 Oct 2020 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/post/joy-sorrow-data-science-world-stories/</guid>
      <description>Story # 1
Jasmin is a data scientist who works at a tech company in a product-oriented team. She likes to work on designing metrics and urges everyone in her team to reason about what they optimize for from a product perspective to link it to the models she builds. Everyone says that she asks good questions and she is tenacious.
One day, a manager (not data-science oriented) came to her asking for some charts showing the “improvement” in the model/application performance, and relevant metrics to use for a presentation at a company-wide event.</description>
    </item>
    
    <item>
      <title>Objective Function Engineering as an Interface Design Problem</title>
      <link>https://onceupondata.com/post/objective-function-engineering-interface-design/</link>
      <pubDate>Sun, 28 Jun 2020 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/post/objective-function-engineering-interface-design/</guid>
      <description>How many times have you been on a platform with an objective in mind that is a bit different from what the product optimizes for? I say a bit different because I mean the case where you want to use the original features of the platform, but with some tweaks that improve your experience. This idea frequently comes up in the form of wishes from users saying &amp;ldquo;I wish YouTube allowed me to.</description>
    </item>
    
    <item>
      <title>The teaching/learning experience</title>
      <link>https://onceupondata.com/post/teaching-rstudio/</link>
      <pubDate>Tue, 14 Apr 2020 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/post/teaching-rstudio/</guid>
      <description>In January 2017, I flew to Orlando to attend rstudio::conf, my first R conference. I was lucky to have been granted a spot in the “Intermediate Shiny” workshop with Joe Cheng. Not only did I learn more about Shiny, but also about the structure of such intensive workshops and the way of teaching. I was observing these details closely, and I had a clear thought then: I would teach one of these workshops one day, whether at this conference or in other contexts.</description>
    </item>
    
    <item>
      <title>A Journey into a Team&#39;s Workflows</title>
      <link>https://onceupondata.com/post/data-projects-workflows/</link>
      <pubDate>Sat, 18 Jan 2020 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/post/data-projects-workflows/</guid>
      <description>&amp;ldquo;I thought my code was clear and organized, but I figured out it was not!&amp;rdquo;, that&amp;rsquo;s what a data analyst told me after a couple of sessions I held to examine the workflows of different team members and discuss their practices. I like to help people reach this realization so they see the value of improving their workflows, for themselves as well as for others with whom they collaborate. And when I say workflows, I mean things like: how to organize a project?</description>
    </item>
    
    <item>
      <title>From H2O to POJO Models</title>
      <link>https://onceupondata.com/post/h2o-pojo-models/</link>
      <pubDate>Sat, 13 Jul 2019 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/post/h2o-pojo-models/</guid>
      <description>A while ago, I was experimenting with h2o and wanted to generate a Plain Old Java Object (POJO) model. I found the documentation useful but I decided to write a post with a simple example for future reference.
In this post, we will see how to:
 build a simple h2o model in R. convert the model to POJO. create a main program in Java to use the POJO model.</description>
    </item>
    
    <item>
      <title>The Reporting/Dashboarding Dilemma!</title>
      <link>https://onceupondata.com/post/dashboards-dilemma/</link>
      <pubDate>Wed, 03 Jul 2019 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/post/dashboards-dilemma/</guid>
      <description>A couple of weeks ago I read a discussion on Twitter, initiated by a tweet from David Neuzerling, who highlighted an observation about data science teams being pushed towards reporting/dashboarding inside organizations.
Observation: any data science team will always face pressure from within an organisation to become a reporting/dashboarding team.
&amp;mdash; David Neuzerling (@mdneuzerling) June 19, 2019  I had some reflections from a previous experience and various discussions so I thought about gathering them in a blog post for future reference and further discussions.</description>
    </item>
    
    <item>
      <title>Teaching Shiny Workshop at SatRday Johannesburg</title>
      <link>https://onceupondata.com/post/satrday-johannesburg/</link>
      <pubDate>Sat, 27 Apr 2019 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/post/satrday-johannesburg/</guid>
      <description>This month, three SatRdays took place on the same day around the world. I was invited to one of them, SatRday Johannesburg, to give a training and a talk. I was excited about it because it was my first #rstats event in Africa, and I wanted to contribute to events in a region where few conferences are held relative to Europe or the US. The workshop I gave was about Building Web Applications in Shiny.</description>
    </item>
    
    <item>
      <title>Collaborative Filtering Using Embeddings</title>
      <link>https://onceupondata.com/post/nn-collaborative-filtering/</link>
      <pubDate>Sun, 10 Feb 2019 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/post/nn-collaborative-filtering/</guid>
      <description>Every day we deal with online platforms that use recommendation systems. There are different approaches to implementing such systems, and the choice depends on the product, the available data and more.
This post will mainly focus on collaborative filtering using embeddings as a way to learn about latent factors. I learned about this approach months ago from the fast.ai Practical Deep Learning for Coders lessons. I also found out that Stitch Fix wrote about the same concept in a blog post last year, Understanding Latent Style.</description>
    </item>
    
    <item>
      <title>Intro to Text Classification with Keras (Part 3 - CNN and RNN Layers)</title>
      <link>https://onceupondata.com/post/keras-text3-cnn-rnn/</link>
      <pubDate>Fri, 01 Feb 2019 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/post/keras-text3-cnn-rnn/</guid>
      <description>In part 1 and part 2 of this series of posts on Text Classification in Keras we got a step by step intro about:
 processing text in Keras. embedding vectors as a way of representing words. defining sequential models from scratch.  Since we are working with a real dataset from the Toxic Comment Classification Challenge on Kaggle, we can always see how our models would score on the leaderboard if we competed with the final submissions.</description>
    </item>
    
    <item>
      <title>Divergent Bars in ggplot2</title>
      <link>https://onceupondata.com/post/ggplot2-divergent-bars/</link>
      <pubDate>Fri, 25 Jan 2019 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/post/ggplot2-divergent-bars/</guid>
      <description>A couple of days ago I was asked by one of the participants I met at the workshop I gave in DataFest Tbilisi about a simple tutorial on plotting divergent bars in ggplot2 and other bar chart stuff. I was going to write a gist with explanation but I decided to write this post instead to share with others whenever they ask during/after a workshop or in other occasions.</description>
    </item>
    
    <item>
      <title>Intro to Text Classification with Keras (Part 2 - Multi-Label Classification)</title>
      <link>https://onceupondata.com/post/keras-text2-multilabel-classification/</link>
      <pubDate>Thu, 24 Jan 2019 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/post/keras-text2-multilabel-classification/</guid>
      <description>In the previous post, we had an overview of text pre-processing in keras. In this post, we will use a real dataset from the Toxic Comment Classification Challenge on Kaggle, which involves a multi-label classification problem.
In this competition, it was required to build a model that’s “capable of detecting different types of toxicity like threats, obscenity, insults, and identity-based hate”. The dataset includes thousands of comments from Wikipedia’s talk page edits and each comment can have more than one tag.</description>
    </item>
    
    <item>
      <title>Intro to Text Classification with Keras (Part 1)</title>
      <link>https://onceupondata.com/post/keras-text-part1/</link>
      <pubDate>Mon, 21 Jan 2019 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/post/keras-text-part1/</guid>
      <description>Keras provides a simple and flexible API to build and experiment with neural networks. I have used it in both Python and R, but I decided to write this post in R since there are fewer examples and tutorials. This series of posts will focus on text classification using keras.
The introductory post will show a minimal example to explain:
 text pre-processing in keras.
 how and why to use embeddings.</description>
    </item>
    
    <item>
      <title>R-Ladies at DataFest Tbilisi</title>
      <link>https://onceupondata.com/post/datafest-tbilisi/</link>
      <pubDate>Fri, 04 Jan 2019 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/post/datafest-tbilisi/</guid>
      <description>In November 2018, I attended DataFest Tbilisi 2018, where I was invited by R-Ladies Tbilisi to give a talk and a workshop, and to mentor participants in a Datathon. It was a great opportunity, and I would particularly highlight the second and third days, where we had an R-Ladies Room for R lovers, with a series of workshops and a Datathon organized and led by R-Ladies, who were the main representatives of the R community there.</description>
    </item>
    
    <item>
      <title>Handling R errors the rlang way</title>
      <link>https://onceupondata.com/post/handling-r-errors/</link>
      <pubDate>Fri, 28 Sep 2018 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/post/handling-r-errors/</guid>
      <description>Every day we deal with errors, warnings and messages while writing, debugging or reviewing code. The three types belong to conditions in R. You might hope to see as few of them as possible, but they are actually very helpful when they describe the problem concisely and refer to its source. So if you write functions or code for yourself or others, it is good practice to spend more time writing descriptive conditions.</description>
    </item>
    
    <item>
      <title>Tidy Eval Meets ggplot2</title>
      <link>https://onceupondata.com/post/ggplot-tidyeval/</link>
      <pubDate>Fri, 06 Jul 2018 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/post/ggplot-tidyeval/</guid>
      <description>Almost a year ago I wrote about My First Steps into The World of Tidy Eval. At the end, I tweeted asking Hadley Wickham and Lionel Henry whether ggplot2 was compatible with tidy eval. They said that it was on the todo list.
Finally, ggplot2 3.0.0 was released last week with support for tidy eval, so I thought it was time to write about it!
ggplot2 3.</description>
    </item>
    
    <item>
      <title>#runconf18: My First rOpenSci Unconf Experience</title>
      <link>https://onceupondata.com/post/ropensci-runconf18/</link>
      <pubDate>Tue, 29 May 2018 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/post/ropensci-runconf18/</guid>
      <description>Last week I had the opportunity to attend rOpenSci #runconf18. It was a remarkable event, in which ~60 diverse people gathered to work on projects related to open data, package development, data visualization, reproducibility, education and more.
But before talking about the unconf details, let me tell you my story with rOpenSci!
I don&amp;rsquo;t remember exactly the first time I heard about rOpenSci, but I think it was around two years ago.</description>
    </item>
    
    <item>
      <title>yelpr Package for Yelp Fusion API</title>
      <link>https://onceupondata.com/post/yelpr/</link>
      <pubDate>Sun, 08 Apr 2018 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/post/yelpr/</guid>
      <description>Today I pushed a preliminary version of yelpr, an R library for the Yelp Fusion API, and my first public package. I am still working on it, and I intend to share it with others to get feedback. But I also thought about writing a blog post to tell my story with package development in general, and my thought process while developing yelpr.
What was my journey with package development? When I started to learn R, at the end of 2015, I was benefiting from the great number of packages developed by “others”.</description>
    </item>
    
    <item>
      <title>Stringr Explorer</title>
      <link>https://onceupondata.com/post/stringr-explorer/</link>
      <pubDate>Sun, 31 Dec 2017 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/post/stringr-explorer/</guid>
      <description>A couple of days ago, I passed by Sarah Drasner’s Array Explorer. It was through a retweet by Emily Robinson, who proposed the idea of a similar app for working with strings in R. I thought about giving it a try, and I deployed a preliminary Shiny app, Stringr Explorer, which is still under development.
In the following sections, I will give a brief overview of the data extracted from the package documentation for use in the Stringr Explorer app, and I’ll be glad to get better suggestions and contributions.</description>
    </item>
    
    <item>
      <title>Giving My First Data Science Talk</title>
      <link>https://onceupondata.com/post/giving-my-first-conference-talk/</link>
      <pubDate>Fri, 17 Nov 2017 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/post/giving-my-first-conference-talk/</guid>
      <description>A couple of weeks ago, I gave my first talk at a data science/rstats conference: Fitting Humans Stories in List Columns: Cases from an Online Recruitment Platform. It was at EARL Boston, and it was a good experience, as I received positive feedback from the attendees. I intended to write about the story I shared in my talk, but then I recalled Emily Robinson&amp;rsquo;s post Giving Your First Data Science Talk, published last July.</description>
    </item>
    
    <item>
      <title>Adding Skimr Spark Histograms in Dataframe Columns</title>
      <link>https://onceupondata.com/post/spark-histograms/</link>
      <pubDate>Fri, 15 Sep 2017 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/post/spark-histograms/</guid>
      <description>A couple of weeks ago, I was looking for a package I had previously passed by, which prints summary statistics with inline histograms. I checked all my bookmarks and liked tweets, but I couldn’t find it! So I asked on Twitter. Fortunately, Maëlle Salmon read the tweet and guided me to skimr by ropenscilabs, who actually release many useful packages.
In this post, I will focus on spark histograms in summary statistics and beyond.</description>
    </item>
    
    <item>
      <title>My First Steps into The World of Tidy Eval</title>
      <link>https://onceupondata.com/post/my-first-steps-into-the-world-of-tidyeval/</link>
      <pubDate>Sat, 12 Aug 2017 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/post/my-first-steps-into-the-world-of-tidyeval/</guid>
      <description>A couple of months ago, Tidy eval was something that I had passed by but didn’t have time to explore. As usual, one sometimes gets busy with the daily work and puts some stuff aside to come back to. However, I like to find ways that give me a higher level of flexibility and more control. So in mid June, I had an inquiry regarding programming around dplyr. I wasn’t sure how to pass variable column names to purrr::map, so I opened an issue: purrr::map() support for SE/variable column names?</description>
    </item>
    
    <item>
      <title>Highlights from UseR! 2017</title>
      <link>https://onceupondata.com/post/user-2017/</link>
      <pubDate>Wed, 12 Jul 2017 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/post/user-2017/</guid>
      <description>In the first week of July, the 14th UseR! conference took place in Brussels as the biggest UseR! so far. For me, it was the first UseR!, and I believe it was a good opportunity to get exposed to different approaches in the data world, see different applications, learn about new packages and meet people in the R community, all in one place.
There were lots of interesting things to be highlighted. However, in this post, I will just refer to some talks about teaching R, the journey of package development and contributing to the R community.</description>
    </item>
    
    <item>
      <title>R Questions Tag Pairs on Stackoverflow</title>
      <link>https://onceupondata.com/post/2017-5-6-r_q_tags/</link>
      <pubDate>Sat, 06 May 2017 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/post/2017-5-6-r_q_tags/</guid>
      <description>Months ago, I passed by R Questions from Stack Overflow published on Kaggle. I was interested in tag pairs in particular, i.e. which tags appear together in R questions, so I worked on this simple kernel.
This week, I had some time so I thought about deploying a simple Shiny App, to give more people access to exploring the tag pairs. So here is the App, where you can see the most frequent tags that appear with a certain tag.</description>
    </item>
    
    <item>
      <title>Prophet Explore: A Simple Shiny App to Get Introduced to Prophet</title>
      <link>https://onceupondata.com/post/2017-4-17-prophet_explore/</link>
      <pubDate>Mon, 17 Apr 2017 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/post/2017-4-17-prophet_explore/</guid>
      <description>Last February, I read about the prophet package, which was released by Facebook&amp;rsquo;s Core Data Science team. I skimmed the published paper Forecasting at Scale quickly and got the main concept. I also liked how its creators, Sean Taylor and Ben Letham, were trying to empower analysts to produce high quality forecasts by offering them a flexible and configurable model that requires general understanding, but not necessarily deep knowledge of time series models.</description>
    </item>
    
    <item>
      <title>A Glimpse into The Daily Life of a Data Scientist</title>
      <link>https://onceupondata.com/post/2017-1-25-data_science_life/</link>
      <pubDate>Wed, 25 Jan 2017 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/post/2017-1-25-data_science_life/</guid>
      <description>A couple of weeks ago, I had a discussion with a co-worker regarding a project I was involved in, and I felt that there was no clear understanding of the daily challenges data scientists face. A few days later, I was at Rstudio::Conf 2017, where I met lots of data scientists from academia and industry. Later on, I described one of the conference&amp;rsquo;s positive side effects as &amp;ldquo;group therapy&amp;rdquo;, where one could see how others face the same challenges and struggle with similar issues.</description>
    </item>
    
    <item>
      <title>Yet Another Post on Logistic Regression</title>
      <link>https://onceupondata.com/post/2016-7-22-logistic_regression/</link>
      <pubDate>Fri, 22 Jul 2016 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/post/2016-7-22-logistic_regression/</guid>
      <description>Every day, statisticians, analysts and data enthusiasts perform data analysis for different purposes. But when it comes to presenting analyses to a wider audience, the good work is not the complex one with big words. It is the one that highlights interesting relations, answers business questions or predicts outcomes, and explains all that in the simplest way through data visualization or simple concepts. So if one throws numbers, model coefficients and complex graphs at the audience to impress them, it might backfire if they are not familiar with a certain concept.</description>
    </item>
    
    <item>
      <title>The Power of (purrr, tidy, broom)-Exploring Climate Change Trends</title>
      <link>https://onceupondata.com/post/2016-06-24-climate_change_analysis/</link>
      <pubDate>Fri, 24 Jun 2016 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/post/2016-06-24-climate_change_analysis/</guid>
      <description>A few days ago, I wanted to explore the Climate Change: Earth Surface Temperature Data dataset published on Kaggle and originally compiled by Berkeley Earth. The dataset is relatively large, as it contains entries from 1750-2014!
This was shortly after watching Hadley Wickham&amp;rsquo;s talk about managing many models with R. So I thought about using the power of purrr, tidy and broom to handle the climate change dataset, and I decided to focus on the change in the average temperature in the 100 pre-selected major cities in the dataset.</description>
    </item>
    
    <item>
      <title>Lessons Learnt About Data Viz - Why a Boxplot Is Sometimes The Worst Choice?</title>
      <link>https://onceupondata.com/post/2016-6-23-dataviz_boxplot_lessons_learnt/</link>
      <pubDate>Thu, 23 Jun 2016 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/post/2016-6-23-dataviz_boxplot_lessons_learnt/</guid>
      <description>Data visualization is a means of visual communication that should help people understand the significance of data easily and see interesting trends, patterns, distributions, etc. If your audience fails to grasp the message that the graph was intended to convey, they are not to be blamed. You are! Or, to be precise, your choice of the graphical representation of the data is.
I knew all that, and I used to spend time thinking about the best chart to convey a certain message or to highlight an interesting behavior.</description>
    </item>
    
    <item>
      <title>R googleVis Line Motion Charts with Modified Options</title>
      <link>https://onceupondata.com/post/2016-6-22-fish_catch_expanalysis/</link>
      <pubDate>Wed, 22 Jun 2016 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/post/2016-6-22-fish_catch_expanalysis/</guid>
      <description>Using googleVis via R provides lots of options to create nice google visualizations. I was trying to create some charts while exploring the Annual Nominal Fish Catches Data on Kaggle. I wanted to create a line motion chart and exclude the default bubble chart. So I played with the options to get the desired result. The following is a quick explanation of how to do that.
 Fish Catches Dataset The dataset provides the annual TLW (tonnes live weight) catches of fish and shellfish in the Northeast Atlantic region.</description>
    </item>
    
    <item>
      <title>Leverage and Influence in a Nutshell</title>
      <link>https://onceupondata.com/post/2016-6-16-influenceanalysis/</link>
      <pubDate>Thu, 16 Jun 2016 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/post/2016-6-16-influenceanalysis/</guid>
      <description>In regression models, we frequently face situations where we need to look at outliers and influential observations. We know that a common practice is to perform diagnostic checks to dig deeper and see how different points affect the fitted model or its coefficients.
In this post, we will focus on two concepts (leverage and influence), but we will not dig deep into the math behind them. We will try to visualize and catch the intuition behind them first.</description>
    </item>
    
    <item>
      <title>A shout Out to R bloggers</title>
      <link>https://onceupondata.com/post/2016-6-15-rbloggers/</link>
      <pubDate>Wed, 15 Jun 2016 00:00:00 +0000</pubDate>
      
      <guid>https://onceupondata.com/post/2016-6-15-rbloggers/</guid>
      <description>Since I started working with R, I have been a frequent visitor to the R-bloggers website, where I find a variety of helpful tips and tutorials. Now that I have started my own blog, it is time to give a shout-out to them!</description>
    </item>
    
  </channel>
</rss>
