In the previous post, we had an overview of text pre-processing in keras. In this post we will use a real dataset from the Toxic Comment Classification Challenge on Kaggle, which poses a multi-label classification problem. The competition required building a model that’s “capable of detecting different types of toxicity like threats, obscenity, insults, and identity-based hate”. The dataset includes thousands of comments from Wikipedia’s talk page edits, and each comment can have more than one tag. [Read More]
Intro to Text Classification with Keras (Part 1)
pre-processing, embeddings and more
Keras provides a simple and flexible API to build and experiment with neural networks. I have used it in both Python and R, but I decided to write this post in R since there are fewer examples and tutorials. This series of posts will focus on text classification using keras. The introductory post will show a minimal example to explain text pre-processing in keras, and how and why to use embeddings. [Read More]
R-Ladies at DataFest Tbilisi
In November 2018, I attended DataFest Tbilisi 2018, as I was invited by R-Ladies Tbilisi to give a talk, run a workshop and mentor participants in a Datathon. It was a great opportunity, and I would particularly highlight the second and third days, where we had an R-Ladies Room for R lovers with a series of workshops and a Datathon organized and led by R-Ladies, who were the main representatives of the R community there. [Read More]
Handling R errors the rlang way
Custom conditions, subclasses and more!
Every day we deal with errors, warnings and messages while writing, debugging or reviewing code. All three are types of conditions in R. You might hope to see as few of them as possible, but they are very helpful when they describe the problem concisely and point to its source. So if you write functions or code for yourself or others, it is good practice to spend more time writing descriptive conditions. [Read More]
Tidy Eval Meets ggplot2
The Bang Bang Plots
Almost a year ago I wrote about My First Steps into The World of Tidy Eval. At the end I tweeted asking Hadley Wickham and Lionel Henry whether ggplot2 was compatible with tidy eval. They said that it was on the to-do list. Finally, ggplot2 3.0.0 was released last week with support for tidy eval, so I thought it was time to write about it! ggplot2 3. [Read More]
#runconf18: My First rOpenSci Unconf Experience
Last week I had the opportunity to attend rOpenSci #runconf18. It was a remarkable event, in which ~60 diverse people gathered to work on projects related to open data, package development, data visualization, reproducibility, education and more. But before talking about the unconf details, let me tell you my story with rOpenSci! I don’t remember exactly the first time I heard about rOpenSci, but I think it was around two years ago. [Read More]
yelpr Package for Yelp Fusion API
Or My Story with Package Development
Today I pushed a preliminary version of yelpr, an R library for the Yelp Fusion API and my first public package. I am still working on it and intended to share it with others to get feedback. But I also thought about writing a blog post to tell my story with package development in general, and my thought process while developing yelpr. What was my journey with package development? When I started to learn R, at the end of 2015, I was benefiting from the large number of packages developed by “others”. [Read More]
What do you want to do with strings?
A couple of days ago, I came across Sarah Drasner’s Array Explorer. It was through a retweet by Emily Robinson, who proposed the idea of a similar app for working with strings in R. I thought about giving it a try, and I deployed a preliminary Shiny app, Stringr Explorer, which is still under development. In the following sections, I will briefly describe the data extracted from the package documentation for use in the Stringr Explorer app, and I’ll be glad to get suggestions and contributions. [Read More]
Giving My First Data Science Talk
3 accepted abstracts after 3 rejected!
A couple of weeks ago, I gave my first talk at a data science/rstats conference: Fitting Humans Stories in List Columns: Cases from an Online Recruitment Platform. It was at EARL Boston, and it was a good experience as I received positive feedback from the attendees. I intended to write about the story I shared in my talk. But then I recalled Emily Robinson’s post Giving Your First Data Science Talk, published last July. [Read More]
Adding Skimr Spark Histograms in Dataframe Columns
A couple of weeks ago, I was looking for a package I had come across earlier that prints summary statistics with inline histograms. I checked all my bookmarks and liked tweets, but I couldn’t find it! So I asked on Twitter. Fortunately, Maëlle Salmon read the tweet and pointed me to skimr by ropenscilabs, who release many useful packages. In this post, I will focus on spark histograms in summary statistics and beyond. [Read More]