Extending the love story through design choices

You may have noticed by now that we at RStudio have been emphatic about the R & Python love story. This is driven by our efforts to unify data science teams and bridge the language divide. Our efforts are largely characterized by our development of the package {reticulate} and our professional product suite, RStudio Team.

The R & Python love story is based on the axiom that data scientists shouldn’t be forced to use a single tool. They should be able to use whichever tools a) they prefer or b) get the job done effectively. For the most part…

In my previous post I discussed how we can alter the R & Python story to be predicated on APIs as a way to bridge the language divide. The R & Python love story feels almost like unrequited love (h/t). Much of the development towards integrating the two languages has been heavily focused on the R user experience. While the developments with respect to reticulate have been enormous and cannot be overstated, it might be worthwhile exploring another way in which R & Python and, for that matter, Python & R can be utilized together.

By shifting from language based…

or how to incorporate Excel into a production API using plumber or a micro-excel-micro-service

Photo by Mika Baumeister on Unsplash

Originally published at http://josiahparry.com.

I recently had a conversation that touched on using to automate the parsing of Excel documents for administering data science assets. This brings up some very interesting points:

  1. Excel is sometimes unavoidable and we need to be okay with that.
  2. How can we incorporate Excel into production?

Note that this is no time to 💩 on Excel. It serves very real business purposes and unfortunately not everyone can learn to program 😕. Here’s a fun one for the h8rs: almost every presidential election campaign’s data program is built on the back of Google Sheets.

In this…

Origins and current perspective

Lately I have been developing a deep curiosity about the origins of the R language. I have since read more from the WayBack Machine than a Master’s student probably should. There are four documents that I believe to be extremely foundational and that most clearly outline the original philosophies underpinning both R and its predecessor S. These are Evolution of the S Language (Chambers, 1996), A Brief History of S (Becker), Stages in the Evolution of S (Chambers, 2000), and R: Past and Future History by Ross Ihaka (1998). The readings have elicited many lines of thought and potential inquiry…

and {gargle} in general

This repository contains an example of an R Markdown document that uses
googlesheets4 to read from a private Google Sheet and is deployed to
RStudio Connect.

The path of least resistance for Google auth is to sit back and respond
to some interactive prompts, but this won’t work for something that is
deployed to a headless machine. You have to do some advance planning to
provide your deployed product with a token.

The gargle vignette “Non-interactive auth” is the definitive document for how to do this. The gargle package handles auth for several packages, such as bigrquery, googledrive, gmailr, and…
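As a rough illustration of that advance planning, here is a minimal sketch of one approach: authenticating with a service account key file. This is not necessarily how the repository does it. The environment variable name GSHEETS_KEY_PATH and both helper functions are made up for this example; the only googlesheets4 calls used are gs4_auth(path = ) and, in the usage comment, read_sheet().

```r
# Resolve the key file path from a hypothetical environment variable.
# GSHEETS_KEY_PATH is an assumed name, not something gargle defines.
service_account_path <- function(var = "GSHEETS_KEY_PATH") {
  path <- Sys.getenv(var, unset = "")
  if (!nzchar(path) || !file.exists(path)) {
    stop("Set ", var, " to the path of a service account key file.",
         call. = FALSE)
  }
  path
}

# Authenticate without any interactive prompt.
# gs4_auth(path = ) accepts the path to a service account key (JSON).
authenticate_headless <- function() {
  googlesheets4::gs4_auth(path = service_account_path())
}

# In the deployed document you would then do something like:
#   authenticate_headless()
#   dat <- googlesheets4::read_sheet("<your-sheet-id>")  # placeholder ID
```

The private Sheet also has to be shared with the service account's email address, otherwise the read fails even with a valid token.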

Extracting and plotting feature importance

This post will go over extracting feature (variable) importance and creating a ggplot object for it. I will draw on the simplicity of Chris Albon’s post. For steps to do the following in Python, I recommend his post.

If you’ve ever created a decision tree, you’ve probably looked at measures of feature importance. In the above flashcard, impurity refers to how many times a feature was used and led to a misclassification. Here, we’re looking at the importance of a feature, so how much it helped in the classification or prediction of an outcome.
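To make the idea of impurity a bit more concrete, here is a small base R sketch using Gini impurity, one common definition. It is an assumption that this matches the measure in the flashcard, and the function names gini() and impurity_decrease() are made up for illustration. The more a split on a feature reduces impurity, the more that feature "helped".

```r
# Gini impurity of a vector of class labels: 1 - sum(p_k^2)
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# How much a split reduces impurity, weighting each side by its size
impurity_decrease <- function(labels, goes_left) {
  n <- length(labels)
  left <- labels[goes_left]
  right <- labels[!goes_left]
  gini(labels) -
    (length(left) / n) * gini(left) -
    (length(right) / n) * gini(right)
}

labels <- c("a", "a", "b", "b")

# A split that separates the classes perfectly
perfect <- impurity_decrease(labels, c(TRUE, TRUE, FALSE, FALSE))

# A split that leaves both sides as mixed as the parent
useless <- impurity_decrease(labels, c(TRUE, FALSE, TRUE, FALSE))
```

Here `perfect` is 0.5 (all of the parent's impurity is removed) and `useless` is 0. Tree-based importance scores typically add up decreases like these across every split that uses a given feature.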

This example will draw on the…

Making functions with methods in R.

Lately I have been doing more of my spatial analysis work in R with the help of the sf package. One shapefile I was working with had some horrendously named columns, and naturally, I tried to clean them using the clean_names() function from the janitor package. But lo, an egregious error occurred. To this end, I officially filed my complaint as an issue. The solution presented was to simply create a method for sf objects.

Yeah, methods, how tough can those be? Apparently the process isn’t at all difficult. But figuring out the process? That was difficult. This post will…
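For a feel of what "creating a method" means, here is a minimal base R sketch of S3 dispatch. Everything in it is hypothetical: tidy_names() stands in for a generic like clean_names(), and the "geo_table" class with its geom_col attribute is a toy stand-in for sf objects, not how sf or janitor actually work.

```r
# A generic: it only decides which method to call, based on class(x)
tidy_names <- function(x, ...) {
  UseMethod("tidy_names")
}

# Default method: lower-case names, collapse other characters to "_"
tidy_names.default <- function(x, ...) {
  names(x) <- gsub("[^a-z0-9]+", "_", tolower(names(x)))
  x
}

# Method for the toy "geo_table" class: clean every name except the
# special column recorded in the "geom_col" attribute
tidy_names.geo_table <- function(x, ...) {
  keep <- attr(x, "geom_col")
  cleaned <- gsub("[^a-z0-9]+", "_", tolower(names(x)))
  cleaned[names(x) == keep] <- keep
  names(x) <- cleaned
  x
}

df <- data.frame(`Pop Total` = 1:2, `GEOM.col` = c("a", "b"),
                 check.names = FALSE)
geo <- structure(df, class = c("geo_table", "data.frame"),
                 geom_col = "GEOM.col")

names(tidy_names(df))   # "pop_total" "geom_col"
names(tidy_names(geo))  # "pop_total" "GEOM.col"
```

The same call does different things depending on the class of its input, which is the whole trick: the generic stays untouched and each class gets its own function named generic.class.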

Before the United States created the Constitution, something called the Articles of Confederation defined what the US Government would look like. It was the first attempt at creating some sort of agreement between the 13 original states to form a central government. In the end, the Articles of Confederation made the new central government too weak to accomplish anything. Then, in 1787 representatives from each state met in Philadelphia to entirely scrap the Articles of Confederation in a meeting that became known as the Constitutional Convention. …

I have been living in the world of academia for nearly five years now. During this time I’ve read countless scholarly journal articles that I’ve struggled to wrap my head around. The academic language is riddled with obfuscating words like “milieux” and “nexus” which are often used to explain relatively simple concepts in a not so simple language. I’ve had to train myself to understand the academic language and translate it to regular people (layperson) speak.

The academic language is often used by the “elitist media” which has recently been blamed for creating a strong divide in American politics —…

A f’ing fun introduction to tidytext analysis with geniusR

My recent package geniusR was created with the idea of a tidytext analysis of song lyrics in mind. I now wish to introduce you to the concepts and application of tidytext analysis through the use of geniusR. If you would like an introduction to geniusR please read my Introduction to geniusR. Additionally, I recommend that you give Text Mining with R: A Tidy Approach by Julia Silge and David Robinson a read.

Initially I wanted to perform an exploratory text analysis of Kendrick Lamar’s recent album DAMN. (2017) and compare it to his older album Section.80 (2011). During my first…

Josiah Parry

Social Scientist meets Data Scientist. josiahparry.com
