Analyst, analyse thyself

I work in a small (but growing) team of data scientists, part of a larger analytic unit which focuses on innovation in analytics (hence us) and public health methods. We have a pretty broad remit to do interesting and useful stuff, and we split our time between things that we think are valuable (for example, applying data science methods to problems relating to equity of access to services) and things that people ask us to help with.

Shiny modules for beginners

I’ve seen some discussion on the Internet about whether people learning Shiny should start with modules or whether they’re an advanced topic. I’m not going to link to any examples because this post is definitely not me saying “hey, look at this rubbish over here, here’s why it’s wrong”. I just thought I would chip in with my perspective. I really love modules and they solve a lot of problems, and I think they could be a great thing for a beginner to learn.

Getting started in data science for healthcare

I just recruited to a data science post at Nottinghamshire Healthcare NHS Trust and I’ve been asked for feedback by several people about how to get started in data science in healthcare, and indeed how to get a job in data science in healthcare. I will also be recruiting another data scientist for our team in a few months. This post is designed to give you a feel for what we were looking for in the job that I just recruited to, as well as to the job that is coming up.

RStudio Connect behind the firewall

This is part II of what would otherwise have been a far-too-long post about configuring RStudio Connect. A bit of back story, particularly for those of you who might have hit this from a Google search (which does happen, JetPack tells me) and don’t know who the heck I am and what I do all day. Here’s what I said in part I: I’ve been using RStudio stuff on the server for a long time.

RStudio Connect in the cloud

I’ve been using RStudio stuff on the server for a long time. I started using Shiny community edition back in 2013 for an application that is totally open and so doesn’t need authenticating. Then two years ago I started deploying Shiny applications that people authenticated to behind our Trust firewall using Shiny Pro. I have wanted to use RStudio Connect for a long time but it was hard to get the funding together for it given how things are with austerity since the banking crisis.

Productionising R at Nottinghamshire Healthcare

I’m hopeful we’re moving into a bit of a new phase with using R in my Trust so I thought I’d outline the direction of travel, to see if it chimes with anyone else and just to keep people up to date about what we’re doing. We’ve used Shiny for some years now, maybe 7, and we have applications behind the firewall and in the cloud which are well used by the staff who need them.

app.R and global.R

I’m doing some Shiny training this year and I want to teach whatever the new thinking is, so I’ve been reading Hadley Wickham’s online book Mastering Shiny. There are a couple of things that I’ve noticed where Shiny is moving on, so if you want to keep up to date I suggest you have a look. I’m going to pick out a few here. Firstly, note that in Shiny 1.5 (which is not released at the time of writing) all code in the R/ directory will be sourced automatically.

Data science for human beings

Someone just emailed me to ask about getting into data science. They knew all the usual stuff (linear algebra, Python, and so on), so I thought I’d talk about the other side of data science. It’s all stuff I say whenever I talk about data science, but I’ve never written it down, so I thought I may as well blog it. There are three things that are probably harder to learn that will make you stand out at interview and be a better data scientist.

NHS data science and software licensing

I’m writing something about software licensing and IP in NHS data science projects at the moment. I don’t think I ever dreamed about doing this, but I’ve noticed that a lot of people working in data science and related fields are confused about some of the issues and I would like to produce a set of facts (and opinions) which are based on a thorough reading of the subject and share them with interested parties.

The Great Survey Munge

As I mentioned on Twitter the other day, I have this rather ugly spreadsheet, exported from some online survey software, that needs quite a lot of cleaning before it can be uploaded to the database. I had an old version of the cleaning script written in base R, but the survey has changed so I’ve rewritten it using the tidyverse. And this is where the tidyverse absolutely shines. Using it for this job really made me realise how much help it gives you when you’ve got a big mess and you want to rapidly turn it into a proper dataset: renaming, recoding, and generally cleaning.