Getting started in data science for healthcare
By chrisbeeley
I just recruited to a data science post at Nottinghamshire Healthcare NHS Trust, and several people have asked me for feedback about how to get started in data science in healthcare, and indeed how to get a job in data science in healthcare. I will also be recruiting another data scientist for our team in a few months. This post is designed to give you a feel for what we were looking for in the job that I just recruited to, as well as in the job that is coming up. The two call for slightly different skill sets, as I will discuss.
I should say at the top that I am pretty much a person on the Internet: there are people much better qualified than I am to answer this question, and there are much better jobs out there than the ones I have advertised and will advertise. My trust is a great place to work, we have a lot of freedom to innovate, and we are big believers in using and publishing open source code. I hope we’re a friendly and supportive team too. But I’m sure there are many out there that do all this and more, and better. With all that said, let’s discuss what I was looking for last time, what I’m looking for next time, and what you can do to get the skills for these or any other similar jobs.
The recent post I advertised was for a band 7 data scientist on a one-year contract, with a starting salary of £38K. It’s for a funded piece of work designing algorithms that can read patient feedback and tell you what it’s about (parking, food, communication…) and how positive or negative it is (you saved my life, this service is a disgrace, the ward was dirty, the food was cold…), and building a dashboard so people can interrogate their data. I don’t want to say too much about it here, otherwise this post will get way too long.
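To give a rough feel for the shape of the problem, here is a toy sketch in Python. It is not the project's method: it swaps the machine learning out for plain keyword matching, and the keyword lists and function names (`tag_topics`, `sentiment_score`, `summarise`) are made up for illustration. A real pipeline would learn from labelled comments rather than rely on hand-written word lists.

```python
import re
from collections import Counter

# Hypothetical keyword lists, invented for this sketch only.
TOPIC_KEYWORDS = {
    "parking": {"parking", "car", "park"},
    "food": {"food", "meal", "meals", "lunch", "dinner"},
    "communication": {"told", "explained", "listened", "informed"},
    "environment": {"ward", "dirty", "clean", "noisy"},
}
POSITIVE_WORDS = {"saved", "kind", "excellent", "friendly", "clean", "thank"}
NEGATIVE_WORDS = {"disgrace", "dirty", "cold", "rude", "noisy", "ignored"}


def tokenize(text):
    """Lower-case the comment and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tag_topics(text):
    """Return the sorted list of topics whose keywords appear in the comment."""
    tokens = set(tokenize(text))
    return sorted(
        topic for topic, keywords in TOPIC_KEYWORDS.items() if tokens & keywords
    )


def sentiment_score(text):
    """Count positive words minus negative words (a crude polarity score)."""
    tokens = tokenize(text)
    positive = sum(token in POSITIVE_WORDS for token in tokens)
    negative = sum(token in NEGATIVE_WORDS for token in tokens)
    return positive - negative


def summarise(comments):
    """Count how many comments mention each topic, e.g. to feed a dashboard."""
    counts = Counter()
    for comment in comments:
        counts.update(tag_topics(comment))
    return counts


if __name__ == "__main__":
    comments = ["you saved my life", "the ward was dirty", "the food was cold"]
    for comment in comments:
        print(comment, tag_topics(comment), sentiment_score(comment))
    print(summarise(comments))
```

Keyword matching like this falls over quickly on real comments (sarcasm, misspellings, "not dirty at all"), which is a large part of why the job called for someone who could build a proper machine learning pipeline with real data.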
The first thing to say is that the field was VERY strong. I didn’t think we could get quite as much interest from quite as many talented people as we did. We were definitely spoiled for choice, but the areas that ended up being very important were:
- Machine learning. Because it’s a one-year project with a specific goal, we were looking for someone who could just run with a machine learning project and could build a pipeline with real data.
- Team working. The ability to help more junior staff, and to work with non-technical staff, is absolutely vital in such a small team. We were looking for people skills as well as “technical” team working skills: version control, package management, documentation. Avoiding “it works on my machine”, basically.
- Communication. Some people are completely on board with data science and trust experts to just get on with it and can engage them about what they want and ask the right questions. And some people will sit with their arms folded (literally or figuratively) and wait to be convinced. And a strong organisation needs both. Cheerleaders and critics. And we were looking for someone who could navigate a meeting with a healthy dose of each, as well as someone who could engage the straight-up disengaged.
If you want to get good at this stuff, there are lots of things you can do. This book is excellent if you want to go the Python route. Kaggle is a great resource for datasets and how-to’s. Try to get some real data in your workplace and do some ML with it. Be a mentee. Be a mentor. Work with your team to get better practice around version control and code style. Sell it to your bosses. Even if they’re idiots and don’t listen, it’s all experience. Come to the interview and say that. “I identified several areas of data and analytic practice that could be improved by x, y, and z. I wrote them a five point plan explaining the training their staff would need, the likely benefits and pitfalls. They are idiots and didn’t listen.”
If you already work in the public sector I highly recommend the data science accelerator. Twelve days, one a week, with a data science mentor. I was a mentee a year and a half ago. Then I was a mentor. Doing one or both is a really good way of getting out of your silo, seeing new datasets and challenges, and learning to ask and answer the right questions. Besides my PhD it’s the best thing I ever did, career wise.
The next job is going to be a bit more general purpose. The team has quite a few pieces of smaller paid work coming in and I need somebody who can help out with those, as well as work in the Trust building on the work we’re already doing: statistics, machine learning, and publishing either as a document or a Shiny-powered dashboard. We’re looking for similar things to last time, but with less focus on machine learning: perhaps someone with some stats knowledge (regression, experimental design, measures of association, all that stuff). I feel like R would be a big advantage, but if somebody can do all that in Python then that’s good too, and ditto other things like Julia. They would probably end up learning some R just to read the stuff we’re all writing, and we’d all end up learning Python/Julia just to read their stuff. All to the good.
Again, we need someone who can work in a team (no hero coders) and communicate at all levels of the Trust and beyond. We need someone who can teach, and learn, and someone who can convince the sceptical and engage the disengaged. Something I would absolutely love to be able to recruit to is Linux server maintenance experience. I’m in charge of two Linux servers that do our work for us, one in the cloud, one behind the firewall (see this blog, passim) and I would love to be able to give the keys to somebody who knows what they’re doing. Even more, I’d love for them to do the next upgrade. Going to RStudio Connect with Ubuntu 20.04 was hideous (not the fault of RStudio or Ubuntu, obviously, purely my fault 😀) and having someone be able to worry about that would have been wonderful.
Some people have told me I’m wasting my time managing my own servers as a data scientist and I really need to get proper IT support, and maybe they’re right, but I’ve come this far, even if I am doing it wrong. I’ve been learning Linux server stuff on my own server for 7 years. If you want to learn this way without buying your own cloud server then just buy a Raspberry Pi and ssh into it from behind your firewall at home. Set up a LAMP stack, set up WordPress, set up the free versions of RStudio Server and Shiny Server, run plumber APIs, run PHP, write a Django application, whatever. You’ll get in the most hideous mess and tear your hair out for entire weekends and you’ll look back at those weekends fondly and be glad of all the learning you did that day 😀. And if you get in a complete mess just wipe the SD card and start again.
As I say, this is all caveat emptor. Your mileage may vary. But if you want my opinions then these are they.