Scraping the RStudio webinar list

I only just found this list of RStudio webinars. There’s loads of stuff on there, and I really need to plow through a lot of it. What I really wanted was a list of them, with links, that I could archive, edit and rearrange to show which ones I’m interested in, which I’ve already watched, and so on.

Well, if you’ve got a problem, and no-one else can help, then maybe you need… The R Team.

Oh yes, and you’ll also need SelectorGadget, which is described brilliantly in the rvest SelectorGadget vignette.

Once you’ve got all that, the code writes itself. The only wrinkle I had to iron out was that some of the link URLs were relative, not absolute, so I paste https://www.rstudio.com on the front of those ones, as you’ll see.
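Pasting the site root on the front works when every relative link starts with "/". A possibly sturdier alternative, sketched here under the assumption that you have the xml2 package installed (rvest depends on it), is `xml2::url_absolute()`, which resolves each href against the page it came from. The hrefs below are made up for illustration.

<pre class="brush: r; title: ; notranslate" title="">
library(xml2)

# Made-up hrefs: one absolute, one root-relative, one page-relative
hrefs = c("https://example.com/a",
          "/resources/webinars/foo/",
          "bar/")

# Resolve every href against the URL of the page it was scraped from;
# absolute URLs are left unchanged
abs_urls = url_absolute(hrefs, "https://www.rstudio.com/resources/webinars/")
print(abs_urls)
</pre>

The page-relative case is where the two approaches differ: `paste0("https://www.rstudio.com", "bar/")` gives `"https://www.rstudio.combar/"`, whereas `url_absolute()` resolves it under `/resources/webinars/`.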

<pre class="brush: r; title: ; notranslate" title="">

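library(rvest)

# Download and parse the webinar listing page
rstudio = read_html("https://www.rstudio.com/resources/webinars/")

# Text of every link inside elements with class "toggle-content"
# (the selector was found with SelectorGadget)
linkText = rstudio %>%
  html_nodes(".toggle-content a") %>%
  html_text()

# The matching href attributes, in the same order as linkText
linkURL = rstudio %>%
  html_nodes(".toggle-content a") %>%
  html_attr("href")

# Some hrefs are relative, so prefix the site root onto those
relative = substr(linkURL, 1, 4) != "http"
linkURL[relative] = paste0("https://www.rstudio.com", linkURL[relative])

# Write one link per line, with the href value quoted
cat(paste0('<a href="', linkURL, '">', linkText, "</a><br>"),
    file = "webinar.html", sep = "\n")
</pre>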

Done! Then all I did was open the resulting file and paste it into Evernote, which kept the links and text together, as you’d expect, and now I can cut, paste and mark up to my heart’s desire.
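If you would rather keep the list as a spreadsheet than in Evernote, here is a minimal sketch that builds a data frame with a hypothetical `status` column you can edit by hand, then writes it to CSV. So that it runs offline, it parses a small made-up HTML snippet instead of the real page. The titles, hrefs and the file name `webinars.csv` are all assumptions, and the live page’s markup may differ.

<pre class="brush: r; title: ; notranslate" title="">
library(rvest)

# Made-up stand-in for the webinar page (assumption: links sit
# inside an element with class "toggle-content")
page = read_html('
<div class="toggle-content">
  <a href="/resources/webinars/intro-to-shiny/">Introduction to Shiny</a>
  <a href="https://example.com/rmarkdown">Getting started with R Markdown</a>
</div>')

links = html_nodes(page, ".toggle-content a")

webinars = data.frame(
  title  = html_text(links, trim = TRUE),
  # Resolve relative hrefs against the listing page's URL
  url    = xml2::url_absolute(html_attr(links, "href"),
                              "https://www.rstudio.com/resources/webinars/"),
  status = "to watch",  # hypothetical column: edit to "watched", etc.
  stringsAsFactors = FALSE
)

write.csv(webinars, "webinars.csv", row.names = FALSE)
</pre>

The resulting CSV opens in any spreadsheet program, where rows can be sorted, filtered or reordered by the `status` column.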

I love it when a plan comes together.