HTML for non-web developers

skills
beginner
HTML
JSON
XML
Published

March 28, 2024


Why do this?

  • HTML is mainly used to write websites
  • So why bother with HTML if you don’t build websites?

Swiss Army Knife

image: publicdomainvectors.org

Why is HTML useful?

  • Semantic markup
  • Say what an element is, rather than how it should look

…and not
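The contrast between semantic and presentational markup can be sketched with a short snippet. The heading text and sentence here are made up for illustration. The first pair of lines says what each piece of text is, while the last line only says how it should look.

```html
<!-- Semantic: each tag says what the text IS -->
<h1>Quarterly report</h1>
<p>Waiting times fell <em>sharply</em> this quarter.</p>

<!-- Presentational: the tags only say how the text should LOOK.
     (<font> is obsolete in modern HTML and is shown here only as a contrast.) -->
<font size="6"><b>Quarterly report</b></font>
```

A program reading the first version can tell that "Quarterly report" is a heading. In the second version it is just large bold text.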

Where might you use HTML?

(not surprising)

(possibly surprising)

Make some HTML

  • Create a .txt file in Notepad

  • Save it as .html

  • Add some text

  • Open it in your web browser

What does it look like?

Add some tags

<h1> Heading </h1>

  • tags = how we add information to the text
  • surrounded by <angle brackets>
  • mixture of single and paired tags
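As a rough illustration of the points above, here is how a paired tag, a single tag, and a tag with an attribute look side by side. The link address is a placeholder, not a real resource.

```html
<!-- Paired tag: opening tag, content, then a closing tag with a slash -->
<h1>Heading</h1>

<!-- Single tag: it has no content, so it needs no closing tag -->
<hr>

<!-- Attributes sit inside the opening tag and add extra information -->
<a href="https://example.com">Link text</a>
```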

Starter single tags

  • <img src="URL"> Image
  • <br> Line break
  • <hr> Horizontal rule

Starter paired tags

  • <h1>Heading</h1>
  • <a href="URL">Link</a>
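Putting the starter tags together, a complete file might look something like the sketch below. The page title, the text, the alt text, and the link address are placeholders chosen for illustration. The image address is the one used elsewhere in this session. Save it with a .html extension and open it in a browser to see the result.

```html
<!DOCTYPE html>
<html>
  <head>
    <title>My first page</title>
  </head>
  <body>
    <!-- Paired tags wrap the text they describe -->
    <h1>Heading</h1>

    <!-- <br> forces a line break inside running text -->
    Some text on the first line<br>
    and some more on the second.

    <!-- <hr> draws a horizontal rule across the page -->
    <hr>

    <a href="https://example.com">Link</a>
    <br>
    <img src="https://i.imgur.com/OpmMr44.jpg" height="300" alt="Example image">
  </body>
</html>
```

Browsers are forgiving, so a file containing only the lines inside <body> will usually display too. The surrounding structure just makes the document complete.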

Images (1)

<img src="https://i.imgur.com/OpmMr44.jpg" height="500" align="left" hspace="15" vspace="25"/>


(this goes a bit wonky in Quarto, which we use to build the training pages, so it is definitely worth trying out yourself in your own HTML file)










Images (2)

<img src="https://i.imgur.com/OpmMr44.jpg" height="300" align="right" hspace="15" vspace="55"/>







Tables (1)

Webinar                      Date
Nov 2022 (R)                 2022-11-16, 2-3pm
Dec 2022 (KIND conference)   2022-12-07, 1-5pm

Tables (2)

<table>
  <thead>
    <tr class="header">
      <th align="left">Webinar</th>
      <th align="left">Date</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td align="left">Nov 2022 (R)</td>
      <td align="left">2022-11-16, 2-3pm</td>
    </tr>
    <tr>
      <td align="left">Dec 2022 (KIND conference)</td>
      <td align="left">2022-12-07, 1-5pm</td>
    </tr>
  </tbody>
</table>

Don’t learn (many) tags

  • for web developers, yes, do learn them! (along with the CSS and JavaScript that are used to style and display the information in the HTML)
  • takeaway for everyone else - being able to pick out the structure is the useful part
  • part of a family of similar formats (XML, JSON) - all ways to store structured information and use it

Data munging

  1. (the hard bit) make/find an HTML version of your document
    1. Take a document with a lot of links in it
    2. Save the Word doc as HTML and tidy it up
  2. Open that webpage source using Notepad
  3. Paste the HTML code into Excel
  4. Delete the columns you don’t need. (to clear out blank cells, press F5, select ‘Special’, then ‘Blanks’, then press Ctrl+-)
  5. Use the Text to Columns tool to strip out the HTML bits (split on the " character)
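To see why splitting on the double-quote character works in step 5, consider a hypothetical line from the saved HTML. The address and link text are made up, and real Word output is usually messier than this.

```html
<!-- A hypothetical line from the saved document -->
<p>See <a href="https://example.com/report">the report</a> for details.</p>

<!-- Splitting that line on " gives three pieces:
       1. <p>See <a href=
       2. https://example.com/report
       3. >the report</a> for details.</p>
     so the link address ends up in a column of its own. -->
```

Because attribute values sit between double quotes, the address is always the piece between the first and second quote mark on lines like this one.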

Parsing HTML (1)

library(xml2)

# Parse an HTML string, then keep only the text and drop the tags
"<p>This is some text. This is <b>bold!</b></p>" |>
  read_html() |>
  xml_text(trim = TRUE)
[1] "This is some text. This is bold!"

Parsing HTML (2)

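# Read a saved HTML file and pull out the target of every link
here::here("skills/data/doc_links.htm") |>
  xml2::read_html() |>
  xml2::xml_find_all("//@href") |> # select every href attribute
  xml2::xml_text() |> # attribute values as a character vector
  sample(5) # just a random sample of five to avoid overload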
[1] "https://470.example.com" "https://990.example.com"
[3] "https://758.example.com" "https://264.example.com"
[5] "https://205.example.com"