HTTP header security tweaks
25 July 2024 | 2:44 am

A Mastodon note last night (I still can’t bring myself to say “toot”) from @teledyn@mstdn.ca pointed out the HTTP Observatory, which “enhances web security by analyzing compliance with best security practices.” You put in a URL and it gives you a score on the site’s security, with tips on how to make it better.

I put in this site and got 65/100. With a bit of work I got to 115/100! I was pretty pleased about that. Then I discovered I had broken some small things because the settings were too restrictive. I fixed that and now I get 125/100!

Here are some notes about that, in case it’s useful to me in the future or anyone else before then. Note: I am no expert, I’m not even a web developer any more. I’m just a person running a static web site built with Jekyll who hacked on some HTTP headers.

Headers

Header information is normally used by your browser to understand what it should do with the web page content that follows. It’s rarely of interest to people unless they’re doing something technical. If you run curl --head https://www.miskatonic.org you can see only the headers for this site, which look like this:

HTTP/1.1 200 OK
Date: Wed, 24 Jul 2024 21:04:27 GMT
Server: Apache
Last-Modified: Wed, 24 Jul 2024 20:18:14 GMT
ETag: "ca2e-61e03fa641ef7"
Accept-Ranges: bytes
Content-Length: 51758
X-Clacks-Overhead: GNU Terry Pratchett
Content-Security-Policy: default-src 'self'; style-src 'unsafe-inline' 'self'; font-src 'self'; media-src 'self' data: 'self'; frame-ancestors 'none'
X-Frame-Options: DENY
Referrer-Policy: no-referrer
X-Content-Type-Options: nosniff
Cross-Origin-Resource-Policy: same-origin
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
Content-Type: text/html

(No cookies are set because there are no cookies. There is also no logging. I know nothing about anyone who looks at this site.)

Everything up to Content-Length is generated by the web server, which here is Apache hosted on Pair. Everything after that is set by me. Here’s a rundown. Browse MDN’s Practical security implementation guides for more about all this.

X-Clacks-Overhead

Header set X-Clacks-Overhead "GNU Terry Pratchett"

(This is how I set the header in the .htaccess file where I can configure things. You might set it differently in your situation.)

This remembers Terry Pratchett. See X-Clacks-Overhead. This has nothing to do with security; I just think Terry Pratchett is one of the greatest writers of the last fifty years.

Content-Security-Policy

Header set Content-Security-Policy "default-src 'self'; style-src 'unsafe-inline' 'self'; font-src 'self'; media-src 'self' data: 'self'; frame-src 'self' https://www.youtube-nocookie.com/; frame-ancestors 'none'"

This was the trickiest to get working properly. See Content-Security-Policy and Content Security Policy (CSP) implementation on MDN (which I think of as the Mozilla Developer Network) and this Content Security Policy Reference for more. This is to prevent cross-site scripting attacks.

Now, I don’t have any JavaScript running on my site, so it’s not going to be a problem. Well, I don’t have any of my own JavaScript, but I do sometimes embed a YouTube video, such as in this post about Molly White’s Become a Wikipedian in 30 Minutes video. To make that work I have to have that special frame-src configuration.

As I understand it, default-src being ‘self’ means that by default, while looking at this site the browser should only load resources from this site, unless otherwise specified. Having style-src as ‘unsafe-inline’ means I can set CSS inline, specifying font-size: smaller right in a web page. It should be in a CSS file, and maybe I’ll fix that one day, but I’m not going to fuss about it right now. In media-src there’s an extra data: that says it’s all right to load media (such as images) that are specified right in the HTML with a data URL. I had to add this to do away with a warning, which confused me because I never load any media this way. Maybe I’ll figure it out later.

Using Firefox’s developer tools was a huge help in figuring all this out. I’d load the site, hit Ctrl-Shift-I to pop them up, go to the Console, and reload over and over as I tweaked settings. (Whether or not JavaScript was enabled changed things, so test with it off and on. Privacy settings and add-ons may also affect things.)
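Alongside the browser console, the same curl --head trick from earlier is a quick way to see exactly what policy the server is sending. Something like this (piping through grep, case-insensitively, to show just that one header):

$ curl --silent --head https://www.miskatonic.org | grep --ignore-case content-security-policy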

This isn’t as tightly locked down as it could be, but it’s much better than it was before.

X-Frame-Options

Header set X-Frame-Options DENY

X-Frame-Options helps prevent clickjacking, as does the frame-ancestors setting above. This header is obsolete now that frame-ancestors does the same job, but it doesn’t hurt to set both.

Referrer-Policy

Header set Referrer-Policy no-referrer

With this Referrer-Policy setting, if someone follows a link from my site to another site, no Referer header is passed along telling the destination site that they came from here. Some sites need that information, but this one doesn’t.

X-Content-Type-Options

Header set X-Content-Type-Options nosniff

X-Content-Type-Options tells the browser “not to load scripts and stylesheets unless the server indicates the correct MIME type.” It’s nosniff because it stops the browser from content sniffing.

Cross-Origin-Resource-Policy

Header set Cross-Origin-Resource-Policy same-origin

Cross-Origin-Resource-Policy “lets websites and applications opt-in to protection against vulnerabilities related to certain cross-origin requests.” Setting it to same-origin means it “limits resource access to requests coming from the same origin. This is recommended for URLs that reply with sensitive user information or private APIs.” I’m not doing anything that even needs to worry about this, so I set it to the most secure option, because it will never matter.

Strict-Transport-Security

Header add Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"

Strict-Transport-Security is an important one: it says that the site should only ever be accessed with HTTPS. The max-age setting is in seconds, and 31536000 is one year. I don’t know if it matters any more, but you can add your domain to the HSTS Preload List as well.
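(That number is just 365 days converted to seconds, which the shell can confirm:)

$ echo $((365 * 24 * 60 * 60))
31536000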

Other sites

It’s fun to check how other sites rank at the HTTP Observatory. Right now proton.me gets 75/100 (!), gmail.com gets 105/100, wordpress.com gets 25/100, cbc.ca gets 5/100, and York University Libraries, where I work, gets 0/100. And if you can run curl at a shell, try looking at the headers of some sites you visit.


Using Llama 3 for scripting help with curl and long podcast URLs
10 July 2024 | 11:11 pm

Someone, I think it was Ed Summers, mentioned a podcast I didn’t know, specifically Episode #202: Why we can’t think beyond capitalism. - Neoliberalism (Mark Fisher, Capitalist Realism). There was an RSS feed (as there is for all true podcasts), so I had a look at that. Podcast RSS feeds are a clear and easy way to see what a show is like, without a lot of cruft. I thought I’d download episode 202 and give it a listen.

This is the URL for episode 202:

https://pdst.fm/e/chrt.fm/track/47G541/pscrb.fm/rss/p/mgln.ai/e/433/claritaspod.com/measure/traffic.megaphone.fm/QCD6489207092.mp3?updated=1717391748

Wow! That looks very long and rather suspicious. The URL has a lot of domain names (such as chrt.fm) in it. What’s going on? The best tool to find out is curl. Here’s what I ran at the command line, with the output:

$ curl --head "https://pdst.fm/e/chrt.fm/track/47G541/pscrb.fm/rss/p/mgln.ai/e/433/claritaspod.com/measure/traffic.megaphone.fm/QCD6489207092.mp3?updated=1717391748"
HTTP/2 302
date: Wed, 10 Jul 2024 23:24:45 GMT
content-type: text/html; charset=utf-8
access-control-allow-origin: *
location: https://chrt.fm/track/47G541/pscrb.fm/rss/p/mgln.ai/e/433/claritaspod.com/measure/traffic.megaphone.fm/QCD6489207092.mp3?updated=1717391748
via: 1.1 google

The --head option has curl report just the HTTP headers (metadata about the response) instead of downloading anything. Notice the HTTP 302 response and the location line: that tells the client (browser or podcast program or whatever) that it should look to this other URL to get what it wants. But if we get that URL …

$ curl --head "https://chrt.fm/track/47G541/pscrb.fm/rss/p/mgln.ai/e/433/claritaspod.com/measure/traffic.megaphone.fm/QCD6489207092.mp3?updated=1717391748"
HTTP/2 302
content-length: 0
location: https://pscrb.fm/rss/p/mgln.ai/e/433/claritaspod.com/measure/traffic.megaphone.fm/QCD6489207092.mp3?updated=1717391748
date: Wed, 10 Jul 2024 23:28:24 GMT
server: uvicorn
set-cookie: _chtbl=a225a87f117649acb2171c1dd6a37ffb; Domain=.chrt.fm; Path=/; SameSite=lax
x-cache: Miss from cloudfront
via: 1.1 4c6036e1a9755ebb992fa03bf694150e.cloudfront.net (CloudFront)
x-amz-cf-pop: YUL62-P2
x-amz-cf-id: qy9MNGbTQ1PvhMtqgaNesEm86ukfgOpUfy7CJ1FmX8XKInKY9bxv3g==

Another location header, still containing a lot of things that look like domain names. It’s going to take a while to get to the bottom of this.

I wrote myself a little function (for bash, the shell I use), so I can supply a URL and get the location header back.

$ function get_location() {  curl --silent --head "$1" | grep location | sed 's/.* //'; }
$ get_location "https://pdst.fm/e/chrt.fm/track/47G541/pscrb.fm/rss/p/mgln.ai/e/433/claritaspod.com/measure/traffic.megaphone.fm/QCD6489207092.mp3?updated=1717391748"
https://chrt.fm/track/47G541/pscrb.fm/rss/p/mgln.ai/e/433/claritaspod.com/measure/traffic.megaphone.fm/QCD6489207092.mp3?updated=1717391748

Nice!

Next I wanted to write a short script that would work through all these URLs until it got to the bottom. I spent five or ten minutes on this but ran into some kind of problem with checking if a string was empty or something like that. I thought, “Wait a minute. Simon Willison would use an LLM to do it for him!” So that’s what I tried, with DuckDuckGo’s Llama 3.

I told it:

Write a bash shell function that uses curl to find if a given URL has a location header, and if so, return that location URL.

It gave me:

get_location_header() {
  local url="$1"
  local location=$(curl -s -I -o /dev/null -w "%{http_code} %{redirect_url}" "$url")
  local code=$(echo "$location" | cut -d' ' -f1)
  local redirect_url=$(echo "$location" | cut -d' ' -f2-)

  if [ "$code" -eq 301 -o "$code" -eq 302 ]; then
    echo "$redirect_url"
  else
    echo "No location header found"
  fi
}

I said:

It is possible that running that function on a URL returns a URL that itself returns a location header. Use that function to write a recursive shell script that works through multiple levels of location headers until there are no more.

It gave me:

follow_redirects() {
  local url="$1"
  local max_redirects=10  # prevent infinite loops

  while [ $max_redirects -gt 0 ]; do
    local location=$(get_location_header "$url")
    if [ "$location" != "No location header found" ]; then
      url="$location"
      max_redirects=$((max_redirects - 1))
    else
      echo "Final URL: $url"
      return
    fi
  done

  echo "Error: too many redirects"
}

I tried that and it worked, but not how I wanted. I said:

That works, but make it show each URL as it recurses.

It gave me:

follow_redirects() {
  local url="$1"
  local max_redirects=10  # prevent infinite loops

  while [ $max_redirects -gt 0 ]; do
    echo "Checking URL: $url"
    local location=$(get_location_header "$url")
    if [ "$location" != "No location header found" ]; then
      url="$location"
      max_redirects=$((max_redirects - 1))
    else
      echo "Final URL: $url"
      return
    fi
  done

  echo "Error: too many redirects"
}

I said:

That works nicely.

I'm glad to hear that! The `follow_redirects` function should be helpful for tracing the chain of redirects for a given URL. If you have any more questions or need further assistance, feel free to ask!

Here’s how to run it all in a bash shell:

$ function get_location_header() {
  local url="$1"
  local location=$(curl -s -I -o /dev/null -w "%{http_code} %{redirect_url}" "$url")
  local code=$(echo "$location" | cut -d' ' -f1)
  local redirect_url=$(echo "$location" | cut -d' ' -f2-)

  if [ "$code" -eq 301 -o "$code" -eq 302 ]; then
    echo "$redirect_url"
  else
    echo "No location header found"
  fi
}
$ follow_redirects() {
  local url="$1"
  local max_redirects=10  # prevent infinite loops

  while [ $max_redirects -gt 0 ]; do
    echo "Checking URL: $url"
    local location=$(get_location_header "$url")
    if [ "$location" != "No location header found" ]; then
      url="$location"
      max_redirects=$((max_redirects - 1))
    else
      echo "Final URL: $url"
      return
    fi
  done

  echo "Error: too many redirects"
}
$ follow_redirects "https://pdst.fm/e/chrt.fm/track/47G541/pscrb.fm/rss/p/mgln.ai/e/433/claritaspod.com/measure/traffic.megaphone.fm/QCD6489207092.mp3?updated=1717391748"
Checking URL: https://pdst.fm/e/chrt.fm/track/47G541/pscrb.fm/rss/p/mgln.ai/e/433/claritaspod.com/measure/traffic.megaphone.fm/QCD6489207092.mp3?updated=1717391748
Checking URL: https://chrt.fm/track/47G541/pscrb.fm/rss/p/mgln.ai/e/433/claritaspod.com/measure/traffic.megaphone.fm/QCD6489207092.mp3?updated=1717391748
Checking URL: https://pscrb.fm/rss/p/mgln.ai/e/433/claritaspod.com/measure/traffic.megaphone.fm/QCD6489207092.mp3?updated=1717391748
Checking URL: https://mgln.ai/e/433/claritaspod.com/measure/traffic.megaphone.fm/QCD6489207092.mp3?updated=1717391748
Checking URL: https://claritaspod.com/measure/traffic.megaphone.fm/QCD6489207092.mp3?updated=1717391748
Checking URL: https://traffic.megaphone.fm/QCD6489207092.mp3?updated=1717391748
Checking URL: https://dcs.megaphone.fm/QCD6489207092.mp3?key=8cd61d704fe268b21da14bc81f17f7ba&request_event_id=9706319f-431f-465d-a85d-d55053cbb184&timetoken=1720741181_4922C6F958D58D4C0673A8DE2476B183
Final URL: https://dcs.megaphone.fm/QCD6489207092.mp3?key=8cd61d704fe268b21da14bc81f17f7ba&request_event_id=9706319f-431f-465d-a85d-d55053cbb184&timetoken=1720741181_4922C6F958D58D4C0673A8DE2476B183

I added the function keyword when defining these at the prompt (bash doesn’t strictly need it, as the second definition shows), but I could have told Llama to give me something I could copy and paste right into the shell. Also, it’s using short options for curl (such as -I) where I prefer long ones (such as --head) in anything scripted, because it helps readability. I could have told it to change that as well; a long-option version is below.

It works very nicely, and what’s more, the scripts are good. Better than anything I could hack in a few minutes, and I’ve never used local in a shell script in my life, but it seems like a good practice.
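For the record, here’s roughly what the first helper looks like with curl’s long options swapped in (my own tidying, not what Llama gave me):

get_location_header() {
  local url="$1"
  # Same logic as before, with curl's long options for readability:
  # --silent = -s, --head = -I, --output = -o, --write-out = -w
  local response=$(curl --silent --head --output /dev/null --write-out "%{http_code} %{redirect_url}" "$url")
  local code=$(echo "$response" | cut -d' ' -f1)
  local redirect_url=$(echo "$response" | cut -d' ' -f2-)

  if [ "$code" -eq 301 ] || [ "$code" -eq 302 ]; then
    echo "$redirect_url"
  else
    echo "No location header found"
  fi
}

(As an aside, curl can chase a whole redirect chain by itself: curl --silent --head --location --output /dev/null --write-out '%{url_effective}\n' followed by the URL prints only the final destination, though without showing each hop along the way.)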

After all that, the final URL is

https://dcs.megaphone.fm/QCD6489207092.mp3?key=8cd61d704fe268b21da14bc81f17f7ba&request_event_id=9706319f-431f-465d-a85d-d55053cbb184&timetoken=1720741181_4922C6F958D58D4C0673A8DE2476B183

If you strip off the tracking, it’s

https://dcs.megaphone.fm/QCD6489207092.mp3
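(If you want to script that last step too, bash can strip everything from the first question mark on:)

$ url="https://dcs.megaphone.fm/QCD6489207092.mp3?key=8cd61d704fe268b21da14bc81f17f7ba&request_event_id=9706319f-431f-465d-a85d-d55053cbb184&timetoken=1720741181_4922C6F958D58D4C0673A8DE2476B183"
$ echo "${url%%\?*}"
https://dcs.megaphone.fm/QCD6489207092.mp3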

Now I can feed that into my audio player and finally listen to episode 202 of Philosophize This! I see it’s thirty-seven minutes long, which is shorter than it took me to write this.


Making the Firefox scrollbar bigger
10 July 2024 | 12:19 am

The scrollbar in Firefox disappears when the pointer isn’t moving. For example, the top right of this site looks like this.

Part of a web page; no scrollbar

If I touch the touchpad, it appears and looks like this. It’s very small.

Part of a web page; tiny scrollbar

Thanks to this note by @mgorny@treehouse.systems I learned how to make it better. Go to about:config (and agree to the danger), then set widget.gtk.overlay-scrollbars.enabled to false and widget.non-native-theme.scrollbar.size.override to 24 or so. The scrollbar becomes permanent and clearly visible. I like this for several reasons, a main one being that it’s always obvious how long a page is and where I am in it.

Part of a web page; nice big scrollbar

That’s far more technical than it should be to tweak this, but Firefox has bigger problems to fix (such as not getting bogged down in AI nonsense).


