Using the Dark Visitors API with Hugo to opt out of AI data harvesting
15 April 2024 | 9:58 pm

Dark Visitors recently published an API for fetching up-to-date robots.txt files. After some stumbles and repeated hair-pulling at Hugo’s lovely template language, I created a Hugo module to work with the API.

You can peruse the many curly braces within the module by viewing it on GitHub. Stay tuned though, because I’m about to step through it.

dark-visitors.html

{{- $url := "https://api.darkvisitors.com/robots-txts" -}}
{{- $api_key := getenv "HUGO_DARKVISITORS" -}}
{{- $bearer := printf "Bearer %v" $api_key -}}
{{- $agent_types := slice -}}
{{- if .Site.Params.darkVisitors -}}
	{{- range .Site.Params.darkVisitors -}}
		{{- $agent_types = $agent_types | append . -}}
	{{- end -}}
{{- else -}}
	{{- $agent_types = slice "AI Data Scraper" -}}
{{- end -}}
{{- $agent_types = $agent_types | jsonify -}}

We’re using Hugo’s getenv function (an alias for os.Getenv) to grab the API key from the HUGO_DARKVISITORS environment variable I set earlier. The rest is variable setup that works around templating limitations and keeps things readable, which is a constant battle. We check the site params for a darkVisitors list and fall back to “AI Data Scraper” if there isn’t one. Then we throw it in the JSON blender with jsonify, ready for the request data.

dark-visitors.html

{{- $opts := dict
	"method" "post"
	"headers" (dict "Authorization" (slice $bearer) "Content-Type" "application/json")
	"body" (printf `{"agent_types": %s,"disallow": "/"}` $agent_types)
-}}

The request data is its own little ball of fun because Hugo wants a map for headers and a string for body. We pull in the bearer token from the variable we set earlier and throw it in a dict map for the headers, with slices for nested arrays. The body gets wild with printf—which is Go’s fmt.Sprintf in a trenchcoat—doing some string formatting to pull in the agent types.
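To see what that printf call produces, here is a minimal Go sketch of the same two steps: JSON-encode the agent types, then splice them into the body string. The function name buildBody is made up for illustration, and it assumes Hugo’s jsonify behaves like encoding/json for a plain list of strings.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildBody is a hypothetical helper that mirrors the template's
// jsonify + printf steps: encode the agent types as a JSON array,
// then drop that array into the request body string.
func buildBody(agentTypes []string) (string, error) {
	encoded, err := json.Marshal(agentTypes)
	if err != nil {
		return "", err
	}
	// Same format string as the template; %s inserts the JSON array as-is.
	return fmt.Sprintf(`{"agent_types": %s,"disallow": "/"}`, encoded), nil
}

func main() {
	body, err := buildBody([]string{"AI Data Scraper"})
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

With the default agent type, this prints {"agent_types": ["AI Data Scraper"],"disallow": "/"}, which is the shape of body the template sends.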

dark-visitors.html

{{- with resources.GetRemote $url $opts -}}
	{{- with .Err -}}
		{{- errorf "%s" . -}}
	{{- else -}}
		{{- .Content -}}
	{{- end -}}
{{- else -}}
	{{- errorf "Unable to get remote resource %q" $url -}}
{{- end -}}

Now it’s time for the POST request. We’re using resources.GetRemote and this part is straight out of the docs with error checking.
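For anyone who thinks better outside of curly braces, the same request can be sketched in plain Go. This is only a rough equivalent of what the template asks Hugo to do, not how Hugo implements resources.GetRemote. The names fetchRobots and newFakeAPI are invented for this example, and the fake server (with its ExampleBot response and test-key token) stands in for the real API so the sketch runs offline.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// fetchRobots is a hypothetical helper: it POSTs a JSON body with a
// bearer token and returns the response text, or an error on failure.
func fetchRobots(url, apiKey, body string) (string, error) {
	req, err := http.NewRequest(http.MethodPost, url, strings.NewReader(body))
	if err != nil {
		return "", err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	text, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	return string(text), nil
}

// newFakeAPI starts a local stand-in server (an assumption for this
// sketch). It accepts only POST requests carrying "Bearer test-key".
func newFakeAPI() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost || r.Header.Get("Authorization") != "Bearer test-key" {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		fmt.Fprint(w, "User-agent: ExampleBot\nDisallow: /\n")
	}))
}

func main() {
	srv := newFakeAPI()
	defer srv.Close()

	robots, err := fetchRobots(srv.URL, "test-key", `{"agent_types": ["AI Data Scraper"],"disallow": "/"}`)
	if err != nil {
		panic(err)
	}
	fmt.Print(robots)
}
```

Like the template, the sketch treats anything other than a clean response as an error instead of writing a broken robots.txt.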

We’re not done yet! We have configuring to do. The bare minimum config tells Hugo to generate a robots.txt for you:

hugo.yaml

enableRobotsTXT: true

The not-bare-minimum config sets up the API options. Dark Visitors offers three categories of bots:

hugo.yaml

params:
  darkVisitors:
    - AI Assistant
    - AI Data Scraper
    - AI Search Crawler

Respect the API

Don’t use the module if you build your Hugo site super frequently, since every uncached build hits the API. Hugo can cache the API response when developing locally and when deploying to a server, but this requires additional configuration. I build my site on Cloudflare Pages, which does not keep Hugo’s cache around between builds, but my builds only happen once or twice a day.

What I Learned

I learned more about interacting with APIs and using environment variables. When I emailed Dark Visitors to ask why posting body:{...} as the request body wasn’t working (🤦‍♂️), they helpfully responded with copies of the log and a gentle note about my error. I found cool tools: httpie for pretty API responses in your terminal, and direnv for automatically loading environment variables from the .env file in my project directory. Some seriously great time savers here.

I also relearned the use of curly tie fighters {{-O-}} in Hugo templates to prevent whitespace issues. My robots.txt was full of weird indentation and multiple returns from the templating monster I created.
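Hugo templates are Go templates, so the whitespace behavior can be reproduced with Go’s text/template package. The sketch below renders the same loop twice, once without trim markers and once with them. The render helper, the two template strings, and the sample paths are all made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Without trim markers, the newline and tab around the actions leak
// into the output.
const loose = "{{ range . }}\n\tDisallow: {{ . }}\n{{ end }}"

// With "{{-" and "-}}", whitespace next to the dash is stripped, so
// the template can stay indented without affecting the output.
const tight = "{{- range . -}}\n\tDisallow: {{ . }}\n{{ end -}}"

// render is a hypothetical helper that parses src and executes it
// against the given list of paths.
func render(src string, paths []string) (string, error) {
	tmpl, err := template.New("robots").Parse(src)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, paths); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	paths := []string{"/a", "/b"}
	for _, src := range []string{loose, tight} {
		text, err := render(src, paths)
		if err != nil {
			panic(err)
		}
		// %q makes the stray tabs and newlines visible.
		fmt.Printf("%q\n", text)
	}
}
```

The loose version emits a blank line and a leading tab for every entry, while the tight version emits one clean Disallow line per path.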



Zed Remembers Window Position Now
15 April 2024 | 2:18 pm

What’s been keeping me from using the Zed text editor for longer than a couple of minutes is silly: every time I opened Zed, it would open full-screen on my 27" monitor. But as of Zed 0.128.3, Zed remembers window size and position on macOS.

Zed was created by the same people who created Atom and Electron back in the day. This time they used a more performant setup—Rust instead of JS—and it shows. I’m tempted to post an updated terminal latency test with Zed included. The homepage for Zed has their own latency measurements.

Will I be switching away from Sublime Text? Who knows. Zed Industries is VC-funded and I have opinions about that. They did open-source the codebase though and the editor is super pleasant to use. We’ll see how things develop.

Look at how nice the design is:

Screenshot of the Zed text editor. Tabs along the top denote files. A sidebar on the left lists files in a project folder. A bar along the bottom of the editor lists items including cursor position, current language, GitHub Copilot status, terminal panel, and multiplayer features.



Safer Table Saws
11 April 2024 | 6:29 pm

NPR:

The federal Consumer Product Safety Commission (CPSC) appears poised to mandate a SawStop-type safety brake on all new table saws sold in the United States. The move would follow years of failed efforts and false starts by the agency to impose such a standard.

Yes! This is important. All table saws need a built-in safety system requiring more effort to disable and with fewer drawbacks than removable saw guards. I have been around many people who have been injured by table saws and have come close myself on several occasions. And I’ve encountered many table saws whose terrible guards and kickback jigs are either long lost or buried in the back of a PPE cabinet. No matter how confident one gets with a table saw, things can and will get unpredictable faster than one can react.

“Small manufacturers may go out of business,” Susan Orenga, the Power Tool Institute’s executive manager, said at a public hearing on the new rule in February. Requiring the safety brake would raise the cost of table saws too much, she said. “Sales of table saws will decrease, resulting in unemployment, and the government could be creating a monopoly.”

If you can’t afford to increase your customers’ safety while they use your dangerous products, your business shouldn’t exist. It gets really difficult to buy a table saw when you’re drowning in medical debt and you can’t work because your profession requires functioning digits. And what about the impact of injuries on brand perception? “I got injured on that brand of table saw, I will never buy another from that brand.” Add a safety brake and you can say your saws are equipped with industry-standard advanced safety features that significantly reduce injury and cost to consumers. It’s hard to put a negative spin on safety.

In monopoly news:

In a surprise move at February’s CPSC hearing, TTS Tooltechnic Systems North America CEO Matt Howard announced that the company would “dedicate the 840 patent to the public” if a new safety standard were adopted. Howard says that this would free up rivals to pursue their own safety devices or simply copy SawStop’s. At the hearing, he challenged them “to get in the game.”

SawStop’s patents were my first thought while reading the article and I’m glad to see they’ve reversed their stance after years of anti-competitive litigation. I’ve used a SawStop saw and knowing my chance of injury was lower by design made it much less stressful to cut large sheets of plywood. You can disable the safety brake for wet wood, and the saw I used would reset the brake after I shut off the saw—a feature I hope makes it into the government standard. Safe by default.



