Feedback: I try to answer "how to become a systems engineer"
31 May 2023 | 12:51 am

I got some anonymous feedback a while back asking if I could do an article on how to become a systems engineer. I'm not entirely sure that I can, and part of that is the ambiguity in the request. To me, a "systems engineer" is a Real Engineer with actual certification and responsibilities to generally not be a clown. That's so far from the industry I work in that it's not even funny any more.

Seriously though, if you look up "systems engineering" on Wikipedia, it talks about "how to design, integrate and manage complex systems over their life cycles". That's definitely not my personal slice of the world. I don't think I've ever taken anything through a whole "life cycle", whatever that even means for software.

In the best case scenario, I suppose some of my software has gotten to where it's "feature complete" and has nothing obviously wrong with it. Then it just sits there and runs, and runs, and runs. Then, some day, I move on to some other gig, and maybe it keeps running. I've never had something go from "run for a long time" to "be shut down" while I was still around.

This is not to say that I haven't had long-lived stuff of mine get shut down. I certainly have. It's just that it's all tended to happen long enough after I left that it wasn't me managing that part of the "life cycle", so I heard about it second- or third-hand and much much later.

If anything, some things have lived far too long. My workstation at the web hosting support gig started its life with me in 2004 as a pile of parts that had formerly been a dedicated server. It had a bunch of dumb tools that I wrote and other people found useful. It should have been used to inspire the "real" programmers at that company to code up replacements, but seemingly did not. That abomination lived until *at least* 2011, or five years after I moved on from that company. None of that stuff was intended to run long-term, but someone kept tending it for years and years. It was awful.

But, okay, let's be charitable here. Maybe the feedback isn't asking for that exact definition, but rather something more like "how to get a job sort-of like the things I've done over the years". That's the kind of thing I definitely could take a whack at answering, assuming you like caveats.

I think it goes something like this: you start from the assumption that when you see something, you wonder why it is the way it is. Then maybe you observe it and maybe do a little research to figure out how it came to be the thing you see in front of you. This could go for just about anything: a telephone, a scale, a crusty old road surface, a forgotten grove of fruit trees, you name it. By research, I mean maybe you go poking around: try to open that scale with a screwdriver, get out of the car and walk down the old road, or turn over some of the dirt in the field to see if you can find any identifying marks.

I should also point out that this goes for trying to understand how people and groups of people came to be the way they are, too, but most tend to not respond well to being opened with screwdrivers, walked on, or turned over in the dirt. (And if they do, well, don't yuck their yum.)

Anyway, if you start from this spot, then maybe you start coming up with some hypotheses for how something happened, and then sort of mentally file that away for later. Or, maybe you even write it down. Then as more data comes down the pipe over the years, you revisit those thoughts and notes and refine them. Some notions are discarded (and noted as to why), but others are reinforced and evolved.

Do this for a while, and sooner or later you might have some working models. They might not necessarily be the actual explanation for why something is the way it is, but they give you a starting point.

Then, one day, something breaks, and you end up getting involved. It might be a high-level system that's new to you, but it has some low-level stuff deep inside, and you recognize some of that. One of those low-level things had a history of doing a certain thing, and that never changed. They might've built a whole obscure system over top of it, but the fundamentals are still there, and they still break the same way. You go and look, and sure enough, some obscure thing has happened. Nobody else saw something like this before, and so when you point it out and flip it back to sanity to restore the rest of the system, they look at you like you just pulled off some deep magic.

The question is: did you, really? It's all relative. If you've been poking and prodding at things and have remembered the results of these experiments from over the years, it's not really new to you. It's just one of many events and might not be anything particularly special by itself. It just happened to be important on this occasion.

Some people will accept this explanation. Others will refuse it and will insist that you are a magician for fixing "the unfixable". A few others will know exactly what you did because they did it themselves once upon a time.

Then there are the one or two in every sufficiently large crowd who will see that you are being celebrated for knowing and utilizing some obscure factoid, and they will make it their mission to wreck your world. Basically, they have to make your random happenstance about them somehow, and so they make it about how it hurt them and how they need to get back at you. If this sounds pathological, it's because it is, and unfortunately you will encounter this at any company which doesn't have the ability to screen out the psychos.

This also goes for the web as a whole. Having something you've done be (temporarily!) elevated to a point of visibility somewhere public will just set these people off. This, too, is enabled by having forums which don't notice this and deal with their pests.

Now, for some examples of obscure knowledge that paid off, somehow.

pid = fork(); ... kill(pid, SIGKILL); ... but they didn't check for -1. "kill -9 -1" as root nukes everything on the box. This takes down the cat pictures for a couple of hours one morning because it turns out you need web servers to run a web site. Somehow, the bit in the kill(1) man page about "it indicates all processes except the kill process itself and init" stuck in my head. Also, the bit in the fork(2) man page that says "on failure, -1 is returned in the parent".
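
The whole failure comes down to an unchecked -1 reaching kill(). Here is a minimal sketch of the missing guard, written in Python rather than the original C. The helper name kill_one is made up for illustration, and it assumes a POSIX system. One difference worth knowing: Python's os.fork() raises OSError on failure instead of returning -1, so the check here sits in front of os.kill(), which hands the pid straight to kill(2).

```python
import os
import signal


def kill_one(pid: int, sig: int = signal.SIGKILL) -> bool:
    """Signal exactly one process; refuse any pid that means "more than one".

    kill(2) treats pid > 0 as a single process, 0 and values below -1 as
    process groups, and -1 as every process the caller may signal.
    """
    if pid <= 0:
        # This is where the -1 from a failed fork() would otherwise slip through.
        return False
    os.kill(pid, sig)
    return True
```

In C the equivalent is checking the return value of fork() for -1 before the pid is ever stored or used.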

malloc(1213486160) is really malloc(0x48545450) is really malloc("HTTP"). I think this came from years of digging around in hex dumps and noticing that the letters in ASCII tend to bunch together (this is entirely deliberate). Seeing four of them in a row in the same range with nothing going over 0x7f suggested SOME WORD IN ALL CAPS. It was.
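
That decoding step is easy to reproduce. This is a small sketch, assuming the value is read most-significant byte first (the same order as the hex digits); on a little-endian machine the bytes may have sat in memory the other way around. The function name ascii_word is hypothetical.

```python
def ascii_word(value: int, width: int = 4):
    """Return the text hiding in an integer, or None if it isn't printable ASCII."""
    if not 0 <= value < 256 ** width:
        return None
    raw = value.to_bytes(width, "big")
    # Printable ASCII runs from 0x20 (space) through 0x7e (~).
    if all(0x20 <= b <= 0x7E for b in raw):
        return raw.decode("ascii")
    return None


print(hex(1213486160))         # 0x48545450
print(ascii_word(1213486160))  # HTTP
```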

The fact I had seen some of this stuff before is just linked to some chance events in my life, combined with doing this kind of ridiculous work for a rather long time now. There are plenty of other times when something broke (or was generally flaky) and I had no idea what it could possibly be, and had to work up from first principles.

For someone who's just getting started, it's a given that you haven't seen many of these events yet. Don't feel too bad about it. If you keep doing it, you'll build up your own library of wacky things that could only be earned by slogging away at the job for years and years.

Also, if you think this is nuts and choose another path, I don't blame you. This *is* nuts, and it's entirely reasonable to seek something that doesn't require years of arcane experiences to somehow become effective.


Administrivia: new web hosting arrangements
27 May 2023 | 10:20 pm

Welcome to the new hosting situation. Over the past month or so, I've been working to move this web page and some of my other stuff to a new spot. As of this morning, it's done, and this is being served from the new machine. Say hello to flicker.rachelbythebay.com.

So, what happened? Well, a cute little company called SoftLayer turned into a massive monster called IBM. They still had acceptable rates and actually offered IPv6 (barely), but their corporate brain damage only got worse every passing year.

They had definite "left hand, right hand" moments, like when I went to turn up a new machine in February 2020 and they didn't offer a kickstart of RHEL 8. It's like, hello, you bought Red Hat six months before. RHEL 8 itself had been out for nearly a year at that point, and indeed, had made it to 8.1 by then. So, I had to do CentOS 8, and then they hosed us all royally that year. That's when I stuck more pins in my IBM voodoo doll and migrated to Rocky.

Then there was the day in January 2022 when I was doing some work on the machine and noticed that it needed a firmware update or something. I figured, okay, fine, I'll take the downtime and let their automatic doodad do exactly that. It's really late and nobody should care. I queued it up and powered it down (per their instructions).

I watched from the remote screen monitor as the automatic updater powered it up and got it to boot over the network into Windows (!) in order to run some nasty thing that popped up CMD windows and worse. I went off to do something else to distract myself. One hour turned into two, then into three, and support started saying "oh, it'll be a total of four hours". Great. The worst part was the complete lack of updates during this process. They just kept flailing.

I finally said "please just abort this and put my machine back up". I told them that it failed, and they should not attempt to troubleshoot their automation system on my machine. They should admit that it failed, put me back up, and leave me alone for a while until I can figure out what happens next. They finally got someone who was paying attention to do exactly this, and the machine went back up.

We scheduled it to happen the next night during another four-hour window. They started it, worked for about an hour, then called it and decided to go with a chassis swap. Yep, they pulled my drive out and jammed it into another box (and I was fine with this). Since I'm not a complete clown, it came back up by itself and figured everything out and kept going. How about that.

So, if you noticed multiple hours of the site being down on January 3rd, 4th and 5th of 2022, that's why!

What else with them? Their customer support is completely boneheaded sometimes. They had this "VPN" thing so you could tunnel into your "privatenet" which has the IPMI/remote KVM interface for your server(s). I would do that when doing a kernel upgrade in case I screwed up and needed to rescue things. I'd get that working *first* before doing the reboot just out of paranoia. I've yet to need it, but old habits die hard.

One day, it just stopped working. I filed a ticket asking them what I should be doing, since their documentation web page (and I provided the URL) said to use X, but X wasn't working. Is there a new hostname, or can you fix the thing?

They came back and said, oh, use this documentation web page.

It was the same page I had put in the request, unchanged.

Several days went by. Finally, I "thanked" them for "providing the same URL that I had provided them in the first place", and closed the ticket with a thumbs-down.

In the meantime, I had managed to find another way in by guessing how their hostname scheme worked, and got my work done and rebooted into the new kernel. They never really fixed the docs as far as I know, and they are probably still pointing people at a long-dead VPN endpoint.

But no, that wasn't it, either. The machine was physically in Texas. That particular hive of hate and villainy is talking about making ISPs restrict access to certain kinds of web pages. That's obviously about consumer-side stuff, but they could probably find ways to extend that to the *hosting* side of it, too. Also, screw them and feeding their tax base. I started looking for replacement options in other locales.

At this point, I noticed that all IBM would sell me was something that was much less box for much more money. I'm talking a slower processor, less memory, and all of that stuff, and the monthly bill would go up. Screw. That.

And then I got my final sign from the universe: they're "modernizing" and so DAL05 (my location) will be shutting down in April 2024. I didn't even notice this until I happened to be in their "portal" to do some unrelated work. Did they mail me? No. Did they call me? No. I just happened to notice it while in there one day.

Well, that's the last sign I needed, and I pulled the trigger on a colocation cabinet a few days later. That then started the whole crazy mess of getting a server, pulling together the network equipment, installing it *physically* (this was hard!), installing it *logically*, and then migrating everything.

Late Friday night into Saturday morning, I started flipping things over and kept an eye on them. A few minutes ago, I turned off the web server on the old machine. I figure if your DNS provider is crazy enough to clamp my 900 second TTL up to something over 12 hours, you deserve to talk to a brick wall of RSTs for a while.

So here we are. I now have a server I can physically lay hands on, albeit with a little driving involved. I got it used, and it's a real beast, but it does work. I'm also hearing from early testers that it's significantly faster for them. I thought it was just because I moved it about 40 milliseconds closer to me, but it just seems snappier for them, too. How about that?

I probably screwed up at least one thing with this migration like I did last time, so if you spot something amiss, please do holler. All of the URLs should still be working and all of that stuff. I already know the mtimes all reset, so a bunch of pages look new when they have the same content - that was unavoidable.

That's the story of one more bird in the flock.


Fulfilling a reader's request for my "dot files"
6 May 2023 | 7:02 pm

I got a bit of feedback the other day from Nate asking if I had dot files. I certainly do. I assume what they meant is whether I have particular customizations, and whether I would care to share them. I definitely have a bunch of particular changes, and as for sharing them, why not. It lets me get a bunch of shots in at things that have become annoying over the years, and that means it's perfect for stirring up the hornet nests with a Friday night post.

Starting on my daily driver box that runs Debian, then:

I have a .bashrc that has a bunch of dumb two- or three-letter aliases which amount to 'ssh (otherbox)'. For some reason that is lost to time, they all start with the letter m, and then the second letter sometimes reflects the name of the target system - "mm" takes me to my Mac Mini (which also runs Debian), for instance.

The stock PS1 bugged me a bit, so I mangled it down to this:

export PS1='\h:\w\$ '

... which turns into "hostname:/some/path$ ", in other words.

I think I've had a prompt like that on my personal machines basically forever - probably back to 1994 if not before. That's fine when I'm just running things as myself. If I run sudo, I get the stock setting which ends up looking like this:

root@hostname:/some/path#

... and that's fine, too. Making it look a bit different when rootly powers are in force is a good thing.

The next one is switching off another annoyance:

alias ls='/bin/ls -N'

The -N switch to ls says "print entry names without quoting" ... and it's the difference between having just the filename shown, spaces or no, and having it 'wrapped like this'. The way I see it, if you're printing quotes there, they'd better be part of the damn name. It reminds me of the time they started doing crazy UTF-8 "smart quotes" in their error messages and I didn't know it had changed. Cue me going "WTF is this gunk in this filename?" and thinking we had major corruption in the system somewhere.

I'd probably put up with the quoting if it didn't bump everything else out to the right another column. Two spaces between the time and the filename? Heresy!

The next two are filed under "everyone sucks at setting colors in Unix tools so stop adding it to everything". The first one is for sar:

export S_COLORS=never

... and the second one is something I found a little later which seems to be something that might work across multiple programs (assuming they've been patched to recognize it):

export NO_COLOR="eat flaming death you [elided]"

You can guess what the rest of it says. The actual value doesn't matter. Just having it set does the job. The value I put there is just to make me feel better every time I have to fight to get back to the perfectly working system I've had.
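
For anyone writing a tool that wants to honor this, the check is about one line. This is a sketch of the convention as I understand it, not code from any particular program: the published NO_COLOR convention says a present, non-empty value of any kind disables color, so an empty string is treated as "not set" here. The function name color_allowed is made up.

```python
import os


def color_allowed(environ=None) -> bool:
    """Return False when NO_COLOR is present with any non-empty value."""
    env = os.environ if environ is None else environ
    # The value itself is never inspected, only whether it is non-empty.
    return not env.get("NO_COLOR")


print(color_allowed({"NO_COLOR": "literally anything"}))  # False
print(color_allowed({}))                                  # True
```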

That's it for .bashrc. Next, I have .gitconfig which is mostly boring. There's a [user] section which has name= and email=, and those are set to about what you would expect.

I have pull.rebase set to true because that's always what I would use anyway when doing a pull, and it started whining at some point. So I put this in here to make it keep doing what I wanted. This is because I don't do branches and other goofiness and just want a nice simple continuous timeline for my commits.

I also have init.defaultBranch set to main because, eh, why not? I've designed enough systems based on the old broken naming schemes and don't need any more.

I have a .gdbinit. Why? Same old story: the default now sucks. It has one line:

set style enabled off

It's amazing just how awful it is when it changes colors every time it hits a ( or " or whatever. How do people deal with that stuff? So bad. It's so nasty.

Next up, .nanorc, and this one is a three-ring circus. Basically, for the longest time, I didn't need one of these. Now, I add about one line on average every three or four years because - again - things keep changing for the worse.

Here's where things are now:

syntax "all" ".*"
color yellow "^$"
unset locking
set emptyline
set breaklonglines

The first two have been with me for quite a while now, and serve to disable syntax highlighting across the board. Again, not my thing.

Line three stops it from pooping out stupid .swp lock files everywhere. Not wanted, not needed, didn't ask for it, was forced upon me, had to murder it with a setting.

Lines four and five just put back behaviors that they dumped in 4.0: the blank line right below the status bar at the top, and the wordwrap that happens when you hit a certain column. I use that all the time, like, well, *right now* writing this post. It hard-wraps at 72, because OF COURSE it does.

Next is my .Xresources which provides a way to disable some obnoxious behavior in urxvt without having to recompile it. For the longest time, I'd chop it out and drop a custom binary into my bin directory. Then I realized it could be tamed without such mangling, and here we are:

URxvt.perl-ext:
URxvt.perl-ext-common:

This has the effect of making it so a double-click highlights the whole word, and a third click highlights the whole line *even if* someone's holding a LISP convention on that particular row of the terminal.

Then I have a .xsessionrc which needs to exist because I now log in through xdm, and the window manager (fluxbox) ends up inheriting *that* environment. Yep, it doesn't get a .bashrc type thing applied to it. (Not gonna lie - this took a while to figure out. Quick, which of .bashrc, .bash_profile, .profile et al get run for any given type of login you do to a box? Text mode, X *and* ssh all matter.) Anyway, that means I have to twiddle my PATH in there, or the commands that fluxbox runs for me won't find anything in those extra directories.

That is, I like my .fluxbox/menu entries to be short and sweet, like "term". That's a small stupid script in my bin directory. If that's not in my PATH then I'd have to spell out the whole /home/blahblah thing, and that's just idiotic.

Speaking of fluxbox, that has a dot directory, and a startup script in there to set a few things up properly.

xset b off
xset r rate 250 30
xset dpms 1900 2000 2100
xscreensaver &

Line one turns off the console beep - not that my machine has a PC squeaker any more, but I think some things try to be "helpful" by sending a beep into the system audio path. That can be really obnoxious, like when I'm deliberately holding down a key for whatever reason and get to the beginning of the line.

Line two is about getting that key-repeat going at a speed I like. If I end up on a machine where that's not fast enough, it becomes obvious pretty quickly, and I have to go adjust things. Not every situation allows for things like ^W to eat a word or ^U to eat the whole line, and so holding down backspace to change the wording of something is what I want.

Likewise, if I want to put a "-------------" divider somewhere, I don't want to wait for it to get going. It looks like that means "wait 250 ms before repeating, and then repeat at 30 Hz", but I had to look it up because it's been set like that for as long as I can recall.

Or maybe I want to hold down the cursor key to scroll something, or just move somewhere else on the line. Same thing.
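
To put numbers on those two settings, here is a rough sketch of the arithmetic. It assumes a simple model of autorepeat: the first character appears at the moment the key goes down, the first repeat arrives after the delay, and every later repeat arrives one interval (1000 / rate ms) after the previous one. The function name hold_time_ms is hypothetical.

```python
def hold_time_ms(chars: int, delay_ms: float = 250, rate_hz: float = 30) -> float:
    """Milliseconds a key must be held down to emit `chars` characters."""
    if chars <= 1:
        return 0.0  # the initial key press produces the first character
    # One delay before the first repeat, then (chars - 2) further intervals.
    return delay_ms + (chars - 2) * 1000.0 / rate_hz


print(round(hold_time_ms(13)))  # 617, a 13-character divider
print(round(hold_time_ms(72)))  # 2583, a full 72-column line
```

Under that model, almost half the time spent on a short divider is just the initial delay, which is why a slow default is noticeable right away.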

Annoyingly, this seems to be set in the keyboard itself and not on anything local to the machine, so if I have to replug the keyboard for some reason, I have to run that again or it'll be stuck in stock molasses mode. This feels like a regression from the PS/2 days but I haven't bothered plugging in one of my old model Ms to verify this.

Line three just sets up the power-saver specifics on the monitor. Those don't usually matter too much since I have a hotkey that explicitly locks things and then forces it to go to sleep right away, and I push that when I'm done using this thing.

Line four, well, that's my dose of jwz, and that's what actually keeps the screen locked, as opposed to the legions of craptacular also-ran "lock" programs that always end up sucking and failing open. I can't imagine how many years in total my screens have been protected by xscreensaver in "lock" mode.

The rest of that file just starts my three Window Maker-era widgets and those aren't important or even interesting. There's a clock/calendar, the CPU load, and something to twiddle the system volume for when I have speakers or headphones connected.

That's about it. I don't use .plan or .project files any more since I haven't run fingerd for decades, and besides, my machines are all just me and nobody else, and so a local finger is also not a thing. (Oh, get your minds out of the gutter. It's the "ratting out to the cops" sense of "finger".)

Want to see the last time I used that stuff? Here's the file in my homedir archive from the last machine which had that running:

-r-------- 1 rkroll rkroll 34 Apr 28  1996 .plan

See, told you it's been decades. All I did was rip off a line that I had seen in someone else's file that was intended to sow confusion:

Segmentation fault (core dumped).

The idea is that you'd think that the far-end finger process crashed, or the far-end finger daemon, or maybe even *your local finger client*, and then you'd run around trying to figure it out. Then you'd eventually realize what was going on and shoot a nerf dart at whoever wasted your time.

Ah, the '90s.


