home rose
16 June 2024 | 12:00 am

he saw himself traveling the world but then realised there was nothing more beautiful than the home rose, the sunrise in the valley.

Implementing Incremental Static Regeneration in Aurora
16 June 2024 | 12:00 am

My website takes a few seconds to generate with Aurora, the static site generator I made. This is because thousands of pages need to be generated, each inheriting from multiple templates with custom logic. When I make a change, I have to wait a few seconds for the site to rebuild before I can see the change on my computer. This adds a lot of friction to making changes.

There is a solution to this friction: Incremental Static Regeneration (ISR). ISR is a build approach where only the files you have changed, and the pages that depend on them, are regenerated. ISR watches your website templates for changes, which is commonly implemented using a tool called a "watcher". When a file changes, ISR finds the pages that depend on it and regenerates them along with the changed file. Because only the changed pages and their dependents are regenerated, you can see changes substantially faster than if you rebuilt the whole site (unless you change a template that is used by every page on the site).

This weekend, I implemented ISR in Aurora. With this approach, my site takes ~4-5 seconds to build initially. After that, whenever I change a file, the new version of the page is rendered in under 200ms.

There are two stages: the initial build, then the incremental build. Here is the algorithm for the initial build:

1. All templates and data files (e.g. blog posts) are opened.
2. The dependencies of each file are calculated (e.g. the templates the file uses, and the variables and includes in the file).
3. Dependencies are sorted with a topological sort. This sort produces a list of templates ordered by when they should be generated. For example, if index.html needs a posts variable that is generated by reading through all posts, the topological sort will say that posts need to be generated before index.html.
4. Pages are generated, processing their front matter and Jinja2 templates.
5. Pages are saved to the file system.

Once the site has built initially, Aurora has a record of the whole site state.
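The topological-sort step can be sketched with Python's standard-library graphlib module (Python 3.9+). This is a minimal illustration, not Aurora's code: the file names and the graph shape (each node mapped to the set of nodes it needs first) are assumptions.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each node maps to the set of nodes it needs
# before it can be generated. The names are illustrative, not Aurora's.
dependencies: dict[str, set[str]] = {
    "_layouts/post.html": {"_layouts/default.html"},
    "_posts/hello-world.md": {"_layouts/post.html"},
    "_posts/second-post.md": {"_layouts/post.html"},
    # The "posts" variable is built by reading through every post.
    "posts": {"_posts/hello-world.md", "_posts/second-post.md"},
    "index.html": {"posts", "_layouts/default.html"},
}


def build_order(graph: dict[str, set[str]]) -> list[str]:
    """Return every node so that each one appears after everything it needs.

    static_order() raises graphlib.CycleError if the graph contains a cycle,
    e.g. two templates that include each other.
    """
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for node in build_order(dependencies):
        print(node)
```

Because posts always appears before index.html in the result, index.html can read the finished posts variable when it is rendered.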
After the initial build, all of the templates are in memory, all of the site variables are stored, and more. ISR then adds the following steps:

1. Create a watcher that scans the Aurora site folder for changes. The watcher is instructed to run a callback function. This callback is the same as the one used to build the site, with one exception: the name of the page that has been changed and saved is passed in as an argument.
2. Aurora finds all pages that depend on the file that has changed.
3. Those pages are regenerated.
4. The file that has changed is regenerated.
5. Any file that has been regenerated is saved to disk.
6. A signal is sent to the browser to indicate that it should refresh, since a new version of the page is available.

Aurora uses the python-livereload package to implement live reloading and watching. This library lets you set up a watcher on a given directory that runs a callback function when a file is changed. It starts an HTTP server through which you can view your website. It then uses WebSockets to notify any page you have open when that page has been regenerated.

Suppose I have 1,000 blog posts and an index.html file. If I only change index.html, Aurora can rebuild just index.html without rebuilding any of the other files. Rebuilding one file, including the automatic browser refresh, takes ~200ms, which feels almost instant; rebuilding multiple pages adds only marginally to this. Seeing changes this quickly substantially improves my development experience. On a site with fewer levels of template dependency than mine, generation would likely be faster.

You can read the main function that controls the Aurora logic on GitHub.
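The incremental step (finding every page affected by a change and regenerating it in a safe order) can be sketched with a reverse dependency graph and a breadth-first search. This is an illustrative sketch, not Aurora's implementation: pages_to_rebuild, on_change, and the render callable are hypothetical names, the graph uses an assumed "node to set of things it needs" shape, and the wiring to python-livereload's watcher is omitted.

```python
from collections import defaultdict, deque
from collections.abc import Callable
from graphlib import TopologicalSorter

# Hypothetical dependency graph: node -> set of nodes it needs first.
dependencies: dict[str, set[str]] = {
    "_layouts/post.html": {"_layouts/default.html"},
    "_posts/hello-world.md": {"_layouts/post.html"},
    "_posts/second-post.md": {"_layouts/post.html"},
    "posts": {"_posts/hello-world.md", "_posts/second-post.md"},
    "index.html": {"posts", "_layouts/default.html"},
}


def reverse_graph(graph: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each node to the nodes that depend on it directly."""
    dependents: dict[str, set[str]] = defaultdict(set)
    for node, needs in graph.items():
        for dep in needs:
            dependents[dep].add(node)
    return dependents


def pages_to_rebuild(changed: str, graph: dict[str, set[str]]) -> list[str]:
    """Return the changed node plus all of its direct and transitive
    dependents, ordered so each node comes after everything it needs."""
    dependents = reverse_graph(graph)
    affected = {changed}
    queue = deque([changed])
    while queue:  # breadth-first walk up the dependency chain
        node = queue.popleft()
        for dependent in dependents.get(node, set()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    # Reuse the full topological order, keeping only the affected nodes.
    order = TopologicalSorter(graph).static_order()
    return [node for node in order if node in affected]


def on_change(changed: str, graph: dict[str, set[str]],
              render: Callable[[str], None]) -> list[str]:
    """Hypothetical watcher callback: regenerate only what the change affects."""
    rebuilt = pages_to_rebuild(changed, graph)
    for node in rebuilt:
        render(node)  # e.g. re-render the template and write it to disk
    return rebuilt
```

With this graph, changing index.html rebuilds only index.html, while changing _layouts/post.html rebuilds that layout, both posts, the posts variable, and index.html, but not _layouts/default.html. In this sketch the changed file is rendered first, since its dependents may need its new output.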

TIL: Visualising memory usage in Python
16 June 2024 | 12:00 am

This weekend, I have been working on my static site generator, Aurora. Aurora stores many objects in memory to achieve high performance. For example, metadata from every post on my blog is stored in memory while Aurora generates my website. This allows any page to enumerate posts and access their attributes without reading the data from a cache on disk. I asked myself: how much memory is Aurora using?

I wanted a tool that would plot a chart showing memory use over time. With such a tool, I could see how much memory is being used, and whether there were any spikes that I should investigate further.

I found that the Python command-line tool memory-profiler met my requirements. It can be installed with:

    pip install memory-profiler

To use the tool, add mprof run to the start of the command that executes a Python script, such as:

    mprof run aurora build

Here, aurora build is a registered command. For a script, you may use:

    mprof run python3 app.py

By default, mprof run records memory use every 0.1 seconds. The results can then be plotted on a chart with the mprof plot command. Here is the chart showing memory usage from Aurora:

[Image: Aurora memory usage chart]

From this chart, I can see that memory use never exceeds 350 MB, which is manageable for the environments in which I have deployed the software. This level of use was expected, given that thousands of pages are opened, loaded into memory, then used to generate my website.

I can also see that there is room for memory improvements if I want them: memory use climbs steadily during the build, which is likely from when I open all the files and load them into a dictionary. I could use a lazy-loaded dictionary that only opens files when they are used, rather than opening them all at once. I could also have a cache that removes template contents after they are used.

mprof doesn't tell you specifically what causes an increase in memory, although it does give insight into the general trends of memory usage.
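The lazy-loaded dictionary idea could look something like the following sketch. It is hypothetical and not part of Aurora: LazyFileDict, loaded, and evict are illustrative names. Each file is read from disk only the first time its contents are requested, and evict() corresponds to the idea of a cache that drops contents after they are used.

```python
from collections.abc import Iterable, Iterator, Mapping
from pathlib import Path


class LazyFileDict(Mapping):
    """Read-only mapping of file path -> file contents.

    Files are not opened when the mapping is created; each one is read the
    first time its contents are requested and then cached.
    """

    def __init__(self, paths: Iterable[str]) -> None:
        self._paths = [str(p) for p in paths]
        self._known = set(self._paths)
        self._cache: dict[str, str] = {}

    def __getitem__(self, key: str) -> str:
        if key not in self._known:
            raise KeyError(key)
        if key not in self._cache:
            # First access: read from disk now rather than up front.
            self._cache[key] = Path(key).read_text(encoding="utf-8")
        return self._cache[key]

    def __contains__(self, key: object) -> bool:
        # Override Mapping's default, which would call __getitem__ and read the file.
        return key in self._known

    def __iter__(self) -> Iterator[str]:
        return iter(self._paths)

    def __len__(self) -> int:
        return len(self._paths)

    def loaded(self) -> int:
        """Number of files whose contents are currently held in memory."""
        return len(self._cache)

    def evict(self, key: str) -> None:
        """Drop cached contents so the memory can be reclaimed; a later
        access will read the file again."""
        self._cache.pop(key, None)
```

Iterating with .values() or .items() would still read every file, so callers get the most benefit by looking up entries by path only when they need them.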
For my use case, a preliminary investigation of memory usage, mprof was exactly what I was looking for.

