High availability (HA), distributed, lightweight, static site yet with comments, with a responsive UI. These are a few characteristics of the ideal blogging platform I have always desired. So I built it. Warning: this is proof-of-concept code only.
I exported data from my old PivotX instance. I wrote server-side code
to handle blog comments and distribute them across multiple servers. I designed
a responsive UI. And after many hours of work I finally switched
blog.zorinaq.com to—wait I need a name for my project… hablog!
First, let’s talk about design & vertical space:
I took screenshots of 8 mobile sites on an Android phone with a 720×1280 display running Chrome.
The leftmost screenshot shows that content on my blog typically starts 350–400 pixels from the top of the screen, whereas on most sites it starts 600–900 pixels down, or in extreme cases they use more than the entire screen to display ads and zero content *cough*BBC*cough*. I dislike wasted vertical space, and I think my design gives readers a chance to engage with more of the post before scrolling down and before waiting for the whole page to load.
Notice also that, on a small screen, images on my blog can be as wide as the full width of the screen. Why waste margin space?
On larger screens the responsive UI transitions to 2 columns:
(Compare this to the previous design of my blog.)
The new design allows the post’s first sentence to start right at the top of the page, maximizing (again) the amount of content shown to readers while keeping a reasonable line length and without scrolling down.
Another thing. I am lucky to have relatively high-quality comments on my blog. So instead of relegating them to the bottom of the page, the 2-column layout lets me showcase them by tucking them alongside the post—like comments in a Google Doc or MS Word document. The comment submission form is also right there at the top, to entice readers to leave comments without hunting for a form at the bottom of the page.
Finally, I needed a mechanism to emphasize my own comments. So I came up with the idea of this vertical orange line between the 2 columns that swerves around my replies. Doing so groups them with the post, which is perfectly logical because they share authorship—me.
My blog requires no sign-in in order to reduce friction when submitting a comment.
Many modern web designs adopt a sans-serif font for titles and headers, and a serif one for the main body. I like that.
For titles & headers I picked Raleway. Notice its elegant “fi” ligature in “finally” in this post’s title.
For the main body I picked Noto Serif which, by the way, is the default serif font for Chrome on Android. It has great Unicode coverage so lines of text tend to keep the same line-height even when containing various Unicode characters. I was annoyed at how many other popular fonts do not provide for example a glyph for U+2126 OHM SIGN (Ω) which I use here. If my custom font did not provide this glyph, then text rendering would fall back to the browser’s default serif font, and if it has a taller line-height than my custom font, then the line containing this glyph is taller than other lines which is visually unappealing.
Low-contrast sites suck. And black text on white background hurts the eyes. So I chose black text on very light grey background (#f0f0f0).
As to the color theme, it is grey & orange. Maybe not the best? I am open to suggestions.
The visual design is the only thing visible to my readers. But what about the technical guts of the site?
Six years ago I described the architecture I wanted:
“I will soon have 2 servers colocated in 2 datacenters on 2 different continents, with blog.zorinaq.com having 2 A records for these 2 servers. Browsers try to connect to the 2nd if the 1st fails; and with DNS pinning they tend to stick with the one that works for the remaining of the browsing session. Doing it this way is a cheap way of providing HA for a website.”
Today the cost of VPSes and dedicated ARM servers is so low that I decided to run my site on 3 servers, from 3 different providers, on 3 continents. This is why blog.zorinaq.com resolves to 3 IP addresses:
- Digital Ocean in the US ($5/month VPS)
- Scaleway in Europe (3€/month dedicated ARM server)
- Vultr in Asia ($5/month VPS)
On the software side, I put all my posts in a local Mercurial repository, and use the static site generator Jekyll to generate the site locally. The pages look complete, except they are, well, static, not dynamic: they are missing the blog comments. I place this tag in each page at the location where I would like comments to be inserted: <!--hablog-insert-comments-->
Remember this tag for now. I will come back to it later.
After generating the site locally I run a bash script to rsync the files to my 3 servers, except with a twist…
The static content (image assets, home page index.html) is rsync'd to the web server’s document root:

    /foobar/html/index.html
    /foobar/html/assets/image1.png
    /foobar/html/assets/image2.png
    /foobar/html/...
However the dynamic content (post pages that will contain the reader comments, but for now only have the <!--hablog-insert-comments--> tag) is rsync'd to a different directory:

    /foobar/db/disk-vibrations-and-ssds/index.html
    /foobar/db/what-the-heck-pandora/index.html
    /foobar/db/...
Keep in mind this is all done in parallel on 3 different servers.
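As a rough sketch of what the deploy script does (the real one is bash, as described above; the hostnames and local paths below are placeholders, not my actual servers):

```python
# Hypothetical sketch of the deploy step: static content goes to the
# document root, post pages (still carrying the placeholder tag) go to
# the watched /foobar/db directory, on each of the 3 servers.
SERVERS = ["us.example.net", "eu.example.net", "asia.example.net"]

def deploy_commands(site_dir="_site"):
    """Build one pair of rsync invocations per server."""
    cmds = []
    for host in SERVERS:
        # static content -> web server document root
        cmds.append(["rsync", "-az", f"{site_dir}/static/",
                     f"{host}:/foobar/html/"])
        # dynamic content (post pages) -> watched db directory
        cmds.append(["rsync", "-az", f"{site_dir}/posts/",
                     f"{host}:/foobar/db/"])
    return cmds
```

Running the pairs concurrently (one process per server) is what makes the deploy parallel across the 3 machines.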
Now hablog (high availability blog) comes into play. It is made of two small daemons, watch-db and sync-daemon (total ~400 lines of Python code and ~50 lines of bash).
Each server runs a daemon, watch-db, that uses inotify to watch the content of /foobar/db; whenever files are rsync'd there, they are processed and copied to the web server’s document root. The processing step replaces the <!--hablog-insert-comments--> tag mentioned earlier with the actual comments.
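A minimal sketch of that processing step, with a hypothetical render_comment helper standing in for the real comment markup:

```python
# Sketch of watch-db's tag substitution (helper and markup are
# illustrative assumptions, not hablog's actual code).
TAG = "<!--hablog-insert-comments-->"

def render_comment(c):
    # Hypothetical markup; the real blog renders richer comment HTML.
    return f'<div class="comment"><b>{c["username"]}</b>: {c["comment"]}</div>'

def insert_comments(page_html, comments):
    """Replace the placeholder tag with rendered comments, producing the
    final HTML that gets copied to the document root."""
    rendered = "\n".join(render_comment(c) for c in comments)
    return page_html.replace(TAG, rendered)
```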
Where are the actual comments fetched from? When a comment is submitted to blog.zorinaq.com via a POST to /hablog, a simple FastCGI script, hablog.fcgi, handles the request, verifies the Google reCAPTCHA, and writes the comment as a JSON file (with attributes such as the username, the comment text, and a removed flag) under /foobar/db/<post-id>/. (I will explain “removed” in a moment.)
Since watch-db watches /foobar/db, it notices both new posts (rsync'd by my deploy script) and new comments (JSON files written under /foobar/db/<post-id>/...), and replaces the <!--hablog-insert-comments--> tag with the comments in order to regenerate the final HTML file.
How are the comment files synchronized between my 3 servers? This is the role of
sync-daemon: a simple cronjob which runs every few minutes on each server and
rsyncs only the comment files to/from the other 2 servers. If any 1 of
the 3 servers goes down, the remaining 2 online servers still synchronize
comment files between each other. When the offline server comes back online,
whichever one of the 2 other servers runs the cronjob first will resync
all the comments to the resuscitated server.
This is the crux of how hablog implements high availability: the 3 servers form a distributed redundant architecture and are independent from each other.
sync-daemon does not use rsync’s --delete option, and it never needs to: comment files are never modified, never deleted, only created once. As a result synchronization conflicts are impossible by design (KISS).
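A sketch of the cronjob's rsync invocations (peer hostnames are placeholders): because comment files are only ever created, a plain additive rsync in both directions suffices, and no invocation can ever destroy data:

```python
# Sketch of the sync-daemon cronjob logic on one server. The two peers
# below are hypothetical names for the other 2 servers.
PEERS = ["peer1.example.net", "peer2.example.net"]
DB = "/foobar/db/"

def sync_commands():
    """Build additive push/pull rsync invocations for each peer.
    Deliberately no --delete: files are only ever added."""
    cmds = []
    for peer in PEERS:
        cmds.append(["rsync", "-az", DB, f"{peer}:{DB}"])  # push new files
        cmds.append(["rsync", "-az", f"{peer}:{DB}", DB])  # pull new files
    return cmds
```

If a peer is down, its pair of invocations simply fails and is retried a few minutes later by the next cron run, which is exactly the resync-on-recovery behavior described above.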
Occasionally a comment does need to be deleted, such as a spam comment that circumvented reCAPTCHA. But wait, I said comment files are never modified and never deleted…
Here is another crucial design aspect of hablog. This one makes comment modification/deletion possible.
hablog names the comment files according to the convention <timestamp-since-epoch>.<comment-id>. When watch-db processes a post, it sorts all comments according to their timestamps (that is how they end up in chronological order in the final HTML). watch-db also lets a more recent comment file overwrite the JSON attributes of an older one if the more recent file has the same comment ID.
For example, suppose a comment was saved as <timestamp>.633ce99f46a21520b67a3022469241fa with removed set to 0, but a file named 1470000001.633ce99f46a21520b67a3022469241fa also exists (notice the newer timestamp) and contains removed set to 1. Then the newer file overwrites the removed JSON attribute from 0 to 1, and the code considers the comment deleted. Any of the other JSON attributes (username, comment, etc.) can be overwritten by a newer comment file. For example comment could be overwritten to edit the text of a comment.
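The newest-wins merge can be sketched like this (function name and exact attribute handling are my assumptions; filenames follow the <timestamp-since-epoch>.<comment-id> convention):

```python
import json

def merge_comments(files):
    """files maps "<timestamp>.<comment-id>" filenames to the JSON text
    they contain. Files are applied oldest-first; a newer file with the
    same comment ID overwrites individual JSON attributes of the older
    one. Comments whose "removed" attribute ends up truthy are dropped;
    the rest come back in chronological order."""
    merged = {}  # comment-id -> [first_timestamp, attributes]
    for name in sorted(files, key=lambda n: int(n.split(".", 1)[0])):
        ts, cid = name.split(".", 1)
        attrs = json.loads(files[name])
        if cid in merged:
            merged[cid][1].update(attrs)  # newer file overwrites attributes
        else:
            merged[cid] = [int(ts), attrs]
    ordered = sorted(merged.values(), key=lambda pair: pair[0])
    return [attrs for _, attrs in ordered if not attrs.get("removed")]
```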
Deterministic comment IDs
Comment IDs are generated server-side by hashing together the post ID +
the post’s last comment ID (if you inspect the HTML, this is the
seed form input value) +
the username + the content of the comment. Therefore
if a browser submits a comment to 2 or more servers (e.g. due to network glitches causing the browser to retry the POST request against 2 or more of the IP addresses of blog.zorinaq.com), the servers will each generate the same comment ID and each save the file in /foobar/db, which will not cause a data discrepancy.
At worst this results in 2 files with a possibly different <timestamp-since-epoch> in the filename but containing the same content, which is harmless (per the logic of newer JSON data overwriting older JSON data).
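A sketch of the deterministic ID generation. The post only says the post ID, the seed (the post's last comment ID), the username and the comment body are hashed together; the hash function and field separator below are my assumptions (MD5 chosen purely because the observed IDs are 32 hex digits long):

```python
import hashlib

def comment_id(post_id, seed, username, comment):
    """Derive a deterministic comment ID from the submission's inputs.
    NUL separator and MD5 are illustrative assumptions."""
    data = "\0".join([post_id, seed, username, comment]).encode("utf-8")
    return hashlib.md5(data).hexdigest()
```

Because every input is identical on every server, two servers handling the same retried POST derive the same ID, so the duplicate files collapse into a single comment.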
Remember: this is proof-of-concept code only. hablog is probably not for you.
Storing one comment per file and using rsync to synchronize comments only scales up to a point, maybe up to 100k files. My blog has only ~1,100 files as of today (80 posts + ~1,000 comments).
hablog gives me many advantages for a site with high traffic but few comments.
Lightweight, high-performance static site. All 3 servers combined can handle up to ~2500 page hits/sec of my largest text-only posts (50kB), or 350 Mbit/s of traffic according to my benchmarks. The bottleneck is not CPU or IOPS, but the network bandwidth available to my servers. This level of performance is definitely much more than I need considering that my heaviest slashdotting—when I published this—was 40 page hits/sec sustained for a few hours. At the time the PivotX instance could not keep up with the traffic because PHP handling was too CPU-intensive, so I am relieved to move to a static site that can handle 100× more page hits/sec :)
Highly available redundant architecture with no single point of failure. It would take 3 different outages at 3 hosters on 3 continents at the same time to take down the site. In fact even if the servers are available only 98% of the time—7 days of downtime per year!—hablog is expected to still provide five nines availability (as long as downtime amongst the 3 servers is random and uncorrelated): 1 - (1 - 0.98)³ = 99.9992%
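A quick check of that arithmetic:

```python
# Reproducing the availability estimate: each server is assumed up 98%
# of the time, downtime is uncorrelated, and the site is down only when
# all 3 servers are down simultaneously.
per_server_up = 0.98
site_up = 1 - (1 - per_server_up) ** 3
print(f"site availability: {site_up:.6%}")  # ~99.9992%
```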
CLI tools and revision control. My posts are text files, edited locally with my favorite editor and placed under revision control. I can run custom CLI scripts on my servers to bulk delete the occasional spam comments. I prefer doing it this way rather than using a constraining point-and-click web UI.