Worst Practice

Tips & Tricks to make Jekyll do what you need

Posted on November 18, 2022 @ 14:10

Posted under the Development Environment category

Level: Beginner

Posted with the following tags: #Jekyll, #Liquid

The simplest websites are those made of static files only. For a blog like this, that is perfect. But writing every page by hand as a static file is difficult, so you need a generator. And when you work with Jekyll, sometimes you meet Mr. Hyde as well.

Unknown author, published by the National Printing & Engraving Company, Chicago,
Public domain, via Wikimedia Commons

What is Jekyll?

Short answer: Jekyll is a static site generator.

Long answer: Jekyll is a static site generator that uses layouts and Markdown to generate a static website. Its templating engine is Liquid (originally created by Shopify). If you know Symfony’s Twig template engine, you will find Liquid very familiar, at least syntax-wise.

Why Jekyll?

When I started dealing with static site generators, I didn’t know much about them. I asked my friends what they recommended, but almost everybody named a different tool as the best, including the “Write your own in PHP, dude!” option.

So I tried some of them, and I chose Jekyll because its template syntax is similar to Twig’s, which I know well.

Tops and Flops

The bright side of the story is that Liquid is easy to learn, easy to understand and easy to use, as long as you stay on the path the documentation shows you.

The dark side of the story begins as soon as you want something different, or a bit more.

Liquid has three main components:

  • objects
  • tags
  • filters

Liquid tags are not the same as tags on Twitter. They are more like statements, functions or procedures. For example, {% if ... %}...{% endif %} is a control flow tag.

Just to confuse you, Jekyll has Twitter-like tags for posts too. And categories. Great.

So, from these three components you can create many beautiful things, but sometimes it’s very painful to customize them.

Challenges

During the development of this blog I faced some issues, and I had to be very creative to solve them. The problem is that I can’t really remember the order in which the issues came up, so I can’t tell exactly which steps led me to the current setup. But I’ll try to recall some details.

Slug vs Label

I had a private, more personal blog that I wrote in Hungarian. In Hungarian we have special letters beyond the default Latin-1 character set (like ő and ű). Very soon I figured out that Jekyll was written by English-speaking programmers. Because, in my decade-long experience, many English-speaking programmers simply don’t give a damn about the rest of the world, which would like to speak, read and write in languages other than English. Sorry guys, that’s the truth.

And in Jekyll, whatever you give as, for example, a post’s category name will be used in the URL too. If it contains special characters, you are screwed. For example, the category 'csőlátás' will be transformed into /cslts. Not good. And even in English you can shoot yourself in the foot when you need a short slug for a longer label:

category: 'Development environment'

Here the slug will be either /development/environment/... or /development%20environment/..., although a simple /devenv/... would be enough for the URL. The same goes for the tags.
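To see why an English-only slug generator mangles Hungarian words, here is a small Python sketch comparing two hypothetical slugifiers (neither is Jekyll’s actual code, and both function names are made up): one that simply drops everything outside a-z, 0-9 and the hyphen, which reproduces the /cslts result above, and one that first decomposes accented letters and throws away the accent marks.

```python
import re
import unicodedata


def ascii_strip_slug(text: str) -> str:
    """Hypothetical naive slugifier: lowercase, spaces to hyphens, drop every other non-ASCII char."""
    slug = text.lower().replace(" ", "-")
    return re.sub(r"[^a-z0-9-]", "", slug)


def transliterating_slug(text: str) -> str:
    """Hypothetical alternative: split accented letters (ő -> o + mark) and drop the marks first."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return ascii_strip_slug(without_marks)


print(ascii_strip_slug("csőlátás"))                 # cslts
print(transliterating_slug("csőlátás"))             # csolatas
print(ascii_strip_slug("Development environment"))  # development-environment
```

Even the transliterated version is only an approximation of the word, and it still can’t turn a long label into a short slug like devenv.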

Of course, there are I18n plugins for Jekyll, but the ones I tried didn’t work very well.

What could I do? I tricked the system with the system’s own tools:

  • Every custom variable you define in a post’s front matter becomes available on the post object (as page.<name> inside the post itself, or post.<name> when looping through site.posts).
  • You can create new variables by capturing printed output with the capture tag.
  • You can create arrays by splitting up strings. Liquid even has some array-related filters.
  • You can iterate through these arrays.

So, if you consistently add the same built-in and custom variables to every post and keep them in sync, you can build two arrays, and the item index will be the connection between them. Example:

---
category: 'devenv'
categoryLabel: 'Development Environment'
---

Page content

Then you can use the following:

<a href="{{ page.category }}">{{ page.categoryLabel }}</a>

Okay, you could say this is easy: we just go through the site.posts array and print these values. But how do you do it when you have more than one category or tag? How do you pair them?

---
category: 'devenv'
categoryLabel: 'Development Environment'
tags:   [docker, wsl2, powerline-shell, phpstorm, windows]
tagLabels: ['Docker', 'WSL2', 'Powerline Shell', 'PHPStorm', 'Windows']
---

Page content
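Within a single post, the pairing itself is just an index-by-index zip of the two front matter lists. The Python sketch below models the front matter above as a plain dict (an assumption for illustration; in Liquid the same values come from page.tags and page.tagLabels), and the hypothetical pair_tags helper also checks the one thing the trick depends on: both lists having the same length.

```python
# Front matter of the example post above, modelled as a plain Python dict.
post = {
    "category": "devenv",
    "categoryLabel": "Development Environment",
    "tags": ["docker", "wsl2", "powerline-shell", "phpstorm", "windows"],
    "tagLabels": ["Docker", "WSL2", "Powerline Shell", "PHPStorm", "Windows"],
}


def pair_tags(post: dict) -> list:
    """Pair each tag slug with the label at the same index."""
    slugs, labels = post["tags"], post["tagLabels"]
    if len(slugs) != len(labels):
        raise ValueError(f"{len(slugs)} tag slugs but {len(labels)} tag labels")
    return list(zip(slugs, labels))


for slug, label in pair_tags(post):
    print(f'<a href="/tags/{slug}/">{label}</a>')
```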

Of course, you can do it, but you have to write the damn iteration everywhere you want to use them. Wouldn’t it be easier to pre-collect all the categories and tags with their labels, and just use them?

Variables

I introduced a new include file called variables.html:

{%- include variables.html -%}

Categories

Inside that file I put all my dirty tricks: collect all the posts’ categories and labels, concatenate them into one string, and then split them back into arrays:

{%- capture categorySlugs -%}
    {% for post in site.posts %}{{ post.category | strip }}{% unless forloop.last %},{% endunless %}{% endfor %}
{%- endcapture -%}
{%- assign categorySlugs = categorySlugs | split: ',' | uniq -%}

{%- capture categoryLabels -%}
    {% for post in site.posts %}{{ post.categoryLabel | strip }}{% unless forloop.last %},{% endunless %}{% endfor %}
{%- endcapture -%}
{%- assign categoryLabels = categoryLabels | split: ',' | uniq -%}
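What the capture, split and uniq combination does can be modelled in a few lines of Python. In this sketch the posts are made up and the collect function is hypothetical: it joins the values with commas, splits them back, and de-duplicates while keeping the first occurrence of each item in its original position, as Liquid’s uniq does. Because the two arrays are de-duplicated separately, their indexes only stay aligned if every slug always comes with the same label.

```python
def collect(posts, key):
    """Mimic: capture "a,b,c" | split: ',' | uniq  (first occurrences kept, in order)."""
    joined = ",".join(post[key].strip() for post in posts)
    return list(dict.fromkeys(joined.split(",")))


# Hypothetical posts, newest first like site.posts.
posts = [
    {"category": "devenv", "categoryLabel": "Development Environment"},
    {"category": "js", "categoryLabel": "JavaScript"},
    {"category": "devenv", "categoryLabel": "Development Environment"},
]

category_slugs = collect(posts, "category")
category_labels = collect(posts, "categoryLabel")
print(category_slugs)   # ['devenv', 'js']
print(category_labels)  # ['Development Environment', 'JavaScript']
```

Note that a label containing a comma would be split in two and break the alignment, so commas have to stay out of the labels.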

Tags

We do the same for the tags:

{%- capture tagSlugs -%}
    {% for post in site.posts %}{{ post.tags | join: ',' }}{% unless forloop.last %},{% endunless %}{% endfor %}
{%- endcapture -%}
{%- assign tagSlugs = tagSlugs | split: ',' | uniq -%}

{%- capture tagLabels -%}
    {% for post in site.posts %}{{ post.tagLabels | join: ',' }}{% unless forloop.last %},{% endunless %}{% endfor %}
{%- endcapture -%}
{%- assign tagLabels = tagLabels | split: ',' | uniq -%}

Rules to keep:

  • Never sort these arrays, otherwise the slugs and labels will be mixed up.
  • Always make sure that one slug doesn’t have multiple labels, and vice versa.

Usage

As I wrote before, to connect the slugs with the labels, the lists must be synchronized. Then we can use the loop index to get the right label for each slug:

<ul>
{%- for tagSlug in tagSlugs -%}
  <li><a href="/tags/{{ tagSlug }}/">{{ tagLabels[forloop.index0] }}</a></li>
{%- endfor -%}
</ul>

To get the current index of the loop, we can ask the forloop object. With forloop.index the counter starts from 1; with forloop.index0 it starts from 0. Since we use the index to look up an item in another array, and arrays are zero-based, we need forloop.index0.
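Here is the same index-based lookup modelled in Python, plus what happens when the second rule above is broken. The data and the helper names are hypothetical: in the broken case one post labels the js slug as JS and another as JavaScript, so the de-duplicated label list gets one item longer than the slug list, and every label after that point is shifted.

```python
def dedupe(items):
    """Order-preserving de-duplication, like Liquid's uniq filter."""
    return list(dict.fromkeys(items))


def tag_links(tag_slugs, tag_labels):
    # enumerate() counts from 0, just like forloop.index0
    return [(slug, tag_labels[i]) for i, slug in enumerate(tag_slugs)]


# Consistent data: every slug always has the same label.
good_slugs = dedupe(["docker", "js", "docker", "react"])
good_labels = dedupe(["Docker", "JavaScript", "Docker", "React"])
print(tag_links(good_slugs, good_labels))
# [('docker', 'Docker'), ('js', 'JavaScript'), ('react', 'React')]

# Broken data: post 1 says js/JS, post 2 says js/JavaScript, post 3 says react/React.
bad_slugs = dedupe(["js", "js", "react"])
bad_labels = dedupe(["JS", "JavaScript", "React"])
print(tag_links(bad_slugs, bad_labels))
# [('js', 'JS'), ('react', 'JavaScript')]  <- react got the wrong label
```

The sizes of the two broken lists also differ (2 versus 3), which is exactly what the build-stop check later in this post looks for.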

Dates

For the archive we have to deal with dates, and here we have the same problem: we want simple dates in the URL, but a more talkative version for the labels.

First we need all the posts’ dates in the right order. Luckily, site.posts is already sorted by date (newest first).

{%- capture sortedDates -%}
    {% for post in site.posts %}{{ post.date }}{% unless forloop.last %},{% endunless %}{% endfor %}
{%- endcapture -%}
{%- assign sortedDates = sortedDates | split: ',' | uniq -%}

For the archive I wanted to list posts on a monthly basis, so the URL slug should be in YYYY-MM format:

{%- capture dateSlugs -%}
    {% for date in sortedDates %}{{ date | date: '%Y-%m' | strip }}{% unless forloop.last %},{% endunless %}{% endfor %}
{%- endcapture -%}
{%- assign dateSlugs = dateSlugs | split: ',' | uniq -%}
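A Python sketch of the same step with made-up post dates: format every date as YYYY-MM, then de-duplicate while keeping the order. Python’s strftime uses the same %Y and %m directives as the Liquid date filter for this case, and because the input is newest first, the monthly slugs come out newest first too.

```python
from datetime import datetime

# Hypothetical post dates, newest first (the order site.posts uses).
post_dates = [
    datetime(2022, 11, 18, 14, 10),
    datetime(2022, 11, 2, 9, 0),
    datetime(2022, 10, 21, 18, 30),
]

# date: '%Y-%m' followed by uniq: one slug per month, still newest first.
date_slugs = list(dict.fromkeys(d.strftime("%Y-%m") for d in post_dates))
print(date_slugs)  # ['2022-11', '2022-10']
```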

For the labels we want to print the name of the month and then the year. I could use the date: '%B' filter to print the name of the month, but again it’s English only (well, good enough for this blog), and the I18n plugin didn’t work for me. So I decided to add my own translations.

In Jekyll, we can use additional YAML data files placed in the _data folder; their content is available through site.data. So I created two files:

  • _data/en.yml
  • _data/hu.yml

In these files we can create sections, subsections, values, value collections etc. For example, _data/hu.yml looks like this:

months:
  - .
  - január
  - február
  - március
  - április
  - május
  - június
  - július
  - augusztus
  - szeptember
  - október
  - november
  - december

You may notice that I list the month names in Hungarian, but the first item is nonsense. The reason: the date formatter gives us the month number in the range 1 to 12, but this list is loaded as an array, and arrays start at index zero. The dummy first item shifts január to index 1.

Now that we have the translations, we can refer to them with {{ site.data.hu.months[5] }} (május), or, if we defined a lang variable in the config, with {{ site.data[site.lang].months[5] }}. After this, creating the labels is a simple task:

{%- capture dateLabels -%}
    {% for date in sortedDates %}{% assign m = date | date: '%-m' | minus: 0 %}{{ site.data[site.lang].months[m] }}, {{ date | date: '%Y' }}{% unless forloop.last %};{% endunless %}{% endfor %}
{%- endcapture -%}
{%- assign dateLabels = dateLabels | split: ';' | uniq -%}
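The label building, sketched in Python with the month list copied from _data/hu.yml and hypothetical post dates (month_label is a made-up helper). It also shows why the slug and label lists stay aligned: every YYYY-MM slug produces exactly one label, so de-duplicating both keeps them in step.

```python
from datetime import datetime

# Index 0 is the dummy entry, so month number 1 maps to január (as in _data/hu.yml).
MONTHS_HU = [".", "január", "február", "március", "április", "május", "június",
             "július", "augusztus", "szeptember", "október", "november", "december"]

# Hypothetical post dates, newest first.
post_dates = [datetime(2022, 11, 18), datetime(2022, 11, 2), datetime(2022, 5, 7)]


def month_label(d: datetime) -> str:
    # d.month is 1..12, so it can index the list directly thanks to the placeholder
    return f"{MONTHS_HU[d.month]}, {d.year}"


date_slugs = list(dict.fromkeys(d.strftime("%Y-%m") for d in post_dates))
date_labels = list(dict.fromkeys(month_label(d) for d in post_dates))
print(list(zip(date_slugs, date_labels)))
# [('2022-11', 'november, 2022'), ('2022-05', 'május, 2022')]
```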

List the top 3 most used tags and display their usage count

Another interesting solution was born here. To find out what site.tags contains, we call for the help of the debug filter:

{{ site.tags | debug }}

… which will print something like:

{"docker"=>[#], "js"=>[#, #, #, #], "clean-code"=>[#, #, #], "react"=>[#, #, #], "webpack"=>[#, #, #], "jekyll"=>[#]}

So each key in this object is a tag slug, and each value is an array of the posts that use that tag (the # items above). What matters for us is that it’s countable: the more a tag is used, the larger its array is. From this information we need to build a sorted list.

How do we sort? Convert the count into a fixed-length string, prepend it to the slug, and sort the result as text:

{%- capture counts_with_tags_string -%}
    {%- for tag in site.tags -%}{{ tag[1] | size | prepend:"000000" | slice:-6,6 }}:{{ tag[0] }}{% unless forloop.last %},{% endunless %}{%- endfor -%}
{%- endcapture -%}

Let’s go one-by-one.

  • In {{ tag[1] | size | prepend:"000000" | slice:-6,6 }}, the tag[1] | size part gets the size of the tag’s post array.
  • Prepending a bunch of zeros to the number converts it to a string. So if we used a tag, for example, 123 times, it becomes 000000123.
  • Every counter must have exactly the same length to be sortable as text, so we keep only the last 6 characters with slice:-6,6. 000000123 becomes 000123.
  • Then we print a colon (:).
  • Then we print the tag slug (tag[0]).
  • And finally, unless it’s the last item in the iteration, we print a comma (,) as well.

Be careful: the capture tag captures whitespace as well, so always double-check the result.
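Why the zero padding matters: text sorting compares character by character, so without padding a count of 12 sorts before a count of 3. This Python sketch uses made-up counts, and the hypothetical liquid_pad helper mirrors the prepend and slice filters. The 6-digit width assumes no tag is used a million times or more.

```python
def liquid_pad(n: int) -> str:
    # Python equivalent of: n | prepend: "000000" | slice: -6, 6
    return ("000000" + str(n))[-6:]


counts = {"docker": 1, "js": 12, "react": 3}

unpadded = sorted(f"{n}:{tag}" for tag, n in counts.items())
padded = sorted(f"{liquid_pad(n)}:{tag}" for tag, n in counts.items())
print(unpadded)  # ['12:js', '1:docker', '3:react']  (text order, not numeric)
print(padded)    # ['000001:docker', '000003:react', '000012:js']
```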

From the example above, with this capture we get the following string:

000001:docker,000004:js,000003:clean-code,000003:react,000003:webpack,000001:jekyll

To convert it back into an array with the highest count first, we split the string by commas, then sort and reverse the resulting list:

{%- assign counts_with_tags = counts_with_tags_string | split:"," | sort | reverse -%}
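The whole ranking pipeline, replayed in Python on the counts from the debug output above. It mirrors split, sort and reverse literally, which reveals a side effect worth knowing: among tags with equal counts, the reverse step leaves them in reverse alphabetical order, so clean-code drops out of the top 3 here.

```python
# Post counts per tag, taken from the debug output above.
tag_counts = {"docker": 1, "js": 4, "clean-code": 3, "react": 3, "webpack": 3, "jekyll": 1}

counts_with_tags_string = ",".join(
    ("000000" + str(n))[-6:] + ":" + tag for tag, n in tag_counts.items()
)
# split:"," | sort | reverse
ranked = list(reversed(sorted(counts_with_tags_string.split(","))))
top3 = [entry.split(":")[-1] for entry in ranked[:3]]
print(ranked[0])  # 000004:js
print(top3)       # ['js', 'webpack', 'react']
```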

Then we can match this list against our tagSlugs and tagLabels lists to print the top 3 most used tags:

<ul>
{%- for count_with_tag in counts_with_tags limit:3 -%}
    {%- assign tag = count_with_tag | split:":" | last | slugify -%}
    {%- assign tagLabel = tag -%}
    {%- assign count = count_with_tag | split:":" | first | plus: 0 -%}
    {%- for tagSlug in tagSlugs -%}
        {%- if tag == tagSlug -%}
            {%- assign tagLabel = tagLabels[forloop.index0] -%}
            {%- break -%}
        {%- endif -%}
    {%- endfor -%}
    <li><a href="/tags/{{ tag }}/">{{ tagLabel | strip }} <sup>{{ count }}</sup></a></li>
{%- endfor -%}
</ul>    

What is going on here:

  • We go through this list and take only the first 3 items (limit:3).
  • We get the tag’s slug by splitting the current element at the colon (:), taking the last part (| last), and running it through slugify.
  • As a fallback, we assign the tag slug to tagLabel as well, so the link still has text if no label is found.
  • We get the tag’s count by splitting the current element at the colon (:) and taking the first part (| first). Adding zero (| plus: 0) converts it back to a number, so 000123 becomes 123 again.
  • We go through the tagSlugs list we created earlier and compare each item with the current tag slug. When we find it, we overwrite tagLabel with the label at the same index and break out of the loop.
  • Finally, we print the link with the slug, the label and the count.
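The same lookup with a fallback, sketched in Python for one entry of the ranked list (label_for is a hypothetical helper, and the slug and label lists are made up). In Python a dict built with zip would be the natural choice for this lookup; the loop-and-break form is kept here because it mirrors the Liquid code, where building a hash on the fly is not straightforward.

```python
def label_for(tag, tag_slugs, tag_labels):
    """Return the label at the slug's index, or the slug itself as a fallback."""
    for i, slug in enumerate(tag_slugs):
        if slug == tag:
            return tag_labels[i]  # like tagLabels[forloop.index0] followed by break
    return tag  # plays the role of the initial: assign tagLabel = tag


tag_slugs = ["docker", "js", "react"]
tag_labels = ["Docker", "JavaScript", "React"]

entry = "000004:js"
count = int(entry.split(":")[0])  # like | first | plus: 0
tag = entry.split(":")[-1]        # like | last
line = f'<li><a href="/tags/{tag}/">{label_for(tag, tag_slugs, tag_labels)} <sup>{count}</sup></a></li>'
print(line)  # <li><a href="/tags/js/">JavaScript <sup>4</sup></a></li>
```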

Programmatically stop the build process

Sometimes the tag slugs and tag labels get out of sync because of human error. To avoid publishing a site with wrong tag links (and letting Google index them), I had to find a way to stop the build process with an error. The solution is quite simple: we rely on Jekyll’s behaviour that a tag inside an if block is evaluated only when the condition is true and execution actually gets there, not sooner, and not at all otherwise. So we simply add some deliberately broken Liquid code, like an include with an invalid file name:

{%- assign tagSlugSize = tagSlugs | size -%}
{%- assign tagLabelSize = tagLabels | size -%}
{%- if tagSlugSize != tagLabelSize -%}
    {%- include ./stopBuild.html -%}
{%- endif -%}

…results in an error, but only when the tagSlugs and tagLabels lists have different sizes:

Liquid Exception: Invalid syntax for include tag. File contains invalid characters or sequences: "./stopBuild.html"
Valid syntax: {% include file.ext param='value' param2='value' %} in /app/src/_layouts/default.html
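The size check catches the most common mistake, but two lists of equal length can still be paired wrongly. As a hedged sketch of a stricter check, the hypothetical tag_label_problems function below inspects every post’s front matter directly; the posts are in-memory dicts with made-up titles rather than parsed Markdown files, so reading the real files is left out.

```python
def tag_label_problems(posts):
    """Collect every inconsistency between tag slugs and tag labels across posts."""
    problems = []
    label_of, slug_of = {}, {}
    for post in posts:
        tags, labels = post.get("tags", []), post.get("tagLabels", [])
        if len(tags) != len(labels):
            problems.append(f"{post['title']}: {len(tags)} tags vs {len(labels)} labels")
            continue
        for slug, label in zip(tags, labels):
            # setdefault stores the first label seen for a slug and returns the stored one
            if label_of.setdefault(slug, label) != label:
                problems.append(f"{post['title']}: '{slug}' labelled both '{label_of[slug]}' and '{label}'")
            if slug_of.setdefault(label, slug) != slug:
                problems.append(f"{post['title']}: '{label}' used for both '{slug_of[label]}' and '{slug}'")
    return problems


posts = [
    {"title": "Post A", "tags": ["js", "react"], "tagLabels": ["JavaScript", "React"]},
    {"title": "Post B", "tags": ["js"], "tagLabels": ["JS"]},
]
for problem in tag_label_problems(posts):
    print(problem)  # Post B: 'js' labelled both 'JavaScript' and 'JS'
```

A pre-build script built around this could exit with a non-zero status whenever the returned list is not empty.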

Conclusion

If we stick to the Jekyll documentation, it’s pretty nice and tidy, I think. But as soon as we want something a bit more, we have to use our imagination and do some calculations. Jekyll is not meant to be used like this. But it works! Be brave, think, experiment, browse Stack Overflow for solutions, and you will push the limits…

Gábor Iván