Tuan Anh

container nerd. k8s || GTFO


Autocomplete at speed of light

tldr: RediSearch, a full-text search Redis module that is super fast.

I learnt of RediSearch 2 years ago. It was a young project back then but seemed very promising at the time. When I revisited it last year, it was already quite mature and about to hit v1.0, so I took it for another test drive. The results were so good that I put it in production a few weeks later.

Its pros are:

  • Fast (well, it’s Redis).
  • No need to introduce another technology to our stack. You probably have Redis in your stack already anyway.
  • The API is very well documented. (I had a junior developer work on this feature for merely 1 week.)
  • If you’re using a Redis client like ioredis, you don’t need any additional dependency. ioredis already supports it via redis.call().

Fast 🚀

On my local machine, I can easily get 1,500 requests per second (90k RPM). It could probably go higher, since I was running RediSearch in a Docker container; I was too lazy to set it up on my host machine.

Most queries return in under 60 ms. You can see for yourself on MyTour.vn.


Installing RediSearch

The official Docker images are available on Docker Hub as redislabs/redisearch. We’re using Docker and Kubernetes at work so it comes in very handy.

For example, the command below spawns a RediSearch Docker container for you to try locally:

docker run -d -p 7000:6379 redislabs/redisearch:latest redis-server --loadmodule /usr/lib/redis/modules/redisearch.so

Using RediSearch

Using RediSearch is quite simple. You just have to create indexes and preload your data into RediSearch.

You can also specify the data type (TEXT, NUMERIC) and the weight of each indexed field.

Once the data is loaded, you can call FT.SEARCH to do the autocomplete. We found that the BM25 scorer works better for our use case than the default TFIDF.
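A minimal sketch of both steps in Python, assuming redis-py (or any client that can send raw commands, such as redis-py's `execute_command`). The index name `idx:hotel`, the field names, and the helper functions are hypothetical, and the syntax targets the 1.x-style `FT.CREATE ... SCHEMA ...` form; RediSearch 2.x adds options such as `ON HASH PREFIX`. The helpers only build argument lists, so they can be checked without a server.

```python
import re
from typing import List, Tuple

def ft_create_args(index: str, fields: List[Tuple[str, str, float]]) -> List[str]:
    """Build FT.CREATE arguments from (field name, 'TEXT' or 'NUMERIC', weight) tuples."""
    args = ['FT.CREATE', index, 'SCHEMA']
    for name, kind, weight in fields:
        args += [name, kind]
        if kind == 'TEXT':
            args += ['WEIGHT', str(weight)]  # WEIGHT only applies to TEXT fields
    return args

def autocomplete_query(user_input: str) -> str:
    """Lowercase the input, keep word characters only, and prefix-match the last word."""
    words = re.findall(r'\w+', user_input.lower())  # drops query operators like - | @
    if not words:
        return ''
    words[-1] += '*'  # the word the user is still typing becomes a prefix query
    return ' '.join(words)

def ft_search_args(index: str, user_input: str, limit: int = 10) -> List[str]:
    """Build FT.SEARCH arguments that rank results with BM25 instead of the default TFIDF."""
    return ['FT.SEARCH', index, autocomplete_query(user_input),
            'SCORER', 'BM25', 'LIMIT', '0', str(limit)]
```

For example, `r.execute_command(*ft_search_args('idx:hotel', 'Ha No'))` would search for `ha no*`. RediSearch also enforces a minimum prefix length for `*` queries (the MINPREFIX setting), so very short prefixes may return nothing, and an empty query is better skipped than sent.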

In our use case, the whole thing is written in about 200 lines of code, cache population included.

Known issue

  • RediSearch doesn’t handle case sensitivity correctly for Unicode characters; see issue #291. However, there’s a workaround: you can either normalize the query or keep the original data in a separate field.
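A minimal sketch of the normalization workaround in Python. The helper and field names (`fold_case`, `strip_accents`, `to_document`, `name_search`) are hypothetical, and stripping diacritics goes beyond lowercasing, so it only makes sense if you also want accent-insensitive matching (useful for Vietnamese input typed without accents). The idea: index a normalized copy of each field, keep the original for display, and run the same normalization on the query.

```python
import unicodedata

def fold_case(text: str) -> str:
    # compose first so 'A' + combining grave and precomposed 'À' compare equal, then case-fold
    return unicodedata.normalize('NFC', text).casefold()

def strip_accents(text: str) -> str:
    # decompose, drop combining marks (Unicode category Mn), then recompose
    decomposed = unicodedata.normalize('NFD', text)
    kept = ''.join(ch for ch in decomposed if unicodedata.category(ch) != 'Mn')
    # đ/Đ are separate letters with no decomposition, so map them explicitly
    return unicodedata.normalize('NFC', kept.replace('đ', 'd').replace('Đ', 'D'))

def to_document(name: str) -> dict:
    # hypothetical fields: the original for display, the normalized copy for searching
    return {'name': name, 'name_search': strip_accents(fold_case(name))}
```

The query side would then pass user input through `strip_accents(fold_case(...))` before building the FT.SEARCH query.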

2018: year in review


✨ 2018 in review ✨

👩🏻‍💼 My wife got promoted

👨‍💻 Another amazing year at work for me

🏠 We bought a house

🤑 We paid off the loan early

🎤 Gave 2 public talks (or 1, since the first one had a rather small audience of 50-ish people)

We bought a house

My wife and I made a major decision earlier this year.

We decided to buy a house with a small loan. We both agreed to live below our means and push ourselves really hard to pay it off within a year. We did it in 8 months 🔥

My wife’s promotion also helped speed it up.

Push for Kubernetes adoption at work

I pushed for Kubernetes adoption at my current company, and it finally went live in July. We switched from classic on-demand instances to a mix of spot and reserved instances, with Kubernetes for workload orchestration. As a result, our EC2 costs dropped by more than 50 percent.


I did two talks this year on cost saving optimization with Kubernetes. Actually, I wrote one keynote talk and gave it twice.

  • Kubernetes Meetup #2 in March
  • Vietnam Web Summit 2018 in December

The solution is basically Spotinst, but available to small and medium-sized businesses without Spotinst’s 30% cut.

Speaking plan for 2019: maybe 3 or 4 talks. Maybe “Cost saving optimization with Kubernetes and spot instances” for the last time and something new.

2018, you were good to us ❤️

Here’s to an amazing 2019!

My keyboard layout

Over the years, I have customized my keyboard layout a lot (with Bootmapper Client back then and QMK Toolbox now), and this is what I ended up with. It’s pretty much the HHKB layout with some QMK hacks on top.


CAPSLOCK is useless to me, so I changed it to CTRL. I also set it up as a Mod-Tap key, which acts as CTRL when held and as BACKSPACE when tapped.

If I press Fn+CAPSLOCK, it works as a normal CAPSLOCK. I don’t use it, but I want it to be consistent.

The major benefit is that I can hit BACKSPACE with my pinky instead of moving my right hand out of position.

Double tap SHIFT for toggling CAPSLOCK

This uses QMK’s tap dance feature. SHIFT is right below the old CAPSLOCK key, plus the pairing makes sense (SHIFT and CAPSLOCK).

Also, double tapping is a lot easier than FN + CTRL, which requires 2 fingers.
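To make the timing behind tap dance concrete, here is a simplified Python model, not QMK’s actual implementation: taps that land within one tapping term of the previous tap form a single "dance", and the dance’s tap count picks the action. `TAPPING_TERM_MS = 200` assumes QMK’s default, and the mapping (double tap for KC_CAPS) is a hypothetical stand-in for the real keymap.

```python
from typing import List, Optional

TAPPING_TERM_MS = 200  # assumption: QMK's default tapping term

def group_taps(tap_times_ms: List[int]) -> List[int]:
    """Group tap timestamps into dances and return the tap count of each dance."""
    counts: List[int] = []
    last: Optional[int] = None
    for t in sorted(tap_times_ms):
        if last is not None and t - last < TAPPING_TERM_MS:
            counts[-1] += 1   # next tap arrived in time: same dance
        else:
            counts.append(1)  # the window expired: a new dance starts
        last = t
    return counts

def shift_dance(tap_times_ms: List[int]) -> List[str]:
    # hypothetical mapping: double tap toggles CAPSLOCK, anything else is a SHIFT tap
    return ['KC_CAPS' if n == 2 else 'KC_LSFT' for n in group_taps(tap_times_ms)]
```

So taps at 0 ms and 150 ms toggle CAPSLOCK, while a lone tap a second later is just SHIFT.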

The SpaceFn layout

The idea of SpaceFn is to use SPACE as your layer switching key because it’s always within easy reach while you’re typing. You can read more about SpaceFn here.

While it sounds cool and all, the problem arises when you type fast: the SPACE key sometimes gets registered as the layer switching key. You could reduce the tap-hold delay, but I could never get accustomed to that. The problem is quite severe because SPACE is used so frequently.

So I ended up changing this layout a bit into what I call the EscFn layout, where ESC is the layer switching key. ESC is now LT(1, KC_ESC): hold for layer switching, still ESC on tap. I also enabled RETRO_TAPPING so that if I hold and release ESC without pressing another key, it sends ESC anyway.

This is better because:

  • ESC is rarely used in key combos, so it doesn’t affect much.
  • ESC is used less frequently when typing.
  • ESC is close and easy to locate. Well, not as good as SPACE because you still need to move your finger a bit, but its location is quite perfect. It’s in the corner, so you can always find it without looking at the keyboard.
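The tap-versus-hold decision behind LT(1, KC_ESC), and behind the SpaceFn misfires described above, can be modeled roughly as below. This is a simplified Python model rather than QMK’s code: it assumes QMK’s default 200 ms tapping term and ignores options such as PERMISSIVE_HOLD. A key held past the tapping term while another key is pressed resolves as a layer switch, which is what goes wrong when SPACE lingers during fast typing; RETRO_TAPPING turns a long press with no other key into a tap.

```python
from dataclasses import dataclass

TAPPING_TERM_MS = 200  # assumption: QMK's default tapping term

@dataclass
class Press:
    down_ms: int       # when the dual-role key went down
    up_ms: int         # when it was released
    interrupted: bool  # was another key pressed while it was held?

def resolve_layer_tap(press: Press, tap_key: str = 'KC_ESC', layer: int = 1,
                      retro_tapping: bool = True) -> str:
    """Simplified tap/hold resolution for a key like LT(1, KC_ESC)."""
    if press.up_ms - press.down_ms < TAPPING_TERM_MS:
        return tap_key            # released within the tapping term: a tap
    if press.interrupted:
        return f'LAYER_{layer}'   # held past the term while another key was used
    # held past the term with no other key pressed
    return tap_key if retro_tapping else 'NOTHING'
```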

Also, I don’t set up the whole thing, just the arrow cluster and the HOME/END keys. Even though I use amVim with VSCode, there are still many apps that require arrow keys for navigation.

I still keep WASD as an arrow cluster for “backward compatibility”.

Favorite QMK hacks

macOS media keys

macOS media keys are supported on QMK: KC__MUTE, KC__VOLUP, KC__VOLDOWN, etc…

This is essential if you’re using macOS.

Grave Escape

If you’re using a 60% keyboard, or any other layout with no F-row, you will have noticed that there is no dedicated Escape key. Grave Escape is a feature that allows you to share the grave key (` and ~) with Escape.

This is a godsend if you’re using a 60% keyboard.

Mod-Tap keys

The Mod-Tap key MT(mod, kc) acts like a modifier when held, and a regular keycode when tapped.

I use this to set up right SHIFT as TILDE on tap and RSHIFT on hold, as normal.

This feature is very useful for those modifiers like CTRL, SHIFT and ALT because you probably never tap those keys.

Space Cadet Shift

Essentially, when you tap Left Shift on its own, you get an opening parenthesis; tap Right Shift on its own and you get the closing one. When held, the Shift keys function as normal. Yes, it’s as cool as it sounds.

To be honest, I don’t quite see the need for this. Still, it’s cool.

Space Cadet Shift Enter

Tap the Shift key on its own, and it behaves like Enter. When held, the Shift functions as normal.

This one kind of makes sense, though, because it’s next to the ENTER key. To be honest, they could have used SFT_T(KC_ENTER) to achieve a similar result.

KC_RGUI and KC_LGUI do not register

Try holding Space + Backspace as you plug in the keyboard. Credit to @braidn

Fastest way to transform XML to JSON in Node.js

camaro is a utility to transform XML to JSON using a template engine powered by XPath syntax.

Here are some benchmarks I ran with the sample data I usually have to deal with (XML files ranging from 1 to 10 MB):

camaro x 809 ops/sec ±1.51% (86 runs sampled)
rapidx2j x 204 ops/sec ±1.22% (81 runs sampled)
xml2json x 53.73 ops/sec ±0.58% (68 runs sampled)
xml2js x 40.57 ops/sec ±7.59% (56 runs sampled)
fast-xml-parser x 148 ops/sec ±3.43% (74 runs sampled)
xml-js x 33.38 ops/sec ±6.69% (60 runs sampled)
libxmljs x 127 ops/sec ±15.36% (50 runs sampled)

The benefit of camaro is that it’s not only fast, it also does the transformation for you. You just write a template, and camaro spits out a ready-to-use object with the schema you specified in the template.
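To illustrate the template idea, here is a small Python sketch; it is not camaro and not its API. It uses the standard library’s ElementTree, which only supports a limited XPath subset, so paths are relative to the current node and only a trailing `@attr` is treated as an attribute (attribute predicates are not handled). The template shape (an object of paths, with `[path, subtemplate]` lists for repeated elements) is an assumption loosely modeled on camaro’s.

```python
import xml.etree.ElementTree as ET
from typing import Any, Optional

def select(node: ET.Element, path: str) -> Optional[str]:
    """Read element text, or an attribute when the path ends in @attr."""
    if '@' in path:
        elem_path, _, attr = path.rpartition('@')
        target = node.find(elem_path.rstrip('/') or '.')  # '.' is the node itself
        return None if target is None else target.get(attr)
    return node.findtext(path)

def transform(node: ET.Element, template: Any) -> Any:
    """Apply a template: a path string, a [path, subtemplate] list, or a dict of templates."""
    if isinstance(template, str):
        return select(node, template)
    if isinstance(template, list):
        path, sub = template  # one output item per element matching path
        return [transform(child, sub) for child in node.findall(path)]
    return {key: transform(node, sub) for key, sub in template.items()}
```

For example, with `root = ET.fromstring('<hotels><hotel id="1"><name>A</name></hotel></hotels>')`, calling `transform(root, ['hotel', {'id': '@id', 'name': 'name'}])` returns `[{'id': '1', 'name': 'A'}]`.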

When I wrote this, there were already many XML parsers, but not many of them supported transformation. camaro was born from that constant need to transform big XML files into JSON in Node.js.

I was reading a blog post by Chad Austin about the fastest JSON parser. I was working exclusively with XML at the time, so I asked him about the fastest XML parser. He replied that the problem had already been solved by pugixml, by Arseny Kapoulkine.

pugixml looks very good in the benchmarks. The only thing I can complain about is the lack of streaming support, which I didn’t really need at the time, so it was no big deal for me.

It’s fast. It supports XPath. It’s very well-maintained.

So, a few dozen lines of code to glue pugixml to Node and a couple of hours later, camaro was released. Just like that.


Sharding and IDs

I was going through this post from the Instagram Engineering blog while researching sharding solutions.

The solution is quite elegant, and I decided to port it to MySQL.

Turns out, it was harder than I thought, since we’re using a pretty dated MySQL version at work. There are no sequences, just AUTO_INCREMENT. In order to use the PL/pgSQL snippet, I had to find a way to mimic the nextval function.

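CREATE TABLE `sequence` (
    `name` VARCHAR(100) NOT NULL,
    `increment` INT(11) NOT NULL DEFAULT 1,
    `min_value` INT(11) NOT NULL DEFAULT 1,
    `max_value` BIGINT(20) NOT NULL DEFAULT 9223372036854775807,
    `cur_value` BIGINT(20) DEFAULT 1,
    `cycle` BOOLEAN NOT NULL DEFAULT FALSE,
    PRIMARY KEY (`name`)
) ENGINE=InnoDB;

-- cycle = TRUE wraps back to min_value instead of returning NULL once max_value is passed
INSERT INTO sequence
    ( name, increment, min_value, max_value, cur_value, cycle )
VALUES
    ('my_sequence', 1, 0, 100, 0, TRUE);

DELIMITER $$
CREATE FUNCTION `nextval` (`seq_name` VARCHAR(100))
RETURNS BIGINT NOT DETERMINISTIC
BEGIN
    DECLARE cur_val BIGINT;

    -- lock the row so concurrent callers can't read the same value (InnoDB)
    SELECT cur_value INTO cur_val
    FROM sequence
    WHERE name = seq_name
    FOR UPDATE;

    IF cur_val IS NOT NULL THEN
        -- advance the sequence; out of range, wrap around (cycle) or set NULL
        UPDATE sequence
        SET cur_value = IF (
                (cur_value + increment) > max_value OR (cur_value + increment) < min_value,
                IF (
                    cycle = TRUE,
                    IF ((cur_value + increment) > max_value, min_value, max_value),
                    NULL
                ),
                cur_value + increment
            )
        WHERE name = seq_name;
    END IF;
    RETURN cur_val;
END$$
DELIMITER ;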

The snippet above is what we use to mimic the nextval function. Quite troublesome, huh? You can now call SELECT nextval('my_sequence') to get the next value of the sequence.

Now, on to the ID generation function. It’s a pretty straightforward port of the PL/pgSQL version.

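DELIMITER $$
CREATE FUNCTION f_unique_id()
RETURNS BIGINT NOT DETERMINISTIC
BEGIN
    DECLARE our_epoch BIGINT;
    DECLARE seq_id BIGINT;
    DECLARE now_millis BIGINT;
    DECLARE shard_id INT;
    DECLARE result BIGINT;

    SET our_epoch = 1314220021721;  -- custom epoch, in milliseconds
    SET shard_id = 5;                -- logical shard id of this database

    -- NOW(3) needs MySQL 5.6.4+; older versions can use UNIX_TIMESTAMP() * 1000 (second precision)
    SET now_millis = FLOOR(UNIX_TIMESTAMP(NOW(3)) * 1000);
    SELECT nextval('my_sequence') % 1024 INTO seq_id;
    -- 41 bits of time | 13 bits of shard id | 10 bits of sequence
    SET result = (now_millis - our_epoch) << 23;
    SET result = result | (shard_id << 10);
    SET result = result | (seq_id);

    RETURN result;
END$$
DELIMITER ;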

To generate a unique ID with the sharding info embedded, just run SELECT f_unique_id();
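To sanity-check the bit layout outside the database, here is a small Python sketch with hypothetical helpers `make_id` and `split_id`. It packs and unpacks IDs the same way f_unique_id() does: the milliseconds since the custom epoch shifted left by 23, the shard id shifted left by 10, and the sequence modulo 1024 in the low 10 bits. It mirrors the SQL above rather than any library API.

```python
OUR_EPOCH_MS = 1314220021721  # same custom epoch as f_unique_id()

def make_id(now_ms: int, shard_id: int, seq: int) -> int:
    if not 0 <= shard_id < 1 << 13:
        raise ValueError('shard_id must fit in 13 bits')
    # 41 bits of ms since the epoch | 13 bits of shard id | 10 bits of sequence
    return ((now_ms - OUR_EPOCH_MS) << 23) | (shard_id << 10) | (seq % 1024)

def split_id(uid: int) -> dict:
    return {
        'millis': (uid >> 23) + OUR_EPOCH_MS,
        'shard_id': (uid >> 10) & 0x1FFF,  # 13-bit mask
        'seq': uid & 0x3FF,                # 10-bit mask
    }
```

Because the timestamp occupies the high bits, IDs sort roughly by creation time, and `split_id` recovers the shard a row lives on from the ID alone.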

Advanced filtering and sorting with redis (part 1)

Sets and sorted sets are extremely powerful data types for filtering and sorting stuff with Redis.

Basic filtering

Let’s start with something simple. Usually filtering is just a matter of union and intersection. Let’s say: filter all hotels that are 3 or 4 star and have both spa and pool.

For this, we just have to create a set for each of the filter criteria and do union/intersection accordingly.

Suppose we have the following data

hotel id   star rating   has spa   has pool
1          3             yes       no
2          3             yes       yes
3          3             no        no
4          3             no        no
5          4             yes       yes
6          4             no        no
7          4             no        no
8          4             no        no

Group the items by the property you want to filter on:

sadd hotel:star:3 1 2 3 4
sadd hotel:star:4 5 6 7 8
sadd hotel:spa 1 2 5
sadd hotel:pool 2 5

For the example above, that is [UNION of the 3-star and 4-star sets] INTERSECTION [INTERSECTION of the spa and pool sets]:

SUNIONSTORE 3or4star hotel:star:3 hotel:star:4
SINTERSTORE spaandpool hotel:spa hotel:pool
SINTER 3or4star spaandpool
# 2 5

And you get hotel ids 2 and 5 as the result.
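The same pattern generalizes: OR the values within one criterion with a union, then AND the criteria together with an intersection. The Python sketch below (hypothetical helper names and temporary key names) evaluates that against an in-memory copy of the sets from this example and also emits the equivalent Redis commands. Since SINTER accepts any number of keys, the separate SINTERSTORE step for spa and pool isn’t strictly needed.

```python
from typing import Dict, List, Optional, Set

# In-memory stand-in for the Redis sets created above (key -> hotel ids).
SETS: Dict[str, Set[int]] = {
    'hotel:star:3': {1, 2, 3, 4},
    'hotel:star:4': {5, 6, 7, 8},
    'hotel:spa': {1, 2, 5},
    'hotel:pool': {2, 5},
}

def apply_filters(sets: Dict[str, Set[int]], criteria: List[List[str]]) -> Set[int]:
    """Union the keys inside each criterion (OR), then intersect the criteria (AND)."""
    result: Optional[Set[int]] = None
    for any_of in criteria:
        matched = set().union(*(sets.get(key, set()) for key in any_of))
        result = matched if result is None else result & matched
    return result if result is not None else set()

def to_commands(criteria: List[List[str]]) -> List[List[str]]:
    """Emit the equivalent Redis commands: one SUNIONSTORE per OR group, then one SINTER."""
    commands: List[List[str]] = []
    inter_keys: List[str] = []
    for i, any_of in enumerate(criteria):
        if len(any_of) == 1:
            inter_keys.append(any_of[0])  # a single key needs no union
        else:
            tmp = f'tmp:filter:{i}'       # hypothetical temporary key name
            commands.append(['SUNIONSTORE', tmp, *any_of])
            inter_keys.append(tmp)
    commands.append(['SINTER', *inter_keys])
    return commands
```

In a real setup you would send the emitted commands in one pipeline and delete (or EXPIRE) the temporary keys afterwards.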

Multiple column sorting

In SQL, you can sort by multiple columns like this:

SELECT * FROM mytable
ORDER BY col1 ASC, col2 ASC, col3 DESC

How would you translate this logic to redis?

Actually, this is not my idea but Josiah Carlson’s (author of the book Redis in Action). His blog post about it includes a demo implementation as well.

The basic idea: ZINTERSTORE supports WEIGHTS, so we just have to calculate a weight for each column based on its position in the sort order and its direction (ASC, DESC).

If you know the value range of each sort column in advance, you can save the round trip to Redis that fetches it. Otherwise, fetch all the ranges in one pipelined round trip:

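import redis

pipe = redis.Redis().pipeline(transaction=False)
sort = ['hotel:price', 'hotel:rating']  # hypothetical sorted-set keys, one per sort column
for sort_col in sort:  # lowest and highest score of each column, in one round trip
    pipe.zrange(sort_col, 0, 0, withscores=True)
    pipe.zrange(sort_col, -1, -1, withscores=True)
ranges = pipe.execute()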
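Once the ranges are known, the weights can be computed. The Python sketch below is my own illustration of the idea, not Josiah Carlson’s code, and the column keys and helper names are hypothetical. Walking from the least significant column to the most significant one, each column’s weight is the product of the value spans of all less significant columns (negated for DESC), so one step in a more important column always outweighs any combination of the less important ones.

```python
from typing import Dict, List, Tuple

# (sorted-set key, min score, max score, 'ASC' or 'DESC'), most significant column first
Column = Tuple[str, int, int, str]

def sort_weights(columns: List[Column]) -> Dict[str, int]:
    """Compute ZINTERSTORE weights that reproduce ORDER BY col1, col2, ... on integer scores."""
    weights: Dict[str, int] = {}
    multiplier = 1
    for key, lo, hi, direction in reversed(columns):  # least significant column first
        weights[key] = multiplier if direction == 'ASC' else -multiplier
        # the columns processed so far now span at most multiplier - 1 units,
        # so a single step in the next (more significant) column outweighs them
        multiplier *= hi - lo + 1
    return weights

def combined_score(row: Dict[str, int], weights: Dict[str, int]) -> int:
    """The per-member score ZINTERSTORE produces with these WEIGHTS and the default SUM."""
    return sum(row[key] * weight for key, weight in weights.items())
```

With redis-py, the resulting dict can be passed as the keys argument of `zinterstore(dest, keys)`, which accepts a mapping of key to weight. Keep the combined scores below 2**53, since sorted-set scores are stored as doubles.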

One thing to note is that this approach doesn’t work well with non-integer values. You can work around that by converting them to integers. For example, you can convert non-integer values ranging from 0 to 10 with a precision of 2 decimal places by multiplying them by 100. Something like below:

function normalize(val, precision) {
    // round rather than ceil: 1.1 * 100 is 110.00000000000001 in floating point
    return Math.round(val * 10 ** precision)
}

Notes on GraphQL

Some personal notes while working with GraphQL

Kubernetes Meetup #2 slide

My slides from Kubernetes Meetup #2 organized by Docker Hanoi and CloudNativeVietnam.


The Birth & Death of JavaScript

Old but gold.

This science fiction / comedy / absurdist / completely serious talk traces the history of JavaScript, and programming in general, from 1995 until 2035. It’s not pro- or anti-JavaScript; the language’s flaws are discussed frankly, but its ultimate impact on the industry is tremendously positive.
