Tuan Anh

container nerd. k8s || GTFO

Optimal team size

Recently, I saw this tweet on Twitter. It is about the challenge of adding more engineers to a project, in relation to the number of communication lines between them.

It’s easy to see that adding one more engineer to the team increases the number of communication lines quite a bit, because the number of communication lines among n engineers is

n * (n - 1) / 2, which is O(n*n). No wonder it’s steep. More on that later.

Amdahl’s law

This tweet immediately reminds me of Amdahl’s law.

Let’s go broader: instead of discussing the number of communication lines, how about we try to optimize team throughput in relation to the number of engineers on the team?

The most common misunderstanding in project management is that project time is inversely proportional to team size, as in this joke you often hear:

A project manager is someone who thinks that 9 pregnant women can create a baby in 1 month.

This assumes everything on the project can be done in parallel and everyone’s work is independent, so that time = work / n for a team of n engineers. This assumption is almost never true in real-world project management.

So, how can we refine the above function to better reflect the real-world situation? The first thing that pops into my mind is Amdahl’s law.

Amdahl’s law is often used in parallel computing to predict the theoretical speedup when using multiple processors. For example, if a program needs 20 hours using a single processor core, and a particular part of the program which takes one hour to execute cannot be parallelized, while the remaining 19 hours (p = 0.95) of execution time can be parallelized, then regardless of how many processors are devoted to a parallelized execution of this program, the minimum execution time cannot be less than that critical one hour. Hence, the theoretical speedup is limited to at most 20 times.

Sounds like we can use this to further refine our function above. But first, this is Amdahl’s law, written in terms of project time for a team of n engineers:

time = work * ((1 - p) + p / n)

Here p is the proportion of the work that can be done in parallel: p=0 means nothing can be done in parallel and p=1 means everything can.

When p=0 (nothing can be done in parallel), we get time = work. When p=1, time = work / n as before.

Communication overhead

Now, let’s try to add communication overhead to the formula, assuming the communication cost is proportional to the number of communication lines, with each line taking a fixed cost in time, say k (a small positive number).

Of course, in reality, communication doesn’t look like that. It may look more like an org chart, or it could be a group meeting to exchange information instead of pair-wise conversations. This is just to say that when you want to optimize your team’s throughput, you may want to look at how the team communicates. It’s certainly a big factor.

But for the time being, let’s just go with this.

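Our function will now become time = work * ((1 - p) + p / n) + k * n * (n - 1) / 2, where n * (n - 1) / 2 is the number of pair-wise communication lines among n engineers.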


So we can see that at first, adding engineers to a project will speed it up, but if we go beyond a certain optimal size, adding more engineers will just make it slower.
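To make that concrete, here is a minimal sketch of the toy model in JavaScript. It assumes the function is the Amdahl term plus a cost of k for each of the n * (n - 1) / 2 pair-wise communication lines. The function names and the sample values (work = 100, p = 0.9, k = 0.5) are made up for illustration, not measured from any real project.

```javascript
// Number of pair-wise communication lines among n engineers.
function communicationLines(n) {
    return n * (n - 1) / 2
}

// Estimated project time: Amdahl term plus a fixed cost k per communication line.
function projectTime(n, { work, p, k }) {
    const amdahl = work * ((1 - p) + p / n)
    const overhead = k * communicationLines(n)
    return amdahl + overhead
}

// Brute-force search for the team size with the lowest estimated time.
function optimalTeamSize(params, maxN = 100) {
    let best = 1
    for (let n = 2; n <= maxN; n++) {
        if (projectTime(n, params) < projectTime(best, params)) {
            best = n
        }
    }
    return best
}

// Illustrative values only.
const params = { work: 100, p: 0.9, k: 0.5 }
for (const n of [1, 2, 5, 10, 20]) {
    console.log(n, projectTime(n, params).toFixed(1))
}
console.log('optimal n:', optimalTeamSize(params))
```

With these sample values the estimated time drops quickly for the first few engineers, bottoms out at a team of 6, and then climbs again as the quadratic overhead takes over. With k = 0 the overhead disappears and a bigger team is always faster, which is the plain Amdahl case.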

I don’t think the above function reflects project estimation well enough. But I know for sure that beyond a certain number, adding more people will slow the project down. Maybe it’s more related to Parkinson’s law, because “work expands so as to fill the time available for its completion”.

The question remains: what’s the optimal team size? Do you think it will make a big difference? Yes or no, lemme know.

From macOS to Windows 10

I’m an Apple fan, no doubt, but recently I’ve been using Windows 10 exclusively at work and, to be honest, I’ve grown quite fond of it. For the first time since 2008, I’m seriously thinking about buying a Windows laptop.

With the recent introduction of the Ryzen 3000 series, I was so tempted to build a PC again for the first time since 2006 (my last build was 13 years ago!!!) and install Windows 10 on it. Everything has been an amazing experience since the switch. I don’t have any regrets at all.

Lots of cool productivity apps

Everything (voidtools) - FREE

An amazing Spotlight replacement. I don’t even understand how it can be that fast. Just pure magic!

Ditto - FREE

For clipboard management, Windows 10 has a built-in feature as well, but its feature set is quite limited. Ditto is a free replacement that does the job very well.

You can set the maximum number of items in the history, how long Ditto keeps the history, which types of content to keep in the clipboard, the global hotkey, etc…

Accessing clipboard history is just one shortcut away (I use Ctrl+Shift+V).

[Screenshot: Ditto on Windows]

ShareX - FREE

A very good screenshot annotation app. I use Monosnap on macOS but ShareX is a hell of a lot better. I prefer Monosnap’s UI a bit more, but ShareX trumps Monosnap in terms of features.

Monosnap’s upload destinations are quite limited and the app is not free.

With ShareX, I can simply press Ctrl+Shift+3, draw the screenshot rectangle, add some annotations and click upload (Imgur). Just like what I’m used to on macOS.

Microsoft Terminal

An open-source terminal application from Microsoft. It’s already quite good in the preview build, and what’s more, the development velocity is insane. You can see for yourself on GitHub.

Very nice development experience

macOS is a Unix-like operating system, which makes it very developer-friendly. You get the nice GUI apps from macOS and the best of the CLI tools from Linux.

Now, all that is doable on Windows 10 as well, with the introduction of the Windows Subsystem for Linux (WSL).

With WSL, developers get the best of all worlds.

  • You can use all the latest drivers for your hardware (because Windows support is still first class)
  • You get to use Microsoft Office - which is still the best office suite out there.
  • You get to develop just like you’re on a Unix machine (VSCode integration with WSL is also very good)
  • You can play games without rebooting into Windows.
  • etc…

[Screenshot: Microsoft Terminal preview]


I still use macOS occasionally but I’m ok with the idea of moving to Windows entirely. I’ve yet to find anything that I miss from macOS.

Tips on reducing WASM file size with Emscripten

Optimize for size over performance

If size is more important than performance, you can use the -Os flag.

I tried it with camaro and the file size went down from 176KB to 130KB. It’s worth a try.

Disable assertions and debug code

Try adding these flags to emcc: -s ASSERTIONS=0 and -DNDEBUG.

Using emmalloc

Try using emmalloc, a smaller malloc implementation available in emcc, by adding the -s 'MALLOC="emmalloc"' flag.


Using the --closure 1 flag may help as well.

Remove unnecessary dependencies

Try not to include unnecessary libraries in C++. E.g. don’t forget to remove iostream if you only use it for debugging with std::cout.

Emscripten has a whole section about this; do read it to stay up to date.
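Putting the flags from this post together, a small Node build helper could look like the sketch below. The function name and the module.cpp / module.js file names are hypothetical, and MALLOC is written without the extra shell quoting because the arguments are passed as an array rather than through a shell. Treat it as a starting point, not as the build script camaro actually uses.

```javascript
// Build the emcc argument list for a size-optimized build.
// `input` and `output` are hypothetical file names supplied by the caller.
function emccSizeArgs(input, output, { closure = true } = {}) {
    const args = [
        input,
        '-o', output,
        '-Os',                    // optimize for size over performance
        '-s', 'ASSERTIONS=0',     // disable runtime assertions
        '-DNDEBUG',               // compile out assert() in C/C++ code
        '-s', 'MALLOC=emmalloc',  // use the smaller emmalloc allocator
    ]
    if (closure) {
        args.push('--closure', '1') // minify the generated JS glue code
    }
    return args
}

// Print the command line that would be run.
console.log(['emcc', ...emccSizeArgs('module.cpp', 'module.js')].join(' '))

// To actually run the build (requires emcc on PATH):
// require('child_process').execFileSync(
//     'emcc', emccSizeArgs('module.cpp', 'module.js'), { stdio: 'inherit' })
```

It is worth comparing the resulting file sizes with and without each flag, since the gain depends a lot on the module.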


Choosing a wireless router in 2019

I found this guide to be extremely helpful. It was written in 2018 but the basic concepts still hold.

Do read it if you want to learn what factors to consider when buying a wireless router in 2019.

Tl;dr: I ended up buying the Netgear Orbi RBK40. Or, if money is not a constraint, the Netgear Orbi RBK50. Here’s why:

  • 802.11ac is a must. 802.11ax is not available yet.
  • 3 and 4 streams are still expensive for home usage. The Orbi RBK40 has 2 streams and is tri-band. Tri-band doesn’t increase coverage or speed but it does make for a better backhaul (e.g. handling more devices and better Wi-Fi system performance in general)
  • Multipoint is better in my case (a house with lots of walls)
  • Dedicated backhaul (2x2, 866Mbps). This is a major factor for Wi-Fi performance.
  • Ethernet backhaul is possible.
  • Lots of Ethernet ports. A USB port as well.
  • 512 MB RAM 😱
  • Mesh because it’s extendable. I can just buy more satellites if I move to a bigger house.
  • Phenomenal performance in reviews.

Some lessons learnt after converting a native module to WebAssembly

I released my first WebAssembly module here on GitHub. Here are some lessons I learnt during the process. Please take them with a grain of salt. These things might be true, or not. I’m not sure. I have only worked on it for the last few days.

Cache WebAssembly instance to save some time on initialization

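const Module = require('yourmodule')

// Cache initialized instances so the runtime is only set up once.
const resolveCache = new Map()
// Assumption: the cache key is the resolved path of the module file.
const resolvePath = require.resolve('yourmodule')

const mod = Module()
mod.onRuntimeInitialized = function() {
    // The runtime is ready here; keep the instance around for later calls.
    resolveCache.set(resolvePath, mod)
    // mod.your_method()
}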
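One way to wrap this pattern is to cache a promise per module, so every caller waits on the same initialization. The sketch below is only an illustration of that idea: getInstance and fakeFactory are hypothetical names, and fakeFactory stands in for require('yourmodule') so the snippet runs on its own.

```javascript
// Cache of promises for initialized instances, keyed by an arbitrary string.
const instanceCache = new Map()

// Returns a promise for an initialized module, creating it at most once per key.
function getInstance(key, factory) {
    if (!instanceCache.has(key)) {
        const ready = new Promise((resolve) => {
            const mod = factory()
            // Resolve once the runtime reports that it is ready.
            mod.onRuntimeInitialized = () => resolve(mod)
        })
        instanceCache.set(key, ready)
    }
    return instanceCache.get(key)
}

// Hypothetical stand-in for require('yourmodule'), so the sketch is runnable.
let created = 0
function fakeFactory() {
    created++
    const mod = { your_method: () => 'ok' }
    // Simulate asynchronous runtime initialization.
    setTimeout(() => mod.onRuntimeInitialized(), 0)
    return mod
}

getInstance('yourmodule', fakeFactory).then((mod) => console.log(mod.your_method()))
getInstance('yourmodule', fakeFactory).then(() => console.log('instances created:', created))
```

This assumes the runtime initializes asynchronously, after onRuntimeInitialized has been assigned. If a build initializes synchronously inside the factory, the callback would be set too late and the promise would never resolve.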

Size of WebAssembly files matters

I accidentally published an alpha build with iostream included (I was debugging in C++ with std::cout !!!!), which resulted in significantly bigger {.js,.wasm} files. Bigger files mean longer download and compile times.

A small fix to remove #include <iostream> increased the performance of camaro (wasm) from ~200 ops/second to ~260 ops/second. Still far from the old camaro performance (~800 ops/second) but it’s a start.

To me, WebAssembly is a huge win. The performance hit is still big but I think it’s just me doing something wrong haha. It’s my first WebAssembly module so there’s a lot to learn from here. Aside from that, camaro (wasm) brings big improvements in this version, such as:

  • Smaller footprint, zero dependencies.
  • Closing a bunch of bug reports related to module installation on GitHub.
  • Serverless (eg: AWS Lambda) friendly as compilation is no longer required. No more downloading prebuilt artifacts from GitHub as well.

I still don’t know how to return an arbitrarily shaped object from C++ to JS land

I still don’t know how to do this. For a native module, it’s easier with node-addon-api. I’m still trying to figure this out.

This slows camaro down a bit because I stringify the JSON object in C++ and parse it again in JavaScript (😱 horrifying, I know!)


How to delete Redis keys using pattern

Googling this question shows a common approach: use the KEYS command and then pass the results to the DEL command, which looks something like this in a Lua script

local keys = redis.call('keys', ARGV[1])
-- Delete in chunks of 5000 so unpack() does not overflow the Lua stack.
for i=1,#keys,5000 do
    redis.call('del', unpack(keys, i, math.min(i+4999, #keys)))
end
return keys

Doing things like this is fine for development, where the keyspace is small and the frequency of use is low.

However, if you do this on a larger keyspace and a lot more frequently, the KEYS command will block the Redis instance and cause other commands to fail.

The alternative is to use SCAN and loop until the cursor is 0. A naive implementation is like below.

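local cursor = '0'
local deleted = 0

repeat
    -- Pass the pattern on every call; without MATCH, SCAN returns all keys.
    local result = redis.call('scan', cursor, 'match', ARGV[1], 'count', '10000')
    cursor = result[1]
    local keys = result[2]
    -- Delete every batch, including the last one returned with cursor 0.
    for i=1,#keys,5000 do
        deleted = deleted + redis.call('del', unpack(keys, i, math.min(i+4999, #keys)))
    end
until cursor == '0'

-- Returns the number of deleted keys.
return deleted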
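One caveat: a Lua script runs atomically, so looping over SCAN inside a single script still keeps Redis busy until the whole loop finishes. A possible alternative is to drive the loop from the client, so that other commands can run between batches. The sketch below assumes a client with ioredis-style scan() and del() methods that return promises; deleteByPattern is a hypothetical helper name.

```javascript
// Delete keys matching `pattern` in small batches, driven from the client.
// `client` is assumed to expose ioredis-style scan() and del() methods.
async function deleteByPattern(client, pattern, count = 1000) {
    let cursor = '0'
    let deleted = 0
    do {
        // Each SCAN call is short, so other clients get served in between.
        const [next, keys] = await client.scan(cursor, 'MATCH', pattern, 'COUNT', count)
        cursor = next
        if (keys.length > 0) {
            deleted += await client.del(...keys)
        }
    } while (cursor !== '0')
    return deleted
}

// Usage with ioredis (not executed here):
// const Redis = require('ioredis')
// const redis = new Redis()
// deleteByPattern(redis, 'session:*').then((n) => console.log('deleted', n))
```

The trade-off is that the deletion is no longer atomic: keys created while the loop is running may or may not be picked up.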

Autocomplete at speed of light

tl;dr: RediSearch - a full-text search Redis module that is super fast.

I learnt of RediSearch 2 years ago. It was a young project back then but seemed very promising at the time. When I revisited it last year, it was already quite mature and about to hit v1.0, so I did another test drive. The result was so good that I put it into production a few weeks later.

Its pros are:

  • Fast (well, it’s Redis!).
  • No need to introduce another tech to our tech stack. You probably have Redis in your tech stack already anyway.
  • The API is very well documented. (I had a junior developer working on this feature for merely 1 week)
  • If you’re using a redis client like ioredis, you don’t need to add any additional dependency. ioredis already supports it via redis.call().

Fast 🚀

On my local machine, I can easily pull 1,500 requests per second (90k RPM). I guess it could be higher since I was running RediSearch in a Docker container. I was too lazy to set it up on my host machine.

Most queries return in under 60ms. You can see for yourself on MyTour.vn


Installing RediSearch

The official Docker images are available on Docker Hub as redislabs/redisearch. We’re using Docker and Kubernetes at work so it comes in very handy.

For example, the command below spawns a RediSearch docker container for you to try locally

docker run -d -p 7000:6379 redislabs/redisearch:latest redis-server --loadmodule /usr/lib/redis/modules/redisearch.so

Using RediSearch

Using RediSearch is quite simple. You just have to create indexes and preload your data into RediSearch.

You can also specify the data type (TEXT, NUMERIC) and the weight of each index field.

Once the data is loaded, you can call FT.SEARCH to do the autocomplete. We found that the BM25 algorithm works better for our use case than the default TFIDF.

In our use case, the whole thing is written in ~200 lines of code, cache population included.
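As a rough illustration of the query side, the sketch below builds the arguments for an FT.SEARCH prefix query scored with BM25 and turns the flat reply into objects. The helper names, the 'hotels' index, and the sample reply are all made up for this example. It does not escape special characters in the user input, and it is not our production code.

```javascript
// Build the arguments for an FT.SEARCH prefix query scored with BM25.
// `index` is a hypothetical index name; `text` is what the user has typed so far.
function buildAutocompleteArgs(index, text, limit = 10) {
    return ['FT.SEARCH', index, `${text}*`, 'SCORER', 'BM25', 'LIMIT', '0', String(limit)]
}

// An FT.SEARCH reply is a flat array:
// [total, id1, [field, value, ...], id2, [field, value, ...], ...]
function parseSearchReply(reply) {
    const [total, ...rest] = reply
    const docs = []
    for (let i = 0; i < rest.length; i += 2) {
        const flat = rest[i + 1]
        const fields = {}
        for (let j = 0; j < flat.length; j += 2) {
            fields[flat[j]] = flat[j + 1]
        }
        docs.push({ id: rest[i], fields })
    }
    return { total, docs }
}

// Usage with ioredis (not executed here):
// const [command, ...args] = buildAutocompleteArgs('hotels', 'hano')
// const reply = await redis.call(command, ...args)
// const { docs } = parseSearchReply(reply)

// Sample reply, invented for illustration.
const sample = [1, 'hotel:1', ['name', 'Hanoi Hotel', 'city', 'Hanoi']]
console.log(parseSearchReply(sample))
```

Keeping the argument building and reply parsing as plain functions makes them easy to unit test without a running Redis instance.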

Known issue

  • RediSearch doesn’t work on case sensitive unicode characters; see issue #291. However, there’s a workaround for that. You can either normalize the query or keep the original data in a separate field.

2018: year in review


✨ 2018 in review ✨

👩🏻‍💼 My wife got promoted

👨‍💻 Another amazing year at work for me

🏠 We bought a house

🤑 We paid off the loan early

🎤 Gave 2 public talks (or 1. The first one had a rather small audience - 50-ish people)

We bought a house

My wife and I made a major decision earlier this year.

We decided to buy a house with a small loan. We both agreed to live below our means and push ourselves really hard to pay it off within a year. We did it in 8 months 🔥

The fact that my wife got promoted also helped speed it up.

Push for Kubernetes adoption at work

I pushed for Kubernetes adoption at my current company and it finally went live in July. We switched from classic on-demand instances to spot instances mixed with reserved instances, with Kubernetes for workload orchestration. EC2 cost dropped by more than 50 percent as a result.


I did two talks this year on cost saving optimization with Kubernetes. Actually, I wrote one keynote talk and gave it twice.

  • Kubernetes Meetup #2 in March
  • Vietnam Web Summit 2018 in December

The solution is basically Spotinst, but available to small and medium-sized businesses without the 30% cut from Spotinst.

Speaking plan for 2019: maybe 3 or 4 talks. Maybe “Cost saving optimization with Kubernetes and spot instances” for the last time and something new.

2018, you were good to us ❤️

Here’s to an amazing 2019!

My keyboard layout

Over the years, I customized my keyboard layout a lot (with the Bootmapper client back then and QMK Toolbox now) and this is what I ended up using. It’s pretty much the HHKB layout with some QMK hacks on top.


CAPS LOCK is useless to me so I changed it to CTRL. I also set it up as a Mod-Tap key, which acts as CTRL if I hold it but works as BACKSPACE if I tap it.

If I press Fn+CAPS LOCK, it works as CAPS LOCK as normal. Though I don’t use it, I just want it to be consistent.

This has a major benefit: I can hit BACKSPACE with my pinky finger instead of moving my right hand out of position.

Double tap SHIFT for toggling CAPS LOCK

This uses QMK’s tap dance feature. SHIFT is right below the old CAPS LOCK key, plus it makes sense (SHIFT and CAPS LOCK).

Also, double tapping is a lot easier than FN + CTRL, which requires 2 fingers.

The SpaceFn layout

The idea of SpaceFn is that you use SPACE as your layer-switching key because it’s easily accessible all the time when you’re typing. You can read more about SpaceFn here.

While it sounds cool and all, a problem arises when you type fast: the SPACE key will sometimes be registered as the layer-switching key. You could reduce the hold delay but I could never get accustomed to that. This problem is quite severe because SPACE is used so frequently.

So I ended up changing this layout a bit into what I call the EscFn layout, where the ESC key is the layer-switching key. ESC is now LT(1, KC_ESC): hold for layer switching and still ESC on tap. I also enabled RETRO_TAPPING so that if I hold and release ESC without pressing another key, it sends ESC anyway.

This is better because:

  • ESC is rarely used for key combo so it doesn’t affect much.
  • ESC is less frequently used when typing.
  • ESC is close and easy to locate. Well, not as good as SPACE because we still need to move a finger a bit, but its location is quite perfect. It’s in the corner so you can always find it without looking at the keyboard.

Also, I don’t set up the whole thing, just the arrow cluster and HOME/END keys. Even though I’m using amVim with VSCode, there are still many apps which require arrows for navigation.

I still keep WASD as an arrow cluster for “backward compatibility”.

Favorite QMK hacks

macOS media keys

macOS media keys are supported on QMK: KC__MUTE, KC__VOLUP, KC__VOLDOWN, etc…

This is essential if you’re using macOS.

Grave Escape

If you’re using a 60% keyboard, or any other layout with no F-row, you will have noticed that there is no dedicated Escape key. Grave Escape is a feature that allows you to share the grave key (` and ~) with Escape.

This is a godsend if you’re using a 60% keyboard.

Mod-Tap keys

The Mod-Tap key MT(mod, kc) acts like a modifier when held, and a regular keycode when tapped.

I use this to set up right shift to be TILDE on tap and RSHIFT on hold as normal.

This feature is very useful for those modifiers like CTRL, SHIFT and ALT because you probably never tap those keys.

Space Cadet Shift

Essentially, when you tap Left Shift on its own, you get an opening parenthesis; tap Right Shift on its own and you get the closing one. When held, the Shift keys function as normal. Yes, it’s as cool as it sounds.

I don’t quite understand the need for this actually. It’s cool still.

Space Cadet Shift Enter

Tap the Shift key on its own, and it behaves like Enter. When held, the Shift functions as normal.

This one kinda makes sense though because it’s next to the ENTER key. To be honest, they could have used SFT_T(KC_ENTER) to achieve a similar result.

KC_RGUI and KC_LGUI do not register

Try holding Space + Backspace as you plug in the keyboard. Credit to @braidn