Tuan Anh

container nerd. k8s || GTFO


Advanced filtering and sorting with redis (part 2)

With the introduction of Redis modules (since Redis 4.0), Redis is now a lot more flexible than it used to be.

Previously, in part 1, if you wanted to mimic SQL-like filtering and sorting, you had to use sets/sorted sets and do the intersections/unions yourself.
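For context, here is a minimal in-memory sketch of that idea: one set of hotel IDs per filter value (what a Redis SET would hold, under hypothetical key names like `city:hanoi`), intersected by hand, then sorted by a per-hotel score (what a sorted set's score would hold). Plain JavaScript Sets stand in for Redis here, and all keys and data are made up.

```javascript
// Each "key" maps to a set of hotel IDs, like one Redis SET per filter value.
// Key names and data are hypothetical, for illustration only.
const index = {
  'city:hanoi': new Set(['1', '2', '3']),
  'stars:3': new Set(['2', '3']),
}

// Price per hotel, playing the role of a sorted-set score.
const price = { '1': 120, '2': 80, '3': 60 }

// Intersect several sets (the job SINTER does server-side).
function intersect(...sets) {
  const [first, ...rest] = sets
  return [...first].filter((id) => rest.every((s) => s.has(id)))
}

// Keep IDs present under every given key, then sort ascending by score.
function filterAndSort(keys) {
  return intersect(...keys.map((k) => index[k])).sort((a, b) => price[a] - price[b])
}

module.exports = { filterAndSort }
```

Here `filterAndSort(['city:hanoi', 'stars:3'])` returns `['3', '2']`: only hotels 2 and 3 are in both sets, and hotel 3 is cheaper.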

Not anymore.

Meet RediSQL

RediSQL is an in-memory SQL engine, built on top of Redis as a Redis module.

It’s pretty much SQL under the hood now. No more clever tricks to mimic the behavior.

The downside is that there aren’t many Redis clients that support browsing data for these modules, aside from the newly released RedisInsight, which currently only supports RedisGraph, RediSearch and RedisTimeSeries. This makes debugging really troublesome, and it’s a big showstopper for me. Just something to keep in mind.

RedisGraph

RedisGraph is a graph database module for Redis. It’s specifically built for graph workloads, but because it’s a graph database, it can be used for filtering as well. That’s kinda using the wrong tool for the job, though: RedisGraph is a lot more powerful than just filtering and sorting.

Example of doing filtering in RedisGraph

Loading data

Create the City and Rating nodes once, then connect each hotel to them. A separate CREATE per relationship would create a brand-new hotel node and city node every time, so a filter across several relationships would never match.

GRAPH.QUERY TestGraph "CREATE (:City {id: '1', name: 'Hanoi'}), (:Rating {id: '3', name: '3 star'}), (:Rating {id: '4', name: '4 star'})"

GRAPH.QUERY TestGraph "MATCH (c:City {id: '1'}), (r:Rating {id: '4'}) CREATE (h:Property {id: '1', name: 'hotel 1'}), (h)-[:hasFacility]->(:Facility {id: '1', name: 'Swimming pool'}), (h)-[:inCity]->(c), (h)-[:hasStarRating]->(r)"

GRAPH.QUERY TestGraph "MATCH (c:City {id: '1'}), (r:Rating {id: '3'}) CREATE (h:Property {id: '2', name: 'hotel 2'}), (h)-[:hasFacility]->(:Facility {id: '2', name: 'Spa'}), (h)-[:inCity]->(c), (h)-[:hasStarRating]->(r)"

Filter all 3 star hotels in Hanoi

GRAPH.QUERY TestGraph "MATCH (h:Property)-[:inCity]->(c:City), (h)-[:hasStarRating]->(r:Rating) WHERE c.name = 'Hanoi' AND r.name = '3 star' RETURN h.id, h.name"

Pi-hole

I’d heard a lot of praise for the Pi-hole project but hadn’t gotten around to actually trying it until recently.

Pi-hole is a network-wide ad-blocking solution that works via local DNS. You set it up as your local DNS server and it blocks all the ad domains that match its rules at the DNS level. This way, you don’t have to set up an ad blocker on each and every device you have, especially tablets and mobiles.
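To make "blocking at the DNS level" concrete, here is a toy model of the idea (not Pi-hole’s actual code): the resolver checks each queried domain against a blocklist and answers with an unroutable address such as 0.0.0.0 instead of the real IP, so the ad request never goes anywhere. The blocklist entries and the upstream answer are hypothetical placeholders.

```javascript
// Toy model of DNS-level ad blocking; not Pi-hole's implementation.
// The blocklist entries below are hypothetical examples.
const blocklist = new Set(['ads.example.com', 'tracker.example.net'])

// Stand-in for forwarding the query to an upstream resolver (e.g. 1.1.1.1).
// Returns a placeholder address from the documentation range 203.0.113.0/24.
function upstreamLookup(_domain) {
  return '203.0.113.10'
}

// Answer a DNS query: blocked domains resolve to 0.0.0.0,
// everything else is forwarded upstream as usual.
function resolve(domain) {
  const name = domain.toLowerCase().replace(/\.$/, '') // drop trailing root dot
  return blocklist.has(name) ? '0.0.0.0' : upstreamLookup(name)
}

module.exports = { resolve }
```

This only does exact matches; real blocklists also support wildcard and regex rules.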

People usually run it on a low-powered device like a Raspberry Pi (hence the name Pi-hole), but in my case, I already have an Intel NUC around as a Plex server (running Windows 10), so I could just use it instead of setting up something new.

The easiest way to install Pi-hole is with Docker. The process is as easy as:

  • Install Docker Desktop.
  • Create a few folders for the Pi-hole config and mount them into the container. Let’s put them in the Documents folder; if you choose a different location, make sure to update the command below. Create the 3 folders with the following structure:
pi-hole-config/
├── dnsmasq.d/
└── pihole/
  • Download and run the Pi-hole Docker container with the following command:
docker run -d --name pihole \
    -p 53:53/tcp \
    -p 53:53/udp \
    -p 80:80 \
    -p 443:443 \
    -v "/c/Users/<USERNAME>/Documents/pi-hole-config/pihole/:/etc/pihole/" \
    -v "/c/Users/<USERNAME>/Documents/pi-hole-config/dnsmasq.d/:/etc/dnsmasq.d/" \
    -e ServerIP="<YOUR_HOST_IP>" \
    --dns=127.0.0.1 \
    --dns=1.1.1.1 \
    -e WEBPASSWORD=<PASSWD> \
    --restart=unless-stopped pihole/pihole:latest
  • To access the container from other machines, I had to disable Windows Firewall for the local network, which I think is safe enough at home.

And that’s it. Now you can head over to http://<YOUR_HOST_IP>/admin to log in and configure Pi-hole. Once that’s done, configure your router to use <YOUR_HOST_IP> as the default DNS server, and maybe Cloudflare’s or Google’s DNS as a backup (keep in mind that devices may query the backup server directly, bypassing Pi-hole). Ah, and make sure to set a static IP for your Pi-hole host so you don’t have to update the router’s settings if the IP changes.

Keep in mind that Pi-hole is not a complete replacement for a browser extension like uBlock Origin. You probably still need that, because DNS-based ad blocking is quite limited. It’s mostly useful for mobile browsers, where ad blocking is almost non-existent or not good enough.

Microsoft Sculpt Ergonomic Desktop review

While I’m a hardcore mechanical keyboard fan, this is my very first (partially) split, ergonomic keyboard.

The keyboard comes in a tenkeyless size with a separate numpad. It’s a bit big for my taste, but I can live with that. I’m more accustomed to smaller boards; 60%-65% is usually the sweet spot for me.

The Sculpt keyboard is wireless, but it connects through a USB receiver dongle rather than Bluetooth, which makes me wonder whether I could find a replacement dongle if I ever lost it.

The keyboard

Decent looks. Not Apple-level good-looking, but quite good. It also looks nicer in pictures than in real life, so keep that in mind.

Pretty standard layout, though several keys have unusual sizes. This kind of split keyboard will help you realize if you’re touch typing wrong. For example, I sometimes type Y with my left index finger. While it’s possible to do that on a normal keyboard, it’s generally not doable on a split keyboard because Y is now too far from the left index finger. I think of this as a good thing.

I also went through a short period of inaccuracy with the B key: although it’s the same size as the other keys, it feels like its hit box is much smaller. Same thing with the Enter key.

I’m also not a big fan of the Esc key and the whole Fn row. They’re small and quite wobbly.

As a big fan of mechanical keyboards, I find scissor switches kind of meh. They’re better than a typical laptop keyboard but so much worse than a mechanical keyboard. The typing feel can only be as good as scissor switches go.

Media keys are nice. They work out of the box with apps like Spotify. There’s a button at the top right corner to switch back and forth between Fn function keys and media keys.

The biggest downside of this keyboard is that remapping keys (via firmware) is not possible.

One of the things I would like to do is map the left spacebar to Backspace. It’s possible with the Sculpt Comfort but not with the Sculpt Ergonomic. You can only blame Microsoft for this. The issue has been around for a few years, so I don’t think they have any plans to fix it.

From the documentation on the Microsoft website, remapping seems to be a supported feature, but that’s super misleading. Only 9 keys can be customized, and the available mapping options are quite limited.

Judging from the picture above, you may think Caps Lock can be customized? Wrong. You can only turn it on or off, and that’s it.

The good news is you can still do it in software. There are apps like AutoHotkey where you can program keys however you like, but you will have to set it up again if you use the keyboard on another machine.

Personally, I find TouchCursor good enough for my needs. The software mostly focuses on the SpaceFn layout, but you can configure it to use any key as the Fn key.
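For anyone unfamiliar with SpaceFn: the spacebar doubles as an Fn key. Tapped alone, it types a space; held down, it turns other keys into a second layer (often arrows on IJKL). Here is a simplified model of that dual-role logic. It ignores the timing thresholds and key rollover that real tools like TouchCursor handle, and the layer mapping is a hypothetical example, not TouchCursor’s default.

```javascript
// Simplified SpaceFn-style dual-role spacebar (no timing thresholds, no rollover).
// The Fn-layer mapping is a hypothetical example.
const FN_LAYER = { i: 'Up', j: 'Left', k: 'Down', l: 'Right' }

// events: array of { type: 'down' | 'up', key } in the order they happen.
// Returns the keys that would actually be sent to the OS.
function processEvents(events) {
  const output = []
  let spaceHeld = false
  let usedAsFn = false

  for (const { type, key } of events) {
    if (key === 'Space') {
      if (type === 'down') {
        spaceHeld = true
        usedAsFn = false
      } else {
        // Released without chording another key: it was a plain space.
        if (!usedAsFn) output.push('Space')
        spaceHeld = false
      }
    } else if (type === 'down') {
      if (spaceHeld && key in FN_LAYER) {
        // Space is held: send the Fn-layer key instead.
        output.push(FN_LAYER[key])
        usedAsFn = true
      } else {
        output.push(key)
      }
    }
  }
  return output
}

module.exports = { processEvents }
```

Holding Space and tapping J produces `['Left']`, while tapping Space on its own produces `['Space']`.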

Also, you don’t need the Microsoft Mouse and Keyboard Center software. It’s garbage.

The mouse

Let’s talk about the mouse. It’s goofy and feels kind of strange to hold at first, like you’re holding a ball. Overall, it’s a lot more comfortable than a conventional mouse for office work, but I wouldn’t use it for gaming.

The mouse’s back button requires a bit of force, which I think they could improve in a newer version. Also, lowering the button a little would be nice.

Conclusion

Even though I own lots of mechanical keyboards, I don’t use them as often as this one. This thing is slowly becoming my personal favorite.

I would definitely recommend it if you don’t need serious remapping (check out TouchCursor to see if it works well enough for you) or play a lot of games.

Optimal team size

Recently, I saw this tweet on Twitter. It’s about the challenge of adding more engineers to a project, given how the number of communication lines grows.

It’s easy to see that adding one more engineer to the team increases the number of communication lines quite a bit, because the number of communication lines for a team of n engineers is

$$ \frac{n(n-1)}{2} $$

which is O(n²). No wonder it’s steep. More on that later.
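A quick way to see how steep that gets: every pair of engineers is one line, so the count is n(n-1)/2. A tiny sketch (the function name is my own):

```javascript
// Number of pair-wise communication lines in a team of n people: n(n-1)/2.
function communicationLines(n) {
  return (n * (n - 1)) / 2
}

// Print how the count grows as the team grows.
for (const n of [2, 5, 10, 20]) {
  console.log(`${n} engineers -> ${communicationLines(n)} lines`)
}

module.exports = { communicationLines }
```

Going from 10 to 20 engineers takes you from 45 to 190 lines, roughly 4x the communication for 2x the people.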

Amdahl’s law

This tweet immediately reminded me of Amdahl’s law.

Let’s think more broadly: instead of counting communication lines, let’s try to model team throughput as a function of the number of engineers on the team.

The most common misconception in project management is that project time is inversely proportional to team size (time = work / n for a team of n engineers), as in this joke you often hear:

A project manager is someone who thinks that 9 pregnant women can make a baby in 1 month.

This assumes everything on the project can be done in parallel and everyone’s work is independent. That assumption is almost never true in real-world projects.

So, how can we refine the above function to better reflect the real world? The first thing that popped into my mind was Amdahl’s law.

Amdahl’s law is often used in parallel computing to predict the theoretical speedup when using multiple processors. For example, if a program needs 20 hours on a single processor core, and a one-hour part of it cannot be parallelized while the remaining 19 hours (p = 0.95) can, then no matter how many processors are devoted to running it, the execution time can never drop below that critical one hour. Hence, the theoretical speedup is limited to at most 20 times:

$$ S_{max} = \frac{1}{1 - p} = \frac{1}{1 - 0.95} = 20 $$

Sounds like we can use this to further refine our function above. But first, here is Amdahl’s law, the speedup with n processors:

$$ S(n) = \frac{1}{(1 - p) + \frac{p}{n}} $$

Let p be the proportion of work that can be done in parallel, with p = 0 meaning nothing can be done in parallel and p = 1 meaning everything can. Applying Amdahl’s law to our function gives

$$ time(n) = work \cdot \left( (1 - p) + \frac{p}{n} \right) $$

When p = 0 (nothing can be done in parallel), time = work. When p = 1, time = work / n, as before.
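A small sketch of the two formulas, speedup S(n) = 1 / ((1 - p) + p / n) and time(n) = work · ((1 - p) + p / n); the function names are my own.

```javascript
// Amdahl's law: speedup with n workers when a fraction p of the work parallelizes.
function speedup(p, n) {
  return 1 / ((1 - p) + p / n)
}

// Project time under the same assumptions: the serial part (1 - p)
// can't shrink, while the parallel part p is split n ways.
function projectTime(work, p, n) {
  return work * ((1 - p) + p / n)
}

module.exports = { speedup, projectTime }
```

With the 20-hour example (p = 0.95), `speedup(0.95, 1000)` is only about 19.6, and no value of n ever gets past 20.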

Communication overhead

Now, let’s add communication overhead to the formula. Assume communication cost is proportional to the number of communication lines, with each line costing a fixed amount of time, say k (a small positive number).

Of course, in reality, communication doesn’t look like that. It may look more like an org chart, or group meetings to exchange information instead of pair-wise talks. The point is: when you want to optimize your team’s throughput, look at how your team communicates. It’s certainly a big factor.

But for the time being, let’s just go with this.

Our function now becomes

$$ time(n) = work \cdot \left( (1 - p) + \frac{p}{n} \right) + k \cdot \frac{n(n-1)}{2} $$
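To see where the optimum lands, here is a brute-force sketch of time(n) = work · ((1 - p) + p / n) + k · n(n - 1) / 2. The function names and the sample values (work = 100, p = 0.9, k = 0.5) are made up purely for illustration.

```javascript
// Project time with Amdahl-style parallelism plus pair-wise communication cost:
// time(n) = work * ((1 - p) + p / n) + k * n * (n - 1) / 2
function teamTime(work, p, k, n) {
  return work * ((1 - p) + p / n) + (k * n * (n - 1)) / 2
}

// Try every team size from 1 to maxN and keep the one with the smallest time.
function optimalTeamSize(work, p, k, maxN) {
  let best = 1
  for (let n = 2; n <= maxN; n++) {
    if (teamTime(work, p, k, n) < teamTime(work, p, k, best)) best = n
  }
  return best
}

module.exports = { teamTime, optimalTeamSize }
```

With those sample values, `optimalTeamSize(100, 0.9, 0.5, 20)` returns 6: the time drops from 100 (n = 1) to 32.5 (n = 6) and then climbs again (about 33.4 at n = 7). With k = 0, the time keeps shrinking all the way to the cap, which is the "9 pregnant women" assumption again.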

Conclusion

So we can see that, at first, adding engineers to a project speeds it up, but beyond a certain optimal size, adding more engineers just makes it slower.

I don’t think the above function reflects project estimation well enough. But I know for sure that beyond a certain number, adding more people will slow the project down. Maybe it’s more related to Parkinson’s law: “work expands so as to fill the time available for its completion”.

The question remains: what’s the optimal team size? Do you think it will make a big difference? Yes or no, lemme know.

From macOS to Windows 10

I’m an Apple fan, no doubt, but recently I’ve been using Windows 10 exclusively at work and, to be honest, I’ve grown quite fond of it. For the first time since 2008, I’m seriously thinking about buying a Windows laptop.

With the recent introduction of the Ryzen 3000 series, I was so tempted to build a PC again for the first time since 2006 (my last build was 13 years ago!!!) and install Windows 10 on it. Everything has been an amazing experience since the switch. I don’t have any regrets at all.

Lots of cool productivity apps

Everything (voidtools) - FREE

Amazing Spotlight replacement. I don’t even understand how it can be that fast. Just pure magic!

Ditto - FREE

For clipboard management, Windows 10 has a built-in feature (Win+V) as well, but it’s quite limited in terms of features. Ditto is a free replacement that does the job very well.

You can set the maximum number of items in the history, how long Ditto keeps the history, what types of content to keep, the global hotkey, etc.

Accessing clipboard history is just one shortcut away (I use Ctrl+Shift+V).


ShareX - FREE

Very good screenshot annotation app. I use Monosnap on macOS, but ShareX is a hell of a lot better. I prefer Monosnap’s UI a bit more, but ShareX trumps Monosnap in terms of features.

Monosnap’s upload destination is quite limited and the app is not free.

With ShareX, I can simply press Ctrl+Shift+3, draw the screenshot rectangle, add some annotations and click upload (Imgur). Just like what I’m used to on macOS.

Windows Terminal

Open-source terminal application from Microsoft. It’s already quite good as a preview build, and on top of that, the development velocity is insane. You can see for yourself on GitHub.

Very nice development experience

macOS is a Unix-like operating system, which makes it very developer-friendly. You get nice GUI apps from macOS and all the CLI tools you’d expect from Linux.

Now, all of that is doable on Windows 10 as well, thanks to Windows Subsystem for Linux (WSL).

With WSL, developers get the best of all worlds.

  • You can use all the latest drivers for your hardware (because Windows support is still first-class).
  • You get to use Microsoft Office, which is still the best office suite out there.
  • You get to develop just like you’re on a Unix machine (VSCode integration with WSL is also very good).
  • You can play games without rebooting into Windows.
  • etc…


Conclusion

I still use macOS occasionally but I’m ok with the idea of moving to Windows entirely. I’ve yet to find anything that I miss from macOS.

Tips on reducing WASM file size with Emscripten

Optimize for size over performance

If size is more important than performance, you can use -Os flag.

I tried it with camaro and the file size went down from 176KB to 130KB. It’s worth a try.

Disable assertions and debug code

Try adding these flags to emcc: -s ASSERTIONS=0 and -DNDEBUG.

Using emmalloc

Try emmalloc, a smaller malloc implementation that ships with Emscripten, by adding the -s 'MALLOC="emmalloc"' flag.

Closure

Using the --closure 1 flag (which runs the Closure Compiler on the generated JS) may help as well.
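If you drive emcc from a Node build script, the size-related flags can live in one place. A sketch with hypothetical file names; note that when arguments are passed as an array (no shell involved), `MALLOC=emmalloc` needs no extra quoting.

```javascript
// Collect the size-related emcc flags into one argument list.
// Input/output file names are hypothetical placeholders.
function emccSizeArgs(input, output) {
  return [
    input,
    '-Os',                   // optimize for size over speed
    '-DNDEBUG',              // compile out assert() and other debug-only code
    '-s', 'ASSERTIONS=0',    // disable Emscripten runtime assertions
    '-s', 'MALLOC=emmalloc', // smaller malloc implementation
    '--closure', '1',        // run the Closure Compiler on the JS glue
    '-o', output,
  ]
}

module.exports = { emccSizeArgs }

// Usage (requires emcc on PATH):
// require('child_process').spawnSync('emcc', emccSizeArgs('src/main.cpp', 'dist/module.js'), { stdio: 'inherit' })
```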

Remove unnecessary dependencies

Try not to include unnecessary libraries in C++. E.g. don’t forget to remove <iostream> if you only use it for debugging with std::cout.

Emscripten has a whole section about this in its docs; do read it to stay up to date:

https://emscripten.org/docs/optimizing/Optimizing-Code.html#code-size

Choosing a wireless router in 2019

I found this guide to be extremely helpful. It was written in 2018 but the basic concepts are still true.

Do read it if you want to learn what factors to consider when buying a wireless router in 2019.

TL;DR: I ended up buying the Netgear Orbi RBK40. If money is not a constraint, go for the Netgear Orbi RBK50. Here’s why:

  • 802.11ac is a must. 802.11ax isn’t really available yet.
  • 3- and 4-stream routers are still expensive for home use. The Orbi RBK40 has 2 streams and is tri-band. Tri-band doesn’t increase coverage or speed, but it does make for a better backhaul (e.g. handling more devices, plus better Wi-Fi system performance in general).
  • Multipoint is better in my case (a house with lots of walls).
  • Dedicated backhaul (2x2, 866 Mbps). This is a major factor for Wi-Fi performance.
  • Ethernet backhaul is possible.
  • Lots of Ethernet ports. A USB port too.
  • 512 MB RAM 😱
  • Mesh, because it’s extendable. I can just buy more satellites if I move to a bigger house.
  • Phenomenal performance in reviews.

Some lessons learnt after converting a native module to WebAssembly

I released my first WebAssembly module here on GitHub. Here are some lessons I learnt during the process. Please take them with a grain of salt: they may or may not be true. I’ve only been working on it for the last few days.

Cache WebAssembly instance to save some time on initialization

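const Module = require('yourmodule') // Emscripten-generated JS glue for your module

const mod = Module()
const resolveCache = new Map()
const resolvePath = require.resolve('yourmodule') // cache key (assumed: the module's resolved path)

mod.onRuntimeInitialized = function() {
    // Runtime is ready: cache the instance so later lookups skip initialization
    resolveCache.set(resolvePath, mod)
    // mod.your_method()
}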

Size of WebAssembly files matter

I accidentally published an alpha build with <iostream> included (I was debugging in C++ with std::cout!!!!), which resulted in significantly bigger .js and .wasm files. Bigger files mean longer download and compile times.

A small fix removing #include <iostream> increased the performance of camaro (wasm) from ~200 ops/second to ~260 ops/second. Still far from the old camaro’s performance (~800 ops/second), but it’s a start.

To me, WebAssembly is a huge win. The performance hit is still big, but I think it’s just me doing something wrong, haha. It’s my first WebAssembly module, so there’s a lot to learn here. Aside from that, this version of camaro (wasm) brings big improvements, such as:

  • Smaller footprint, zero dependencies.
  • Closing a bunch of bug reports related to module installation on GitHub.
  • Serverless (e.g. AWS Lambda) friendly, since compilation is no longer required. No more downloading prebuilt artifacts from GitHub either.

I still don’t know how to return an arbitrarily shaped object from C++ to JS land

I still don’t know how to do this. For native modules, it’s easier with node-addon-api. I’m still trying to figure this out.

This slows camaro down a bit, because I stringify the JSON object in C++ and parse it again in JavaScript (😱 horrifying, I know!)


How to delete Redis keys using pattern

Googling this question shows a common approach: use the KEYS command and pipe the results to DEL, which looks something like this as a Lua script:

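-- Usage: EVAL "<this script>" 0 "user:*"  (the pattern is passed as ARGV[1])
local keys = redis.call('keys', ARGV[1])
-- DEL in batches of 5000 keys to stay under Lua's unpack() limit
for i=1,#keys,5000 do
    redis.call('del', unpack(keys, i, math.min(i+4999, #keys)))
end
return keys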

Doing things like this is fine in development, where the keyspace is small and the frequency of use is low.

However, if you do this on a larger keyspace and a lot more frequently, KEYS will block the Redis instance (it walks the whole keyspace, and Redis executes commands one at a time), so other commands stall or time out.

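The alternative is to use SCAN and loop until the cursor is back to 0. A naive implementation looks like the one below. Keep in mind that a Lua script runs atomically, so even with SCAN, Redis is blocked until the whole script finishes; to avoid that, drive the SCAN loop from your client instead.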

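redis.replicate_commands()
local cursor = '0'
local deleted = 0
repeat
    -- Pass MATCH and COUNT on every call, not just the first one
    local matches = redis.call('scan', cursor, 'match', ARGV[1], 'count', '10000')
    cursor = matches[1]
    local keys = matches[2]
    -- DEL in batches of 5000 keys to stay under Lua's unpack() limit
    for i=1,#keys,5000 do
        deleted = deleted + redis.call('del', unpack(keys, i, math.min(i+4999, #keys)))
    end
    -- SCAN is done when the returned cursor is back to "0"
until cursor == '0'
return deleted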
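Here is a sketch of the client-side version, assuming an ioredis client, whose `scan(cursor, 'MATCH', pattern, 'COUNT', n)` resolves to `[nextCursor, keys]` and whose `del(...keys)` resolves to the number of keys removed. Because every SCAN and DEL is a separate round trip, Redis can serve other clients in between. The function name and the default COUNT are my own choices.

```javascript
// Delete keys matching `pattern` by driving SCAN from the client, so Redis
// can serve other clients between iterations (unlike a single Lua script).
// `client` is assumed to be an ioredis instance, or anything exposing the same
// scan(cursor, 'MATCH', pattern, 'COUNT', n) and del(...keys) methods.
async function deleteByPattern(client, pattern, count = 1000) {
  let cursor = '0'
  let deleted = 0
  do {
    // Each SCAN returns [nextCursor, keys]; MATCH/COUNT go on every call
    const [next, keys] = await client.scan(cursor, 'MATCH', pattern, 'COUNT', count)
    cursor = next
    if (keys.length > 0) {
      // DEL returns how many of these keys actually existed
      deleted += await client.del(...keys)
    }
  } while (cursor !== '0')
  return deleted
}

module.exports = { deleteByPattern }
```

With ioredis, that’s `deleteByPattern(new Redis(), 'session:*')` (after `const Redis = require('ioredis')`), which resolves to the number of deleted keys.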

Autocomplete at speed of light

TL;DR: RediSearch, a full-text search Redis module that is super fast.

I learnt about RediSearch 2 years ago. It was a young project back then but seemed very promising. When I revisited it last year, it was already quite mature and about to hit v1.0, so I gave it another test drive. The results were so good that I put it in production a few weeks later.

Its pros are:

  • Fast (well, it’s Redis!).
  • No need to introduce another technology to our stack. You probably have Redis in your stack already anyway.
  • The API is very well documented. (I had a junior developer build this feature in just 1 week.)
  • If you’re using a Redis client like ioredis, you don’t need any additional dependency. ioredis already supports it via redis.call().

Fast 🚀

On my local machine, I can easily handle 1,500 requests per second (90k RPM). I guess it could go higher, since I was running RediSearch in a Docker container; I was too lazy to set it up on my host machine.

Most queries return in under 60 ms. You can see for yourself on MyTour.vn.


Installing RediSearch

The official Docker images are available on Docker Hub as redislabs/redisearch. We’re using Docker and Kubernetes at work so it comes in very handy.

For example, the command below spawns a RediSearch Docker container for you to try locally:

docker run -d -p 7000:6379 redislabs/redisearch:latest redis-server --loadmodule /usr/lib/redis/modules/redisearch.so

Using RediSearch

Using RediSearch is quite simple. You just have to create indexes and preload your data into RediSearch.

You can also specify the data type (TEXT, NUMERIC) and the weight of each indexed field.
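FT.CREATE takes a SCHEMA of field name/type pairs, with an optional WEIGHT for TEXT fields. A small helper to build those arguments; the index name and fields are hypothetical, and with ioredis you could send the result via `redis.call('FT.CREATE', ...args)`.

```javascript
// Build FT.CREATE arguments from a simple field spec:
// FT.CREATE <index> SCHEMA <field> <type> [WEIGHT <w>] ...
// Index and field names are hypothetical examples.
function ftCreateArgs(index, fields) {
  const args = [index, 'SCHEMA']
  for (const { name, type, weight } of fields) {
    args.push(name, type)
    // WEIGHT only applies to TEXT fields (the default weight is 1.0)
    if (type === 'TEXT' && weight !== undefined) args.push('WEIGHT', String(weight))
  }
  return args
}

module.exports = { ftCreateArgs }
```

For example, a hotel name weighted 5 times higher than the city name would be `ftCreateArgs('hotels', [{ name: 'name', type: 'TEXT', weight: 5 }, { name: 'city', type: 'TEXT' }, { name: 'stars', type: 'NUMERIC' }])`.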

Once the data is loaded, you can call FT.SEARCH to do the autocomplete. We found that the BM25 scorer works best for our use case, rather than the default TFIDF.
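For autocomplete, each term the user has typed can become a prefix query (a trailing `*`), and FT.SEARCH’s SCORER option selects BM25. A sketch of building those arguments; the index name, result limit and the sanitizing rule are my own assumptions, not our production code.

```javascript
// Turn raw user input into a RediSearch prefix query: "hotel ha" -> "hotel* ha*".
// Anything but letters, combining marks, digits and whitespace is dropped
// (a simplistic, assumed sanitizing rule to avoid query-syntax characters).
function toPrefixQuery(input) {
  return input
    .replace(/[^\p{L}\p{M}\p{N}\s]/gu, ' ')
    .trim()
    .split(/\s+/)
    .filter((term) => term.length > 0)
    .map((term) => `${term}*`)
    .join(' ')
}

// Arguments for FT.SEARCH using BM25 instead of the default TFIDF scorer.
function autocompleteArgs(index, input, limit = 10) {
  return [index, toPrefixQuery(input), 'SCORER', 'BM25', 'LIMIT', '0', String(limit)]
}

module.exports = { toPrefixQuery, autocompleteArgs }
```

With ioredis this would be sent as `redis.call('FT.SEARCH', ...autocompleteArgs('hotels', userInput))`; skip the call when the query comes out empty.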

In our use case, the whole thing is written in ~200 lines of code, cache population included.

Known issue

  • RediSearch doesn’t work well with case-sensitive Unicode characters; see issue #291. However, there’s a workaround: you can either normalize the query or keep the original data in a separate field.
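One way to do that normalization, assuming Vietnamese content like ours: lowercase the text, then strip diacritics so accented and unaccented spellings produce the same terms. Stripping accents is a design choice (it also lets "ha noi" match "Hà Nội"), and you would apply it both when indexing and when querying, keeping the original text in a separate field for display. The function name is my own.

```javascript
// Normalize Vietnamese text for indexing/querying: lowercase and strip diacritics,
// so "Hà Nội" and "ha noi" end up as the same terms.
function normalize(text) {
  return text
    .toLowerCase()
    .normalize('NFD')                // split letters from their combining marks
    .replace(/[\u0300-\u036f]/g, '') // drop the combining diacritical marks
    .replace(/\u0111/g, 'd')         // "đ" has no decomposition, so map it by hand
}

module.exports = { normalize }
```

For example, `normalize('Đà Nẵng')` returns `'da nang'`.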