Tuan Anh

container nerd. k8s || GTFO


Fastest way to transform XML to JSON in Node.js

camaro is a utility to transform XML to JSON, using a template engine powered by XPath syntax which looks like this:
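A minimal sketch of what a camaro template looks like (the XML fields here are made up, and the exact API shape varies between camaro versions; check the README for the version you use):

```javascript
// Hypothetical template: camaro maps XPath expressions to output keys.
const template = {
  hotels: ['//hotel', {        // for every <hotel> node...
    name: 'name',              // ...take the text of its <name> child
    rooms: 'number(rooms)',    // ...and its <rooms> value, cast to a number
  }],
};

// const { transform } = require('camaro');
// const result = await transform(xml, template);
```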

Here are some benchmarks I ran with the sample data I usually have to deal with (XML files ranging from 1 to 10 MB):

camaro x 809 ops/sec ±1.51% (86 runs sampled)
rapidx2j x 204 ops/sec ±1.22% (81 runs sampled)
xml2json x 53.73 ops/sec ±0.58% (68 runs sampled)
xml2js x 40.57 ops/sec ±7.59% (56 runs sampled)
fast-xml-parser x 148 ops/sec ±3.43% (74 runs sampled)
xml-js x 33.38 ops/sec ±6.69% (60 runs sampled)
libxmljs x 127 ops/sec ±15.36% (50 runs sampled)

And the benefit of camaro is that not only is it fast, it does the transformation for you as well. You can just write a template and camaro will spit out a ready-to-use object using the schema you specified in the template.

At the time I wrote this, there were already many XML parsers, but not many provided a way to do transformation. camaro was born from that constant need to transform big XML files into JSON in Node.js.

I was reading a blog post by Chad Austin about the fastest JSON parser. I was working exclusively with XML at the time, so I asked him about the fastest XML parser. He replied that this problem was already solved by pugixml, written by Arseny Kapoulkine.

pugixml looks very good in the benchmarks. The only thing I can complain about is the lack of streaming support, which I didn’t really need at the time, so it was no big deal for me.

It’s fast. It supports XPath. It’s very well-maintained.

So, a few dozen lines of code to glue pugixml to Node for the transformation, and a couple of hours later, camaro was released. Just like that.

Link to original article

Sharding and IDs

So I was going through this post from the Instagram Engineering blog while researching sharding solutions.

The solution is quite elegant and I decided to port this to MySQL.

Turns out, it’s harder than I thought since we’re using a pretty dated MySQL version at work. There are no sequences, just AUTO_INCREMENT. In order to use the PL/pgSQL snippet, I would have to find a way to mimic the nextval function.

CREATE TABLE `sequence` (
    `name` VARCHAR(100) NOT NULL,
    `increment` INT(11) NOT NULL DEFAULT 1,
    `min_value` INT(11) NOT NULL DEFAULT 1,
    `max_value` BIGINT(20) NOT NULL DEFAULT 9223372036854775807,
    `cur_value` BIGINT(20) DEFAULT 1,
    `cycle` BOOLEAN NOT NULL DEFAULT FALSE,
    PRIMARY KEY (`name`)
) ENGINE=MyISAM;

INSERT INTO sequence
    ( NAME, increment, min_value, max_value, cur_value )
VALUES
    ('my_sequence', 1, 0, 100, 0);


DROP FUNCTION IF EXISTS nextval;
DELIMITER $$
CREATE FUNCTION `nextval` (`seq_name` VARCHAR(100))
RETURNS BIGINT NOT DETERMINISTIC
BEGIN
    DECLARE cur_val BIGINT;

    SELECT
        cur_value INTO cur_val
    FROM
        sequence
    WHERE
        NAME = seq_name;

    IF cur_val IS NOT NULL THEN
        UPDATE
            sequence
        SET
            cur_value = IF (
                (cur_value + increment) > max_value OR (cur_value + increment) < min_value,
                IF (
                    cycle = TRUE,
                    IF (
                        (cur_value + increment) > max_value,
                        min_value,
                        max_value
                    ),
                    NULL
                ),
                cur_value + increment
            )
        WHERE
            NAME = seq_name;
    END IF;
    RETURN cur_val;
END;
$$
DELIMITER ;

So the snippet above is what we use to mimic the nextval function. Quite troublesome, huh? You can now call SELECT nextval('my_sequence') to get the next value of the sequence.

Now, onto the id-generating function. It’s a pretty straightforward port of the PL/pgSQL version.

DROP FUNCTION IF EXISTS f_unique_id;
DELIMITER $$
CREATE FUNCTION f_unique_id() RETURNS BIGINT
BEGIN
    DECLARE our_epoch BIGINT;
    DECLARE seq_id BIGINT;
    DECLARE now_millis BIGINT;
    DECLARE shard_id INT;
    DECLARE result BIGINT;

    SET our_epoch = 1314220021721;
    SET shard_id = 5;

    SELECT nextval('my_sequence') % 1024 INTO seq_id;
    SELECT ROUND(UNIX_TIMESTAMP(NOW(3)) * 1000) INTO now_millis; -- milliseconds, not seconds (needs MySQL 5.6+)
    SET result = (now_millis - our_epoch) << 23;
    SET result = result | (shard_id << 10);
    SET result = result | (seq_id);

    RETURN result;
END;
$$
DELIMITER ;

In order to generate a unique id with the sharding info embedded, you can just call SELECT f_unique_id().
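The resulting 64-bit id packs three bit fields: milliseconds since the custom epoch in the top 41 bits, 13 bits of shard id, and 10 bits of sequence. A quick sketch of the same bit arithmetic in JavaScript (BigInt, since the value overflows 2^53; the sample values are mine):

```javascript
// Same bit layout as f_unique_id: (millis since epoch) << 23 | shard << 10 | seq.
const OUR_EPOCH = 1314220021721n; // custom epoch, in milliseconds

function makeId(nowMillis, shardId, seqId) {
  const ts = BigInt(nowMillis) - OUR_EPOCH; // bits 23..63: timestamp delta
  return (ts << 23n)
       | (BigInt(shardId) << 10n)           // bits 10..22: shard id
       | (BigInt(seqId) % 1024n);           // bits 0..9  : sequence mod 1024
}
```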

Advanced filtering and sorting with redis (part 1)

Sets and sorted sets are extremely powerful data types for filtering and sorting stuff with redis.

Basic filtering

Let’s start with something simple. Usually filtering is just a matter of unions and intersections. Let’s say: filter all hotels that are 3 or 4 stars and have both a spa and a pool.

For this, we just have to create a set for each of the filter criteria and do union/intersection accordingly.

sadd hotel:star:3 1 2 3 4
sadd hotel:star:4 5 6 7 8
sadd hotel:spa 1 2 5
sadd hotel:pool 2 5

With the above example, it would be [UNION of (3-star, 4-star sets)] INTERSECT [INTERSECTION of (spa, pool)]:

SUNIONSTORE 3or4star hotel:star:3 hotel:star:4
SINTERSTORE spaandpool hotel:spa hotel:pool
SINTER 3or4star spaandpool
# 2 5

And you got hotel id 2 and 5 as the result.

Multiple-column sorting

Usually, in SQL, you can do multi-column sorting like this:

SELECT * FROM mytable
ORDER BY col1 ASC, col2 ASC, col3 DESC

How would you translate this logic to redis?

Actually, this is not my idea but Josiah Carlson’s (author of the book Redis in Action). You can find his blog post about this and a demo implementation there as well.

The basic idea is: the ZINTERSTORE command supports WEIGHTS, so we just have to calculate the weight for each column based on its order and sort direction (ASC, DESC).

If you know the range of each sort column in advance, you can save a round trip to redis to fetch it. Otherwise, fetch the min and max scores with a pipeline:

for sort_col in sort:
    pipe.zrange(sort_col, 0, 0, withscores=True)
    pipe.zrange(sort_col, -1, -1, withscores=True)
ranges = pipe.execute()
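One way to compute those weights (my own sketch, not code from Josiah’s post): walk the columns from lowest to highest priority, so that each column’s weight exceeds the combined span of every column below it, flipping the sign for DESC:

```javascript
// Derive ZINTERSTORE weights for multi-column sorting.
// columns: [{min, max, desc}], ordered from highest to lowest priority.
function sortWeights(columns) {
  const weights = new Array(columns.length);
  let w = 1;
  // Lowest-priority column gets weight 1; each column above it gets a
  // weight larger than the full span of everything beneath it.
  for (let i = columns.length - 1; i >= 0; i--) {
    const { min, max, desc } = columns[i];
    weights[i] = desc ? -w : w;
    w *= (max - min + 1);
  }
  return weights;
}
```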

Notes on GraphQL

Some personal notes while working with GraphQL

Kubernetes Meetup #2 slide

My slides from Kubernetes Meetup #2 organized by Docker Hanoi and CloudNativeVietnam.

Link to original article

The Birth & Death of JavaScript

Old but gold.

This science fiction / comedy / absurdist / completely serious talk traces the history of JavaScript, and programming in general, from 1995 until 2035. It’s not pro- or anti-JavaScript; the language’s flaws are discussed frankly, but its ultimate impact on the industry is tremendously positive.

Link to original article

Advice to new managers

Advice to new managers:

  • Earn trust by giving it
  • Inspire, don’t tell
  • Eat lunch with your team
  • Show their work matters
  • Be a player-coach
  • Feedback in private, praise in public
  • In victory, lead from back
  • In crisis, lead from front
  • Be the manager you wish you had

via Twitter

DejaLu - a new open source email client by Sparrow's author

I downloaded the beta and gave it a try.

At this stage, it’s already a better email client than Airmail.

Some notes:

  • Amazing startup speed. Why can’t all apps be like this?
  • Nice and clean UI. Community themes could be a great feature to have but I don’t mind not having it.
  • I especially like the conversation list view. It’s just so clean.

Link to original article

Series and parallel execution with async/await

TIL

series

async function series() {
  await wait(500); // Wait 500ms…
  await wait(500); // …then wait another 500ms.
  return "done!";
}

parallel

async function parallel() {
  const wait1 = wait(500); // Start a 500ms timer asynchronously…
  const wait2 = wait(500); // …meaning this timer happens in parallel.
  await wait1; // Wait 500ms for the first timer…
  await wait2; // …by which time this timer has already finished.
  return "done!";
}
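The parallel case is more commonly written with Promise.all, which awaits both timers at once and also rejects as soon as either one does. wait is assumed to be a promise-returning timer, defined here for completeness:

```javascript
// Assumed helper: a promise that resolves after ms milliseconds.
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function parallelAll() {
  await Promise.all([wait(500), wait(500)]); // both timers run concurrently
  return "done!";
}
```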
Link to original article

2017: year in review

Best year at work yet!

  • I worked on a project (with several members of my team) to migrate our apps onto a Kubernetes cluster, starting at the beginning of 2017. We’ve been running Kubernetes in production ever since.
  • Convinced and guided other teams to follow our initiative and migrate to Kubernetes.
  • 💸 Significantly reduced our AWS bill by using spot instances / spot fleets while maintaining high availability of the system.
  • ✌️ Got a new job!! My employer made a counter offer matching the compensation, but I decided it was time to move on.

Overall, I’ve set up a concrete infrastructure for my company to move forward with. I believe my team can step up and continue working on the current projects.

My one and only

Being a dad is overwhelming but certainly a great experience. Sleep deprivation surely sucks, but any bad feeling seems to disappear when those tiny little hands hold my face and give me a quick kiss on the cheek.

More books

I will not force myself to finish books I’m not enjoying or learning from. I did that rigorously last year just for the sake of finishing them.

I also want to learn to speed-read this year.

Average readers read at around 250 words per minute with a typical comprehension of 60%. Imagine if you could read at 500 wpm: you could read twice as many books. It’s truly an amazing skill to have.

Talks

I didn’t give any talks last year. I would love to do 1 or 2 this year. Let’s make it happen.

2018 is gonna be a great year!