I’ve begun building data tools that use DuckDB, an in-process columnar database designed for fast analytic queries. It’s such a powerful paradigm for manipulating and recasting datasets that are small enough to download to your computer. Also, the logo is pretty cute.
The node.js library for DuckDB, however, is missing a number of features. While Node isn’t currently quite as “data-native” as R and Python, it’s still a powerful ecosystem and runtime for application development. A solid Node module for DuckDB makes DuckDB available to a much wider range of data tools.
A wishlist of things I’m looking for / working on / have already fixed:
- parquet by default – the parquet extension should be available to the node.js bindings by default (Hannes Mühleisen merged a PR for this, thankfully!)
- promisify by default – there should be a Promise-based approach to complement the traditional Node callbacks.
- full support for all data types – I recently submitted a set of Mocha tests to provide test coverage for the data types I’ve been adding. This would be a good way to ensure that all data types are supported. Someone who’s better at C++ and the Node API will probably know how to swoop in and add generic coverage for all data types. This includes
- support for binding struct-based data types – e.g. binding an `micros` field. This isn’t the highest priority since one can just concatenate a string to produce an
- a way to interrupt any running queries – I would love a way to be able to send a signal to a query and have it stop running. I see in `interrupt.test.js` there is use of `db.interrupt()`. I don’t know the status of this functionality, but it doesn’t seem to actually stop my queries when I test it myself.
- documentation – the node.js API should be better documented. The test suite should not be the definitive source of documentation. This one should be easy enough for anyone to do.
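As a rough sketch of the "promisify by default" item above: Node's built-in `util.promisify` can already wrap error-first callback methods in user code. The helper name `promisifyMethods` and the `fakeDb` object below are hypothetical, made up for illustration. `fakeDb` only stands in for a database handle so the example runs without the native module installed, and it is not the actual DuckDB API.

```javascript
const { promisify } = require('node:util');

// Return Promise-returning versions of the named callback-style methods.
// Assumes each method takes an error-first callback as its last argument.
function promisifyMethods(target, methodNames) {
  const wrapped = {};
  for (const name of methodNames) {
    // bind() keeps `this` pointing at the original object.
    wrapped[name] = promisify(target[name]).bind(target);
  }
  return wrapped;
}

// Hypothetical stand-in for a database handle (NOT the real DuckDB API).
const fakeDb = {
  rows: [{ answer: 42 }],
  all(sql, callback) {
    // Call back asynchronously, the way a real query would.
    setImmediate(() => {
      if (typeof sql !== 'string' || sql.trim() === '') {
        callback(new Error('empty query'));
      } else {
        callback(null, this.rows);
      }
    });
  },
};

const db = promisifyMethods(fakeDb, ['all']);

db.all('SELECT 42 AS answer')
  .then((rows) => console.log(rows))
  .catch((err) => console.error(err.message));
```

The same helper should in principle work on any object whose methods follow the error-first callback convention, but it only covers methods that call back exactly once. Anything that streams rows through repeated callbacks would need a different wrapper, which is part of why built-in Promise support in the bindings would be nicer than everyone rolling their own.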