Hamilton Ulmer

Tools for Data Analysis & Deep Work

small improvements to the duckdb node module
Dec 21, 2021

I’ve begun building data tools that use DuckDB, an in-process columnar database designed for fast analytic queries. It’s such a powerful paradigm for manipulating and recasting datasets that are small enough to download to your computer. Also, the logo is pretty cute.

The node.js library for DuckDB, however, is missing a number of features. While Node isn’t currently quite as “data-native” as R and Python, it’s still a powerful ecosystem and runtime for application development. A solid Node module makes DuckDB available to a much wider range of data tool developers.

A wishlist of things I’m looking for / working on / have already fixed:

  • parquet by default – the parquet extension should be available to the node.js bindings by default (Hannes Mühleisen merged a PR for this, thankfully!)
  • promisify by default – there should be a Promise-based approach to complement the traditional Node callbacks.
  • full support for all data types – I recently submitted a set of Mocha tests that cover the data types I’ve been adding, which should be a good way to check that every data type is supported. Someone who’s better at C++ and the Node API will probably know how to swoop in and add generic coverage for all data types. This includes:
    • INTERVAL (both for type conversion and for binding back) (PR merged),
    • BOOLEAN, which strangely enough wasn’t supported (PR also merged),
    • MAP (e.g. the output of histogram(column)), along with arbitrary lists / structs
  • support for binding struct-based data types – e.g. binding an INTERVAL with a JavaScript object that has months, days, and micros fields. This isn’t the highest priority, since one can just concatenate a string to produce an INTERVAL.
  • a way to interrupt running queries – I would love to be able to send a signal to a query and have it stop running. I see that interrupt.test.js uses db.interrupt(), but I don’t know the status of this functionality; it doesn’t seem to actually stop my queries when I test it myself.
  • documentation – the node.js API should be better documented. The test suite shouldn’t be the primary source of documentation. This one should be easy enough for anyone to tackle.
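For the “promisify by default” item, here is a minimal sketch of the kind of wrapper I mean. It assumes a callback-style `all(sql, ...params, callback)` method like the one the duckdb module exposes; the `CallbackDatabase` interface, the `allAsync` helper, and the stub database are all hypothetical, so the example runs without DuckDB installed.

```typescript
// Row and callback shapes modeled on a callback-style `all(sql, ...params, callback)`.
// These types are assumptions for illustration, not the duckdb module's own typings.
type Row = Record<string, unknown>;
type QueryCallback = (err: Error | null, rows: Row[]) => void;

interface CallbackDatabase {
  // Bind parameters come first; the callback is always the last argument.
  all(sql: string, ...args: unknown[]): void;
}

// Hypothetical helper: adapts the callback-style `all` into a Promise.
function allAsync(db: CallbackDatabase, sql: string, ...params: unknown[]): Promise<Row[]> {
  return new Promise<Row[]>((resolve, reject) => {
    const callback: QueryCallback = (err, rows) => {
      if (err) {
        reject(err);
      } else {
        resolve(rows);
      }
    };
    db.all(sql, ...params, callback);
  });
}

// Stand-in database for trying the wrapper without DuckDB: it fails on an
// empty query and otherwise echoes the SQL and bound parameters back as one row.
const stubDb: CallbackDatabase = {
  all(sql: string, ...args: unknown[]): void {
    const callback = args.pop() as QueryCallback;
    // Defer the callback so it runs asynchronously, like a real query would.
    Promise.resolve().then(() => {
      if (sql.trim() === "") {
        callback(new Error("empty query"), []);
      } else {
        callback(null, [{ sql, params: args }]);
      }
    });
  },
};

// Usage: await the rows instead of nesting callbacks.
async function main(): Promise<void> {
  const rows = await allAsync(stubDb, "SELECT ? AS answer", 42);
  console.log(rows);
}

main();
```

Errors passed to the callback become rejections, so callers can use plain `try`/`catch` around `await`. A real version would probably live inside the module and cover the other callback methods too.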
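Until struct-based binding exists, the string-concatenation workaround for INTERVALs can at least be wrapped in a small helper. This is a sketch under assumptions: the `IntervalParts` type and `intervalLiteral` function are hypothetical, and the `'N months N days N microseconds'` string format should be checked against DuckDB’s INTERVAL documentation before relying on it.

```typescript
// Hypothetical JS-side shape of an INTERVAL, split into the months, days,
// and micros fields that DuckDB stores internally.
interface IntervalParts {
  months: number;
  days: number;
  micros: number;
}

// Hypothetical helper: renders an IntervalParts object as an INTERVAL literal
// by string concatenation. Rejecting non-integers keeps arbitrary text out of the SQL.
function intervalLiteral({ months, days, micros }: IntervalParts): string {
  const fields: Array<[string, number]> = [
    ["months", months],
    ["days", days],
    ["micros", micros],
  ];
  for (const [name, value] of fields) {
    if (!Number.isSafeInteger(value)) {
      throw new RangeError(`${name} must be a safe integer, got ${value}`);
    }
  }
  return `INTERVAL '${months} months ${days} days ${micros} microseconds'`;
}

// Usage: splice the literal into a query string.
const sql = `SELECT TIMESTAMP '2021-12-21 00:00:00' + ${intervalLiteral({ months: 1, days: 2, micros: 3 })}`;
console.log(sql);
```

Here `intervalLiteral({ months: 1, days: 2, micros: 3 })` yields `INTERVAL '1 months 2 days 3 microseconds'`. Proper binding would make the integer check unnecessary, which is part of why it’s on the wishlist.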