In the DuckDB CLI, when I set .mode json
, the following query
WITH x as (
select 1 as a
UNION ALL select 2 as a
)
SELECT * from x JOIN x AS xi on x.a = xi.a
Produces a weird result set:
[{"a":1,"a":1},
{"a":2,"a":2}]
If you were to parse this JSON string, the first “a” and “b” will get removed by the later ones.
This is exactly what you see in Node and also in Rill Developer’s main
branch right now. It’s
weird behavior, since … you’re throwing out data.
I noticed this because Rill Developer used to return a result set that looked like this:
[{"a":1,"a:1":1},
{"a":2,"a:1":2}]
Which made application development much easier. Through Rill Developer we utilize these :<number>
aliases
to prevent duplicate keys in lists. Your result set not only has all the row values you’re looking for; it also makes it much
easier to select duplicate values in profile queries.
It turns out that if you wrap that final SELECT
statement as a CTE and select it - in other words, this:
WITH x as (
select 1 as a
UNION ALL select 2 as a
),
y as (
SELECT * from x JOIN x AS xi on x.a = xi.a
)
SELECT * from y
You’ll get these nice :<number>
aliases.
This behavior feels bizarre. Why not just always return this? Well, it turns out that sqlite has the same behavior. The choice to mirror this in duckdb was intentional. So while it may seem a little strange at first glance, there’s probably a deeper reason.
It turns out that this is how sqlite also handles these.
If I were to materialize the first query as a table or view, or use a CTE, then I get the :<number>
part.
It’s hard to fault DuckDB for taking the same approach as the much-more-mature SQLite. And other SQL engines have all sorts of behavior:
- BigQuery will return
a
anda_1
for both queries; - Postgres will return
a
anda
for both queries; - SQL Server will return
a
anda
for the first query, and error for the second - Snowflake returns
a
anda
for both queries