From SQL to JSON to CSV: the format conversions you'll do 1000 times
Practical recipes for the conversions every backend hits weekly — Postgres rows to JSON, JSON arrays to CSV, CSV to typed records, with edge cases.
From SQL to JSON to CSV: the format conversions you'll do 1000 times
Half of backend work is converting one shape of tabular data into another. Postgres rows go out as JSON. CSV uploads come in from finance. A vendor sends you a .xlsx you have to flatten. None of this is interesting work, but doing it badly costs you afternoons.
Here's the recipe book. SQL → JSON, JSON → CSV, CSV → JSON, and the edge cases that bite each direction.
SQL → JSON
Postgres: json_agg is the answer
For a single nested document per parent row:
SELECT
u.id,
u.email,
COALESCE(
json_agg(o.* ORDER BY o.created_at DESC) FILTER (WHERE o.id IS NOT NULL),
'[]'::json
) AS orders
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.id = $1
GROUP BY u.id;
Two non-obvious bits:
FILTER (WHERE o.id IS NOT NULL)— without it, a user with zero orders produces[null]not[]. TheLEFT JOINinjects a null row thatjson_agghappily wraps.COALESCE(..., '[]'::json)— without it, the whole aggregate isnullfor users with no orders.
For wider documents, prefer json_build_object so you control field names explicitly:
SELECT json_build_object(
'id', u.id,
'email', u.email,
'created', extract(epoch from u.created_at)::int
) FROM users u WHERE u.id = $1;
I use epoch seconds in API responses by default. Strings are debatable. Dates as ISO strings are debatable. Epoch ints are not — they sort, diff, and parse identically in every language. Format-convert at the edges, not in the API.
MySQL 8 / MariaDB 10.6+
SELECT JSON_OBJECT(
'id', u.id,
'email', u.email,
'orders', (
SELECT JSON_ARRAYAGG(JSON_OBJECT('id', o.id, 'total', o.total))
FROM orders o WHERE o.user_id = u.id
)
) FROM users u WHERE u.id = ?;
JSON_ARRAYAGG returns NULL on empty sets. Wrap with COALESCE(..., JSON_ARRAY()) if your consumer can't handle nulls.
SQLite
SELECT json_object(
'id', u.id,
'email', u.email,
'orders', (SELECT json_group_array(json_object('id', o.id, 'total', o.total))
FROM orders o WHERE o.user_id = u.id)
) FROM users u WHERE u.id = ?;
Same pattern, json_group_array is the aggregator.
If you're reading SQL someone else wrote and trying to figure out the shape, paste it into the SQL Formatter first — split lines and consistent indentation make joins legible.
JSON → CSV
This is the conversion people get wrong most often, because CSV doesn't have nulls, doesn't have nested objects, and doesn't have a real type system.
Input:
[
{"id": 1, "name": "Alice", "tags": ["admin", "ops"], "manager": null},
{"id": 2, "name": "Bob, Jr.", "tags": [], "manager": 1}
]
A correct CSV:
id,name,tags,manager
1,Alice,"admin;ops",
2,"Bob, Jr.",,1
Decisions you have to make:
- Headers — union of all keys across all rows. If row 7 has a key row 1 doesn't, your CSV still needs the column.
- Nested arrays — flatten with a separator (
;,|) you'd never see in the data, or expand into multiple columns (tags_0,tags_1). - Nested objects — flatten with dot-keys (
manager.name). Hard limits at 1-2 levels deep before consumers complain. - Nulls — empty string is the convention. Don't write
"null"unless your downstream can distinguish strings from nulls (CSV can't). - Quoting — quote anything containing your delimiter, your quote char, or a newline. Escape
"as""per RFC 4180. - Line endings — RFC 4180 says CRLF. Excel agrees. Most modern tools accept LF. If your consumer is "an Excel user", emit CRLF.
In Python with csv.DictWriter:
import csv, json, sys
rows = json.load(sys.stdin)
keys = sorted({k for r in rows for k in r.keys()})
w = csv.DictWriter(sys.stdout, fieldnames=keys, quoting=csv.QUOTE_MINIMAL,
lineterminator='\r\n')
w.writeheader()
for r in rows:
flat = {k: (';'.join(map(str, v)) if isinstance(v, list)
else json.dumps(v) if isinstance(v, dict)
else v) for k, v in r.items()}
w.writerow(flat)
QUOTE_MINIMAL quotes only when needed, which keeps the file small and diff-friendly. QUOTE_ALL is safer if you don't trust the consumer.
Or just drop your JSON array into the JSON to CSV tool, which handles the union-of-keys and nested-array flattening for you.
CSV → JSON
The opposite direction has a different set of pitfalls.
Input:
id,name,active,joined
1,Alice,true,2024-03-15
2,Bob,FALSE,2024-04-01
3,"O'Brien, Mary",true,
The naive parse gives you all strings:
[{"id":"1","name":"Alice","active":"true","joined":"2024-03-15"}, ...]
That's usually wrong. You want types. Decisions:
- Numeric coercion —
"1"→1, but"007"should stay as a string (leading zeros = identifier, not number). - Boolean coercion —
true,True,TRUE,1,yes. Pick a list, document it. - Empty fields —
nullor empty string? An empty"joined"likely means unknown, sonull. An empty"middle_name"might genuinely be empty. - Date parsing — only if you're confident in the format. Otherwise leave as string.
A pragmatic Python recipe:
import csv, json, sys
def coerce(v: str):
if v == '': return None
if v.lower() in ('true', 'false'): return v.lower() == 'true'
if v.isdigit() and not (len(v) > 1 and v[0] == '0'): return int(v)
try: return float(v) if '.' in v else v
except ValueError: return v
rows = list(csv.DictReader(sys.stdin))
print(json.dumps([{k: coerce(v) for k, v in r.items()} for r in rows], indent=2))
For an ad-hoc CSV without writing Python, the CSV to JSON tool does this with toggles for type inference.
A note on streaming
Everything above assumes you can hold the dataset in memory. Past ~100MB, switch to streaming: pg_copy → newline-delimited JSON, then jq --raw-output to CSV, line by line. JSON's lack of a streaming format is real friction; if you're building an export endpoint, emit NDJSON (one JSON object per line) instead of an array. Consumers can parse it line-by-line without loading everything.
Closing the loop with formatting
The most common workflow:
- Run a query, copy a JSON result.
- Paste into the JSON Formatter to read it.
- Convert to CSV for the finance team's spreadsheet.
- Get a corrected CSV back, convert to JSON, write a migration.
These four steps happen weekly on every backend team I've worked on. Keeping the tools one tab away saves real time.
Try it
- SQL Formatter — make hairy queries readable before you decode them
- JSON to CSV — handles union-of-keys, nested arrays, and proper quoting
- CSV to JSON — with optional type coercion
- JSON Formatter — paste, read, copy back