(TO THE GOD OF ABRAHAM, ISAAC AND JACOB. I DEDICATE THIS WORK TO YOU MAY YOU BLESS IT AND MAY IT BLESS THOSE YOU USE IT, MORESO MAY THEY KNOW YOU BY NAME, REPENT AND BE LED TO YOUR WILL AND KINGDOM.) Our Father who is in the heavens, let Your Name be set-apart,let Your reign come, let Your desire be done on earth as it is in heaven. Give us today our daily bread. And forgive us our debts, as we for- give our debtors. And do not lead us into trial, but deliver us from the wicked one because Yours is the reign and the power and the esteem, forever. Amen.
From SQL to JSON to CSV: the format conversions you'll do 1000 times | devformat.tools Blog
sqljsoncsv

From SQL to JSON to CSV: the format conversions you'll do 1000 times

Practical recipes for the conversions every backend hits weekly — Postgres rows to JSON, JSON arrays to CSV, CSV to typed records, with edge cases.

By devformat.tools · · 6 min read

From SQL to JSON to CSV: the format conversions you'll do 1000 times

Half of backend work is converting one shape of tabular data into another. Postgres rows go out as JSON. CSV uploads come in from finance. A vendor sends you a .xlsx you have to flatten. None of this is interesting work, but doing it badly costs you afternoons.

Here's the recipe book. SQL → JSON, JSON → CSV, CSV → JSON, and the edge cases that bite each direction.

SQL → JSON

Postgres: json_agg is the answer

For a single nested document per parent row:

SELECT
  u.id,
  u.email,
  COALESCE(
    json_agg(o.* ORDER BY o.created_at DESC) FILTER (WHERE o.id IS NOT NULL),
    '[]'::json
  ) AS orders
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.id = $1
GROUP BY u.id;

Two non-obvious bits:

  • FILTER (WHERE o.id IS NOT NULL) — without it, a user with zero orders produces [null] not []. The LEFT JOIN injects a null row that json_agg happily wraps.
  • COALESCE(..., '[]'::json) — without it, the whole aggregate is null for users with no orders.

For wider documents, prefer json_build_object so you control field names explicitly:

SELECT json_build_object(
  'id', u.id,
  'email', u.email,
  'created', extract(epoch from u.created_at)::int
) FROM users u WHERE u.id = $1;

I use epoch seconds in API responses by default. Strings are debatable. Dates as ISO strings are debatable. Epoch ints are not — they sort, diff, and parse identically in every language. Format-convert at the edges, not in the API.

MySQL 8 / MariaDB 10.6+

SELECT JSON_OBJECT(
  'id', u.id,
  'email', u.email,
  'orders', (
    SELECT JSON_ARRAYAGG(JSON_OBJECT('id', o.id, 'total', o.total))
    FROM orders o WHERE o.user_id = u.id
  )
) FROM users u WHERE u.id = ?;

JSON_ARRAYAGG returns NULL on empty sets. Wrap with COALESCE(..., JSON_ARRAY()) if your consumer can't handle nulls.

SQLite

SELECT json_object(
  'id', u.id,
  'email', u.email,
  'orders', (SELECT json_group_array(json_object('id', o.id, 'total', o.total))
             FROM orders o WHERE o.user_id = u.id)
) FROM users u WHERE u.id = ?;

Same pattern, json_group_array is the aggregator.

If you're reading SQL someone else wrote and trying to figure out the shape, paste it into the SQL Formatter first — split lines and consistent indentation make joins legible.

JSON → CSV

This is the conversion people get wrong most often, because CSV doesn't have nulls, doesn't have nested objects, and doesn't have a real type system.

Input:

[
  {"id": 1, "name": "Alice", "tags": ["admin", "ops"], "manager": null},
  {"id": 2, "name": "Bob, Jr.", "tags": [], "manager": 1}
]

A correct CSV:

id,name,tags,manager
1,Alice,"admin;ops",
2,"Bob, Jr.",,1

Decisions you have to make:

  1. Headers — union of all keys across all rows. If row 7 has a key row 1 doesn't, your CSV still needs the column.
  2. Nested arrays — flatten with a separator (;, |) you'd never see in the data, or expand into multiple columns (tags_0, tags_1).
  3. Nested objects — flatten with dot-keys (manager.name). Hard limits at 1-2 levels deep before consumers complain.
  4. Nulls — empty string is the convention. Don't write "null" unless your downstream can distinguish strings from nulls (CSV can't).
  5. Quoting — quote anything containing your delimiter, your quote char, or a newline. Escape " as "" per RFC 4180.
  6. Line endings — RFC 4180 says CRLF. Excel agrees. Most modern tools accept LF. If your consumer is "an Excel user", emit CRLF.

In Python with csv.DictWriter:

import csv, json, sys

rows = json.load(sys.stdin)
keys = sorted({k for r in rows for k in r.keys()})

w = csv.DictWriter(sys.stdout, fieldnames=keys, quoting=csv.QUOTE_MINIMAL,
                   lineterminator='\r\n')
w.writeheader()
for r in rows:
    flat = {k: (';'.join(map(str, v)) if isinstance(v, list)
                else json.dumps(v) if isinstance(v, dict)
                else v) for k, v in r.items()}
    w.writerow(flat)

QUOTE_MINIMAL quotes only when needed, which keeps the file small and diff-friendly. QUOTE_ALL is safer if you don't trust the consumer.

Or just drop your JSON array into the JSON to CSV tool, which handles the union-of-keys and nested-array flattening for you.

CSV → JSON

The opposite direction has a different set of pitfalls.

Input:

id,name,active,joined
1,Alice,true,2024-03-15
2,Bob,FALSE,2024-04-01
3,"O'Brien, Mary",true,

The naive parse gives you all strings:

[{"id":"1","name":"Alice","active":"true","joined":"2024-03-15"}, ...]

That's usually wrong. You want types. Decisions:

  1. Numeric coercion"1"1, but "007" should stay as a string (leading zeros = identifier, not number).
  2. Boolean coerciontrue, True, TRUE, 1, yes. Pick a list, document it.
  3. Empty fieldsnull or empty string? An empty "joined" likely means unknown, so null. An empty "middle_name" might genuinely be empty.
  4. Date parsing — only if you're confident in the format. Otherwise leave as string.

A pragmatic Python recipe:

import csv, json, sys

def coerce(v: str):
    if v == '': return None
    if v.lower() in ('true', 'false'): return v.lower() == 'true'
    if v.isdigit() and not (len(v) > 1 and v[0] == '0'): return int(v)
    try: return float(v) if '.' in v else v
    except ValueError: return v

rows = list(csv.DictReader(sys.stdin))
print(json.dumps([{k: coerce(v) for k, v in r.items()} for r in rows], indent=2))

For an ad-hoc CSV without writing Python, the CSV to JSON tool does this with toggles for type inference.

A note on streaming

Everything above assumes you can hold the dataset in memory. Past ~100MB, switch to streaming: pg_copy → newline-delimited JSON, then jq --raw-output to CSV, line by line. JSON's lack of a streaming format is real friction; if you're building an export endpoint, emit NDJSON (one JSON object per line) instead of an array. Consumers can parse it line-by-line without loading everything.

Closing the loop with formatting

The most common workflow:

  1. Run a query, copy a JSON result.
  2. Paste into the JSON Formatter to read it.
  3. Convert to CSV for the finance team's spreadsheet.
  4. Get a corrected CSV back, convert to JSON, write a migration.

These four steps happen weekly on every backend team I've worked on. Keeping the tools one tab away saves real time.

Try it

Try our free developer tools

51+ tools that run in your browser. No data sent anywhere.

Browse Tools