diff --git a/doc/parquet-howto.md b/doc/parquet-howto.md
new file mode 100644
index 00000000..519739d8
--- /dev/null
+++ b/doc/parquet-howto.md
@@ -0,0 +1,29 @@
+Parquet is used as an intermediate storage format for a lot of processed data.
+
+See [third-party/parquet-floor](../third-party/parquet-floor).
+
+## How to query the data?
+
+[DuckDB](https://duckdb.org/) is probably the best tool for interacting with these files. You can
+query them directly with SQL, for example:
+
+```sql
+SELECT foo, bar FROM 'baz.parquet' ...
+```
+
+## How to inspect word metadata from `documentNNNN.parquet`?
+
+The document keywords records contain repeated values. To debug them, the
+repeated values can be unnested, e.g. in DuckDB, with a query like:
+
+```sql
+SELECT word, hex(wordMeta) FROM
+    (
+        SELECT
+            UNNEST(word) AS word,
+            UNNEST(wordMeta) AS wordMeta
+        FROM 'document0000.parquet'
+        WHERE url='...'
+    )
+WHERE word IN ('foo', 'bar')
+```
\ No newline at end of file
diff --git a/doc/readme.md b/doc/readme.md
index a5da9973..bbc64105 100644
--- a/doc/readme.md
+++ b/doc/readme.md
@@ -6,7 +6,10 @@ Start in [📁 ../code/](../code/) and poke around.
 ## Operations
 
 * [System Properties](system-properties.md) - JVM property flags
+
+## How-To
 * [Sideloading How-To](sideloading-howto.md) - How to sideload various data sets
+* [Parquet How-To](parquet-howto.md) - Useful tips for working with Parquet files
 
 ## Set-up