{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "In my domain, the energy world, we work a many with timeseries. Generally, these arrive in batches where one day (in localtime) always arrives at once and there can be multiple versions of the same data. A normal portfolio has many thousands of assets. Often one wants to aggregate these timeseries over a subset of assets, for some time range. This is not a trivial thing to do in a row-oriented database like Postgres. Below we will explore some methods to store this data, and benchmark different queries to get the same result.\n", "\n", "\n", "\n", "See [this post](/posts/postgres-notify/) for more tips on how to work with psql and psycopg2 from a notebook." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "import os, random, json, pytz\n", "from datetime import datetime, timedelta, time\n", "from time import time as t\n", "from functools import lru_cache\n", "from itertools import zip_longest\n", "\n", "from IPython.display import Javascript\n", "import plotly.express as px\n", "import tqdm" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Collecting package metadata (current_repodata.json): ...working... done\n", "Solving environment: ...working... done\n", "\n", "# All requested packages already installed.\n", "\n", "psql (PostgreSQL) 12.4 (Ubuntu 12.4-1.pgdg20.04+1)\n" ] } ], "source": [ "# Install python dependencies\n", "!conda install -y --quiet psycopg2\n", "# Install postgres (if not already installed)\n", "!sudo apt-get install postgresql -yqq\n", "# It needs to run in the background\n", "!sudo service postgresql start\n", "!psql --version" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CREATE ROLE\n", "CREATE DATABASE\n", "ALTER ROLE\n" ] } ], "source": [ "# Optionally drop the user and/or database if starting anew\n", "# !sudo su - postgres -c \"psql -c \\\"DROP DATABASE timeseries_benchmark\\\"\"\n", "# !sudo su - postgres -c \"psql -c \\\"DROP USER tse\\\"\"\n", "# Create user, db, and give access\n", "!sudo su - postgres -c \"psql -c \\\"CREATE USER tse WITH PASSWORD 'abc'\\\" && psql -c \\\"CREATE DATABASE timeseries_benchmark OWNER tse\\\"\"\n", "# Needed to be able to COPY FROM a file\n", "!sudo su - postgres -c \"psql -c \\\"ALTER USER tse WITH SUPERUSER\\\"\"" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "import psycopg2\n", "conn = psycopg2.connect(user=\"tse\", password=\"abc\", database=\"timeseries_benchmark\", host=\"localhost\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create tables\n", "\n", "First we make the three tables to be benchmarked, they differ:\n", "\n", "- actuals_array_double: One array per asset/day/version, with DOUBLE PRECISION (8 bytes) datatype\n", "- actuals_array: Same but with REAL (4 bytes)\n", "- actuals_cols: Also with REAL but split into 100 physical columns instead of ARRAYs" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "# Drop tables if running anew\n", "# !sudo su - postgres -c \"psql -d timeseries_benchmark -c \\\"DROP TABLE IF EXISTS actuals_array_double; DROP TABLE IF EXISTS actuals_array; DROP TABLE IF EXISTS actuals_cols;\\\"\"\n", "\n", "# FLOAT means DOUBLE PRECISION (8B)\n", "with conn as conn:\n", " with conn.cursor() as curs:\n", " curs.execute(\"\"\"\n", " CREATE TABLE actuals_array_double (\n", " id serial PRIMARY KEY,\n", " asset_id INTEGER NOT NULL,\n", " day DATE NOT NULL,\n", " created TIMESTAMP NOT NULL,\n", " values FLOAT[] NOT NULL,\n", " CONSTRAINT actuals_array_double_asset_day UNIQUE (asset_id, day, created)\n", " );\n", " \"\"\")\n", "\n", "# REAL is single precision floating point\n", "with conn as conn:\n", " with conn.cursor() as curs:\n", " curs.execute(\"\"\"\n", " CREATE TABLE actuals_array (\n", " id serial PRIMARY KEY,\n", " asset_id INTEGER NOT NULL,\n", " day DATE NOT NULL,\n", " created TIMESTAMP NOT NULL,\n", " values REAL[] NOT NULL,\n", " CONSTRAINT actuals_array_asset_day UNIQUE (asset_id, day, created)\n", " );\n", " \"\"\")\n", "\n", "with conn as conn:\n", " with conn.cursor() as curs:\n", " pte_str = \",\\n \".join(f\"pte{i:02d} REAL\" for i in range(100))\n", " curs.execute(f\"\"\"\n", " CREATE TABLE actuals_cols (\n", " id serial PRIMARY KEY,\n", " asset_id INTEGER NOT NULL,\n", " day DATE NOT NULL,\n", " created TIMESTAMP NOT NULL,\n", " {pte_str},\n", " CONSTRAINT actuals_cols_asset_day UNIQUE (asset_id, day, created)\n", " );\n", " \"\"\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating and importing testdata\n", "\n", "I'm creating a reasonable amount of data, about 16M rows, as that is still okay to load and query on my laptop. To reach that amount of data I use 5000 assets, 3 years, and 3 versions of daily actuals." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "# Use a cache to speed up the function call, the randomness is not critical\n", "@lru_cache(maxsize=10000)\n", "def randvals(d: datetime, tz = pytz.timezone(\"Europe/Amsterdam\"), values_arr=True):\n", " d1 = tz.localize(datetime.combine(d, time()))\n", " d2 = tz.localize(datetime.combine(d + timedelta(days=1), time()))\n", " periods = int((d2 - d1).total_seconds() / 3600 * 4)\n", " values = [random.random() for i in range(periods)]\n", " if values_arr:\n", " return '\"{' + \",\".join(f\"{v:.3f}\" for v in values) + '}\"'\n", " else:\n", " # Always make 100 columns, but fill with unquoted empty string\n", " # See parameters -> NULL here: https://www.postgresql.org/docs/9.2/sql-copy.html\n", " return \",\".join(f\"{v:.3f}\" if v else \"\" for i, v in zip_longest(range(100), values))\n", "\n", "def generate_fake_data(filename, n_assets=5E3, n_years=3, n_versions=3, stop_dt=datetime.now(), values_arr=True):\n", " with open(filename, \"w\") as f:\n", " i = 0\n", " for days_ago in tqdm.tqdm(range(n_years * 365)):\n", " for asset_id in range(int(n_assets)):\n", " day = (stop_dt - timedelta(days=days_ago))\n", " for i_v in range(n_versions):\n", " i += 1\n", " created = day - i_v * timedelta(days=3)\n", " values = randvals(day, values_arr=values_arr)\n", " line = f\"{i:d}, {asset_id:d}, '{day:%Y-%m-%d}', '{created.isoformat()}', {values}\"\n", " f.write(line + \"\\n\")\n" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|██████████| 1095/1095 [01:50<00:00, 9.88it/s]\n" ] } ], "source": [ "generate_fake_data(\"actuals_array.csv\")" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|██████████| 1095/1095 [01:47<00:00, 10.19it/s]\n" ] } ], "source": [ "generate_fake_data(\"actuals_cols.csv\", values_arr=False)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1, 0, '2020-09-04', '2020-09-04T20:16:44.631097', \"{0.172,0.477,0.225,0.707,0.876,0.952,0.913,0.252,0.598,0.292,0.326,0.854,0.821,0.203,0.706,0.531,0.560,0.150,0.843,0.727,0.669,0.039,0.144,0.843,0.019,0.708,0.179,0.308,0.967,0.307,0.031,0.120,0.095,0.824,0.623,0.377,0.916,0.699,0.586,0.953,0.704,0.372,0.156,0.620,0.408,0.229,0.400,0.496,0.502,0.983,0.394,0.683,0.974,0.253,0.398,0.241,0.317,0.899,0.322,0.312,0.659,0.567,0.404,0.210,0.374,0.307,0.126,0.483,0.074,0.536,0.022,0.230,0.881,0.246,0.742,0.247,0.807,0.694,0.570,0.447,0.329,0.570,0.221,0.954,0.693,0.355,0.506,0.111,0.826,0.973,0.822,0.707,0.918,0.734,0.171,0.597}\"\n", "1, 0, '2020-09-04', '2020-09-04T20:16:44.631097', 0.401,0.674,0.385,0.774,0.085,0.304,0.894,0.695,0.591,0.050,0.157,0.392,0.334,0.009,0.095,0.452,0.083,0.289,0.336,0.174,0.140,0.918,0.005,0.389,0.815,0.001,0.323,0.000,0.759,0.697,0.318,0.948,0.985,0.327,0.414,0.605,0.260,0.701,0.156,0.568,0.503,0.455,0.042,0.582,0.310,0.860,0.013,0.190,0.313,0.168,0.506,0.163,0.792,0.828,0.161,0.160,0.356,0.920,0.774,0.493,0.224,0.194,0.326,0.107,0.645,0.954,0.382,0.151,0.593,0.914,0.739,0.332,0.469,0.053,0.472,0.831,0.062,0.538,0.767,0.467,0.909,0.363,0.865,0.744,0.167,0.171,0.442,0.305,0.563,0.144,0.228,0.157,0.473,0.210,0.846,0.958,,,,\n", "16425000 actuals_array.csv\n" ] } ], "source": [ "!head -1 actuals_array.csv\n", "!head -1 actuals_cols.csv\n", "!wc -l actuals_array.csv" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "def copy_from(conn, filename, table):\n", " path = os.path.abspath(filename)\n", " print(f\"Importing '{filename}' to '{table}'\")\n", " with conn as conn:\n", " with conn.cursor() as curs:\n", " curs.execute(f\"DELETE FROM {table};\")\n", " curs.execute(f\"COPY {table} FROM '{path}' (FORMAT CSV);\")\n", " curs.execute(f\"SELECT COUNT(*) FROM {table};\")\n", " n_rows = curs.fetchone()[0]\n", " curs.execute(f\"SELECT pg_size_pretty(pg_relation_size('{table}'));\")\n", " print(f\"Table '{table}' rows and size:\", n_rows, \",\", curs.fetchone()[0])" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Importing 'actuals_array.csv' to 'actuals_array_double'\n", "Table 'actuals_array_double' rows and size: 16425000 , 14 GB\n", "CPU times: user 17.1 ms, sys: 851 µs, total: 18 ms\n", "Wall time: 7min 43s\n", "Importing 'actuals_array.csv' to 'actuals_array'\n", "Table 'actuals_array' rows and size: 16425000 , 7547 MB\n", "CPU times: user 13 ms, sys: 1 ms, total: 14 ms\n", "Wall time: 6min 53s\n", "Importing 'actuals_cols.csv' to 'actuals_cols'\n", "Table 'actuals_cols' rows and size: 16425000 , 7129 MB\n", "CPU times: user 11.4 ms, sys: 0 ns, total: 11.4 ms\n", "Wall time: 5min 40s\n" ] } ], "source": [ "%time copy_from(conn, \"actuals_array.csv\", \"actuals_array_double\")\n", "%time copy_from(conn, \"actuals_array.csv\", \"actuals_array\")\n", "%time copy_from(conn, \"actuals_cols.csv\", \"actuals_cols\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tuning Postgres\n", "\n", "I [tuned postgres](https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server) a bit to increase performance, as my machine is much larger than the default settings assume. After checking out TimescaleDB, I used their tune script which seemed to set even better values.\n", "\n", "Find the config file with `SHOW config_file;`, edit it, and then restart with `systemctl restart postgresql`.\n", "\n", "`timescaledb-tune` gave me these settings based on my system (only listed the values that were changed from their defaults):\n", "```\n", "effective_cache_size = 23976MB\n", "maintenance_work_mem = 2047MB\n", "work_mem = 10229kB\n", "\n", "max_worker_processes = 19\n", "max_parallel_workers_per_gather = 4\n", "max_parallel_workers = 8\n", "\n", "wal_buffers = 16MB\n", "min_wal_size = 512MB\n", "\n", "default_statistics_target = 500\n", "random_page_cost = 1.1\n", "checkpoint_completion_target = 0.9\n", "max_locks_per_transaction = 256\n", "autovacuum_max_workers = 10\n", "autovacuum_naptime = 10\n", "effective_io_concurrency = 200\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Use pev2 to visualize Postgres query plans\n", "\n", "I came across a very nice visualization tool called PEV2. It's a Vue.js module, so naturally I wanted to go the extra mile and make it possible to render this in a jupyter lab notebook. Some hours later I finally got it to work, the steps were:\n", "- Cloned pev2 and followed the [pev2 getting started guide](https://github.com/dalibo/pev2/blob/master/CONTRIBUTING.md)\n", "- Install vue-cli-service script using `npm i @vue/cli-service`\n", "- Compile only the Plan top-level module as UMD (universal, so will register as `window.pev2` when loaded with no specific loader) `./node_modules/.bin/vue-cli-service build --target lib src/components/Plan.vue`\n", "- Insert all required css/javascript in head/body, respectively\n", "- Create function that instantiates the pev2 Vue Component using a new Vue instance\n", "- (When rendering notebook for blog put javascript in EXTRA_HEAD_DATA in conf.py, I could not get it to reliably load when output in a cell like below)\n", "\n", "This works surprisingly well! One could definitely create a nice jupyter lab extension this way.\n", "\n", "I should also mention that I tried [pgMustard](https://www.pgmustard.com/getting-a-query-plan), which gave me some pretty good tips on index use. Unfortunately it's not embeddable." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%javascript\n", " function addScript(url) {\n", " var script = document.createElement(\"script\");\n", " script.type = \"text/javascript\";\n", " script.src = url;\n", " //document.head.appendChild(script);\n", " // For use on website, as anything added to is not available on jupyter reload\n", " element.append(script);\n", " }\n", " function addCSS(url) {\n", " var link = document.createElement(\"link\");\n", "\n", " link.type = \"text/css\";\n", " link.rel = \"stylesheet\";\n", " link.href = url;\n", "\n", " //document.head.appendChild(link);\n", " element.append(link);\n", " }\n", " addCSS(\"https://unpkg.com/bootstrap@4.5.0/dist/css/bootstrap.min.css\");\n", " addCSS(\"https://unpkg.com/@fortawesome/fontawesome-free@5.13.0/css/all.css\");\n", " addScript(\"https://cdn.jsdelivr.net/npm/vue@2.6.12/dist/vue.min.js\");\n", " // Only works if you're running jupyter lab with this notebook as root repo and the compiled file in this folder with name pev2.umd.js\n", " addScript(\"/files/pev2.umd.js\");" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "def pev2(query, plan):\n", " \"\"\"\n", " Visualize a Postgres query plan with pev2. Try to pass a text query plan, not JSON,\n", " as text has less issues with escaping double quotes.\n", " \"\"\"\n", " s = \"\"\"\n", " var el = document.createElement('div');\n", " element.append(el);\n", "\n", " var v = new Vue({\n", " el: el,\n", " components: {\n", " 'pev2': window.pev2\n", " },\n", " data: function() {\n", " return {\n", " plan: `\"\"\" + plan + \"\"\"`,\n", " query: `\"\"\" + query + \"\"\"`,\n", " }\n", " },\n", " template: \"\"\n", " });\n", " \"\"\"\n", " return Javascript(s)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Query 1a: UNNESTing the array without index\n", "\n", "We start with this query that uses `UNNEST()` on the array of values, to get separate rows for each value, then do some `GROUP BY` tricks to get the final outcome. The result is an aggregated timeseries with a summed value for every timestep.\n", "\n", "This is just to show how expensive a sort step can be, soon we will add some indexes." ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "application/javascript": [ "\n", " var el = document.createElement('div');\n", " element.append(el);\n", "\n", " var v = new Vue({\n", " el: el,\n", " components: {\n", " 'pev2': window.pev2\n", " },\n", " data: function() {\n", " return {\n", " plan: `GroupAggregate (cost=7414008.90..7414031.40 rows=200 width=44) (actual time=305160.392..305198.537 rows=1095 loops=1)\n", " Output: a.day, max(a.count_asset), array_agg(a.s ORDER BY a.timeblock)\n", " Group Key: a.day\n", " Buffers: shared hit=78416 read=887614, temp read=2778975 written=2779870\n", " -> Sort (cost=7414008.90..7414013.90 rows=2000 width=24) (actual time=305160.333..305168.921 rows=105120 loops=1)\n", " Output: a.day, a.count_asset, a.s, a.timeblock\n", " Sort Key: a.day\n", " Sort Method: external merge Disk: 4328kB\n", " Buffers: shared hit=78416 read=887614, temp read=2778975 written=2779870\n", " -> Subquery Scan on a (cost=7413859.25..7413899.25 rows=2000 width=24) (actual time=305103.509..305130.069 rows=105120 loops=1)\n", " Output: a.day, a.count_asset, a.s, a.timeblock\n", " Buffers: shared hit=78416 read=887614, temp read=2778434 written=2779328\n", " -> HashAggregate (cost=7413859.25..7413879.25 rows=2000 width=24) (actual time=305103.507..305123.119 rows=105120 loops=1)\n", " Output: actuals_array.day, x.timeblock, sum(x.elem), count(*)\n", " Group Key: actuals_array.day, x.timeblock\n", " Buffers: shared hit=78416 read=887614, temp read=2778434 written=2779328\n", " -> Nested Loop (cost=6781502.94..7249610.85 rows=16424840 width=16) (actual time=90043.282..201299.123 rows=525600000 loops=1)\n", " Output: actuals_array.day, x.timeblock, x.elem\n", " Buffers: shared hit=78416 read=887614, temp read=2778434 written=2779328\n", " -> Unique (cost=6781502.94..6904689.20 rows=1642484 width=424) (actual time=90043.246..97252.104 rows=5475000 loops=1)\n", " Output: actuals_array.day, actuals_array.\"values\", actuals_array.asset_id, actuals_array.created\n", " Buffers: shared hit=78416 read=887614, temp read=2778434 written=2779328\n", " -> Sort (cost=6781502.94..6822565.03 rows=16424835 width=424) (actual time=90043.244..95545.721 rows=16425000 loops=1)\n", " Output: actuals_array.day, actuals_array.\"values\", actuals_array.asset_id, actuals_array.created\n", " Sort Key: actuals_array.asset_id, actuals_array.day, actuals_array.created DESC NULLS LAST\n", " Sort Method: external merge Disk: 6975208kB\n", " Buffers: shared hit=78416 read=887614, temp read=2778434 written=2779328\n", " -> Seq Scan on public.actuals_array (cost=0.00..1130278.35 rows=16424835 width=424) (actual time=180.071..24537.045 rows=16425000 loops=1)\n", " Output: actuals_array.day, actuals_array.\"values\", actuals_array.asset_id, actuals_array.created\n", " Buffers: shared hit=78416 read=887614\n", " -> Function Scan on pg_catalog.unnest x (cost=0.00..0.10 rows=10 width=12) (actual time=0.005..0.012 rows=96 loops=5475000)\n", " Output: x.elem, x.timeblock\n", " Function Call: unnest(actuals_array.\"values\")\n", "Planning Time: 0.193 ms\n", "JIT:\n", " Functions: 18\n", " Options: Inlining true, Optimization true, Expressions true, Deforming true\n", " Timing: Generation 2.377 ms, Inlining 11.821 ms, Optimization 104.381 ms, Emission 63.465 ms, Total 182.044 ms\n", "Execution Time: 305860.170 ms`,\n", " query: `\n", " EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)\n", "\n", " -- Put aggregated load result in array per day. max/min for unique assets\n", " -- does not matter, we just need to pick one as the same value\n", " -- has been distributed over the rows\n", " SELECT\n", " day,\n", " MAX(count_asset) as count_asset,\n", " array_agg(s ORDER BY timeblock) as total_value\n", " FROM (\n", " SELECT\n", " day,\n", " timeblock,\n", " SUM(elem) as s,\n", " COUNT(*) as count_asset\n", "\n", " FROM (\n", " -- Select only latest rows\n", " SELECT DISTINCT ON (asset_id, day)\n", " day,\n", " values\n", " FROM actuals_array\n", " -- Sort descending by reference_datetime so it picks latest distinct value,\n", " -- choosing a value over NULL, and the highest ID if there are ties\n", " ORDER BY asset_id, day, created DESC NULLS LAST\n", " ) AS t, LATERAL UNNEST(t.values) WITH ORDINALITY x(elem, timeblock)\n", "\n", " GROUP BY day, timeblock\n", " ) as a\n", " GROUP BY day;\n", " `,\n", " }\n", " },\n", " template: \"\"\n", " });\n", " " ], "text/plain": [ "" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def query_array_unnest(table=\"actuals_array\", explain=True):\n", " with conn.cursor() as curs:\n", " query = f\"\"\"\n", " {'EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)' if explain else ''}\n", "\n", " -- Put aggregated load result in array per day. max/min for unique assets\n", " -- does not matter, we just need to pick one as the same value\n", " -- has been distributed over the rows\n", " SELECT\n", " day,\n", " MAX(count_asset) as count_asset,\n", " array_agg(s ORDER BY timeblock) as total_value\n", " FROM (\n", " SELECT\n", " day,\n", " timeblock,\n", " SUM(elem) as s,\n", " COUNT(*) as count_asset\n", "\n", " FROM (\n", " -- Select only latest rows\n", " SELECT DISTINCT ON (asset_id, day)\n", " day,\n", " values\n", " FROM {table}\n", " -- Sort descending by reference_datetime so it picks latest distinct value,\n", " -- choosing a value over NULL, and the highest ID if there are ties\n", " ORDER BY asset_id, day, created DESC NULLS LAST\n", " ) AS t, LATERAL UNNEST(t.values) WITH ORDINALITY x(elem, timeblock)\n", "\n", " GROUP BY day, timeblock\n", " ) as a\n", " GROUP BY day;\n", " \"\"\"\n", " curs.execute(query)\n", " if explain:\n", " plan = \"\\n\".join([l[0] for l in curs.fetchall()])\n", " else:\n", " plan = curs.fetchall()\n", " \n", " return query, plan\n", "\n", "pev2(*query_array_unnest())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating indexes\n", "\n", "The queries all sort by `asset_id, day, created DESC NULLS LAST`. To speed this up, we create several indexes so that Postgres can do an INDEX SCAN instead of a SORT, which is much faster.\n", "\n", "We also ANALYZE the tables to give the query planner some up-to-date statistics." ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [], "source": [ "with conn.cursor() as curs:\n", " def create_idx(table):\n", " curs.execute(f\"CREATE INDEX IF NOT EXISTS {table}_asset_id_day_created \"\n", " f\"ON {table} (asset_id, day, created DESC NULLS LAST); \"\n", " f\"ANALYZE {table};\")\n", " create_idx(\"actuals_array_double\")\n", " create_idx(\"actuals_array\")\n", " create_idx(\"actuals_cols\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Query 1: UNNESTing with index\n", "\n", "Now that we have relevant indexes, we can eliminate the very expensive SORT step." ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "application/javascript": [ "\n", " var el = document.createElement('div');\n", " element.append(el);\n", "\n", " var v = new Vue({\n", " el: el,\n", " components: {\n", " 'pev2': window.pev2\n", " },\n", " data: function() {\n", " return {\n", " plan: `GroupAggregate (cost=1969926.86..1969949.36 rows=200 width=44) (actual time=239271.724..239305.366 rows=1095 loops=1)\n", " Output: a.day, max(a.count_asset), array_agg(a.s ORDER BY a.timeblock)\n", " Group Key: a.day\n", " Buffers: shared hit=5152218 read=1029736, temp read=541 written=542\n", " -> Sort (cost=1969926.86..1969931.86 rows=2000 width=24) (actual time=239271.668..239279.243 rows=105120 loops=1)\n", " Output: a.day, a.count_asset, a.s, a.timeblock\n", " Sort Key: a.day\n", " Sort Method: external merge Disk: 4328kB\n", " Buffers: shared hit=5152218 read=1029736, temp read=541 written=542\n", " -> Subquery Scan on a (cost=1969777.20..1969817.20 rows=2000 width=24) (actual time=239214.789..239242.084 rows=105120 loops=1)\n", " Output: a.day, a.count_asset, a.s, a.timeblock\n", " Buffers: shared hit=5152218 read=1029736\n", " -> HashAggregate (cost=1969777.20..1969797.20 rows=2000 width=24) (actual time=239214.788..239235.065 rows=105120 loops=1)\n", " Output: actuals_array.day, x.timeblock, sum(x.elem), count(*)\n", " Group Key: actuals_array.day, x.timeblock\n", " Buffers: shared hit=5152218 read=1029736\n", " -> Nested Loop (cost=0.44..1805527.60 rows=16424960 width=16) (actual time=263.436..129616.858 rows=525600000 loops=1)\n", " Output: actuals_array.day, x.timeblock, x.elem\n", " Buffers: shared hit=5152218 read=1029736\n", " -> Result (cost=0.43..1460603.44 rows=1642496 width=423) (actual time=263.415..20144.292 rows=5475000 loops=1)\n", " Output: actuals_array.day, actuals_array.\"values\", actuals_array.asset_id, actuals_array.created\n", " Buffers: shared hit=5152218 read=1029736\n", " -> Unique (cost=0.43..1460603.44 rows=1642496 width=423) (actual time=0.079..19245.330 rows=5475000 loops=1)\n", " Output: actuals_array.id, actuals_array.asset_id, actuals_array.day, actuals_array.created, actuals_array.\"values\"\n", " Buffers: shared hit=5152218 read=1029736\n", " -> Index Scan using actuals_array_asset_id_day_created on public.actuals_array (cost=0.43..1378478.65 rows=16424957 width=423) (actual time=0.076..17077.097 rows=16425000 loops=1)\n", " Output: actuals_array.id, actuals_array.asset_id, actuals_array.day, actuals_array.created, actuals_array.\"values\"\n", " Buffers: shared hit=5152218 read=1029736\n", " -> Function Scan on pg_catalog.unnest x (cost=0.00..0.10 rows=10 width=12) (actual time=0.006..0.012 rows=96 loops=5475000)\n", " Output: x.elem, x.timeblock\n", " Function Call: unnest(actuals_array.\"values\")\n", "Planning Time: 3.335 ms\n", "JIT:\n", " Functions: 17\n", " Options: Inlining true, Optimization true, Expressions true, Deforming true\n", " Timing: Generation 16.750 ms, Inlining 50.233 ms, Optimization 117.896 ms, Emission 93.486 ms, Total 278.365 ms\n", "Execution Time: 239323.688 ms`,\n", " query: `\n", " EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)\n", "\n", " -- Put aggregated load result in array per day. max/min for unique assets\n", " -- does not matter, we just need to pick one as the same value\n", " -- has been distributed over the rows\n", " SELECT\n", " day,\n", " MAX(count_asset) as count_asset,\n", " array_agg(s ORDER BY timeblock) as total_value\n", " FROM (\n", " SELECT\n", " day,\n", " timeblock,\n", " SUM(elem) as s,\n", " COUNT(*) as count_asset\n", "\n", " FROM (\n", " -- Select only latest rows\n", " SELECT DISTINCT ON (asset_id, day)\n", " day,\n", " values\n", " FROM actuals_array\n", " -- Sort descending by reference_datetime so it picks latest distinct value,\n", " -- choosing a value over NULL, and the highest ID if there are ties\n", " ORDER BY asset_id, day, created DESC NULLS LAST\n", " ) AS t, LATERAL UNNEST(t.values) WITH ORDINALITY x(elem, timeblock)\n", "\n", " GROUP BY day, timeblock\n", " ) as a\n", " GROUP BY day;\n", " `,\n", " }\n", " },\n", " template: \"\"\n", " });\n", " " ], "text/plain": [ "" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pev2(*query_array_unnest())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Query 2: Array subscripting\n", "\n", "Our first idea did work, but the query time is very high, even when we created a relevant index.\n", "\n", "My hunch is that `UNNEST()` creates so many rows (about two orders of magnitude more) that the aggregate step is much slower. Let's try to subscript the array to go directly to many columns, instead of many rows." ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "application/javascript": [ "\n", " var el = document.createElement('div');\n", " element.append(el);\n", "\n", " var v = new Vue({\n", " el: el,\n", " components: {\n", " 'pev2': window.pev2\n", " },\n", " data: function() {\n", " return {\n", " plan: `HashAggregate (cost=1895864.88..1895866.88 rows=200 width=412) (actual time=56298.607..56299.713 rows=1095 loops=1)\n", " Output: actuals_array.day, count(*), sum((actuals_array.\"values\"[1])), sum((actuals_array.\"values\"[2])), sum((actuals_array.\"values\"[3])), sum((actuals_array.\"values\"[4])), sum((actuals_array.\"values\"[5])), sum((actuals_array.\"values\"[6])), sum((actuals_array.\"values\"[7])), sum((actuals_array.\"values\"[8])), sum((actuals_array.\"values\"[9])), sum((actuals_array.\"values\"[10])), sum((actuals_array.\"values\"[11])), sum((actuals_array.\"values\"[12])), sum((actuals_array.\"values\"[13])), sum((actuals_array.\"values\"[14])), sum((actuals_array.\"values\"[15])), sum((actuals_array.\"values\"[16])), sum((actuals_array.\"values\"[17])), sum((actuals_array.\"values\"[18])), sum((actuals_array.\"values\"[19])), sum((actuals_array.\"values\"[20])), sum((actuals_array.\"values\"[21])), sum((actuals_array.\"values\"[22])), sum((actuals_array.\"values\"[23])), sum((actuals_array.\"values\"[24])), sum((actuals_array.\"values\"[25])), sum((actuals_array.\"values\"[26])), sum((actuals_array.\"values\"[27])), sum((actuals_array.\"values\"[28])), sum((actuals_array.\"values\"[29])), sum((actuals_array.\"values\"[30])), sum((actuals_array.\"values\"[31])), sum((actuals_array.\"values\"[32])), sum((actuals_array.\"values\"[33])), sum((actuals_array.\"values\"[34])), sum((actuals_array.\"values\"[35])), sum((actuals_array.\"values\"[36])), sum((actuals_array.\"values\"[37])), sum((actuals_array.\"values\"[38])), sum((actuals_array.\"values\"[39])), sum((actuals_array.\"values\"[40])), sum((actuals_array.\"values\"[41])), sum((actuals_array.\"values\"[42])), sum((actuals_array.\"values\"[43])), sum((actuals_array.\"values\"[44])), sum((actuals_array.\"values\"[45])), sum((actuals_array.\"values\"[46])), sum((actuals_array.\"values\"[47])), sum((actuals_array.\"values\"[48])), sum((actuals_array.\"values\"[49])), sum((actuals_array.\"values\"[50])), sum((actuals_array.\"values\"[51])), sum((actuals_array.\"values\"[52])), sum((actuals_array.\"values\"[53])), sum((actuals_array.\"values\"[54])), sum((actuals_array.\"values\"[55])), sum((actuals_array.\"values\"[56])), sum((actuals_array.\"values\"[57])), sum((actuals_array.\"values\"[58])), sum((actuals_array.\"values\"[59])), sum((actuals_array.\"values\"[60])), sum((actuals_array.\"values\"[61])), sum((actuals_array.\"values\"[62])), sum((actuals_array.\"values\"[63])), sum((actuals_array.\"values\"[64])), sum((actuals_array.\"values\"[65])), sum((actuals_array.\"values\"[66])), sum((actuals_array.\"values\"[67])), sum((actuals_array.\"values\"[68])), sum((actuals_array.\"values\"[69])), sum((actuals_array.\"values\"[70])), sum((actuals_array.\"values\"[71])), sum((actuals_array.\"values\"[72])), sum((actuals_array.\"values\"[73])), sum((actuals_array.\"values\"[74])), sum((actuals_array.\"values\"[75])), sum((actuals_array.\"values\"[76])), sum((actuals_array.\"values\"[77])), sum((actuals_array.\"values\"[78])), sum((actuals_array.\"values\"[79])), sum((actuals_array.\"values\"[80])), sum((actuals_array.\"values\"[81])), sum((actuals_array.\"values\"[82])), sum((actuals_array.\"values\"[83])), sum((actuals_array.\"values\"[84])), sum((actuals_array.\"values\"[85])), sum((actuals_array.\"values\"[86])), sum((actuals_array.\"values\"[87])), sum((actuals_array.\"values\"[88])), sum((actuals_array.\"values\"[89])), sum((actuals_array.\"values\"[90])), sum((actuals_array.\"values\"[91])), sum((actuals_array.\"values\"[92])), sum((actuals_array.\"values\"[93])), sum((actuals_array.\"values\"[94])), sum((actuals_array.\"values\"[95])), sum((actuals_array.\"values\"[96])), sum((actuals_array.\"values\"[97])), sum((actuals_array.\"values\"[98])), sum((actuals_array.\"values\"[99])), sum((actuals_array.\"values\"[100]))\n", " Group Key: actuals_array.day\n", " Buffers: shared hit=6168432 read=13522\n", " -> Unique (cost=0.43..1460603.44 rows=1642496 width=416) (actual time=1496.405..48753.184 rows=5475000 loops=1)\n", " Output: actuals_array.day, (actuals_array.\"values\"[1]), (actuals_array.\"values\"[2]), (actuals_array.\"values\"[3]), (actuals_array.\"values\"[4]), (actuals_array.\"values\"[5]), (actuals_array.\"values\"[6]), (actuals_array.\"values\"[7]), (actuals_array.\"values\"[8]), (actuals_array.\"values\"[9]), (actuals_array.\"values\"[10]), (actuals_array.\"values\"[11]), (actuals_array.\"values\"[12]), (actuals_array.\"values\"[13]), (actuals_array.\"values\"[14]), (actuals_array.\"values\"[15]), (actuals_array.\"values\"[16]), (actuals_array.\"values\"[17]), (actuals_array.\"values\"[18]), (actuals_array.\"values\"[19]), (actuals_array.\"values\"[20]), (actuals_array.\"values\"[21]), (actuals_array.\"values\"[22]), (actuals_array.\"values\"[23]), (actuals_array.\"values\"[24]), (actuals_array.\"values\"[25]), (actuals_array.\"values\"[26]), (actuals_array.\"values\"[27]), (actuals_array.\"values\"[28]), (actuals_array.\"values\"[29]), (actuals_array.\"values\"[30]), (actuals_array.\"values\"[31]), (actuals_array.\"values\"[32]), (actuals_array.\"values\"[33]), (actuals_array.\"values\"[34]), (actuals_array.\"values\"[35]), (actuals_array.\"values\"[36]), (actuals_array.\"values\"[37]), (actuals_array.\"values\"[38]), (actuals_array.\"values\"[39]), (actuals_array.\"values\"[40]), (actuals_array.\"values\"[41]), (actuals_array.\"values\"[42]), (actuals_array.\"values\"[43]), (actuals_array.\"values\"[44]), (actuals_array.\"values\"[45]), (actuals_array.\"values\"[46]), (actuals_array.\"values\"[47]), (actuals_array.\"values\"[48]), (actuals_array.\"values\"[49]), (actuals_array.\"values\"[50]), (actuals_array.\"values\"[51]), (actuals_array.\"values\"[52]), (actuals_array.\"values\"[53]), (actuals_array.\"values\"[54]), (actuals_array.\"values\"[55]), (actuals_array.\"values\"[56]), (actuals_array.\"values\"[57]), (actuals_array.\"values\"[58]), (actuals_array.\"values\"[59]), (actuals_array.\"values\"[60]), (actuals_array.\"values\"[61]), (actuals_array.\"values\"[62]), (actuals_array.\"values\"[63]), (actuals_array.\"values\"[64]), (actuals_array.\"values\"[65]), (actuals_array.\"values\"[66]), (actuals_array.\"values\"[67]), (actuals_array.\"values\"[68]), (actuals_array.\"values\"[69]), (actuals_array.\"values\"[70]), (actuals_array.\"values\"[71]), (actuals_array.\"values\"[72]), (actuals_array.\"values\"[73]), (actuals_array.\"values\"[74]), (actuals_array.\"values\"[75]), (actuals_array.\"values\"[76]), (actuals_array.\"values\"[77]), (actuals_array.\"values\"[78]), (actuals_array.\"values\"[79]), (actuals_array.\"values\"[80]), (actuals_array.\"values\"[81]), (actuals_array.\"values\"[82]), (actuals_array.\"values\"[83]), (actuals_array.\"values\"[84]), (actuals_array.\"values\"[85]), (actuals_array.\"values\"[86]), (actuals_array.\"values\"[87]), (actuals_array.\"values\"[88]), (actuals_array.\"values\"[89]), (actuals_array.\"values\"[90]), (actuals_array.\"values\"[91]), (actuals_array.\"values\"[92]), (actuals_array.\"values\"[93]), (actuals_array.\"values\"[94]), (actuals_array.\"values\"[95]), (actuals_array.\"values\"[96]), (actuals_array.\"values\"[97]), (actuals_array.\"values\"[98]), (actuals_array.\"values\"[99]), (actuals_array.\"values\"[100]), actuals_array.asset_id, actuals_array.created\n", " Buffers: shared hit=6168432 read=13522\n", " -> Index Scan using actuals_array_asset_id_day_created on public.actuals_array (cost=0.43..1378478.65 rows=16424957 width=416) (actual time=1496.402..43312.729 rows=16425000 loops=1)\n", " Output: actuals_array.day, actuals_array.\"values\"[1], actuals_array.\"values\"[2], actuals_array.\"values\"[3], actuals_array.\"values\"[4], actuals_array.\"values\"[5], actuals_array.\"values\"[6], actuals_array.\"values\"[7], actuals_array.\"values\"[8], actuals_array.\"values\"[9], actuals_array.\"values\"[10], actuals_array.\"values\"[11], actuals_array.\"values\"[12], actuals_array.\"values\"[13], actuals_array.\"values\"[14], actuals_array.\"values\"[15], actuals_array.\"values\"[16], actuals_array.\"values\"[17], actuals_array.\"values\"[18], actuals_array.\"values\"[19], actuals_array.\"values\"[20], actuals_array.\"values\"[21], actuals_array.\"values\"[22], actuals_array.\"values\"[23], actuals_array.\"values\"[24], actuals_array.\"values\"[25], actuals_array.\"values\"[26], actuals_array.\"values\"[27], actuals_array.\"values\"[28], actuals_array.\"values\"[29], actuals_array.\"values\"[30], actuals_array.\"values\"[31], actuals_array.\"values\"[32], actuals_array.\"values\"[33], actuals_array.\"values\"[34], actuals_array.\"values\"[35], actuals_array.\"values\"[36], actuals_array.\"values\"[37], actuals_array.\"values\"[38], actuals_array.\"values\"[39], actuals_array.\"values\"[40], actuals_array.\"values\"[41], actuals_array.\"values\"[42], actuals_array.\"values\"[43], actuals_array.\"values\"[44], actuals_array.\"values\"[45], actuals_array.\"values\"[46], actuals_array.\"values\"[47], actuals_array.\"values\"[48], actuals_array.\"values\"[49], actuals_array.\"values\"[50], actuals_array.\"values\"[51], actuals_array.\"values\"[52], actuals_array.\"values\"[53], actuals_array.\"values\"[54], actuals_array.\"values\"[55], actuals_array.\"values\"[56], actuals_array.\"values\"[57], actuals_array.\"values\"[58], actuals_array.\"values\"[59], actuals_array.\"values\"[60], actuals_array.\"values\"[61], actuals_array.\"values\"[62], actuals_array.\"values\"[63], actuals_array.\"values\"[64], actuals_array.\"values\"[65], actuals_array.\"values\"[66], actuals_array.\"values\"[67], actuals_array.\"values\"[68], actuals_array.\"values\"[69], actuals_array.\"values\"[70], actuals_array.\"values\"[71], actuals_array.\"values\"[72], actuals_array.\"values\"[73], actuals_array.\"values\"[74], actuals_array.\"values\"[75], actuals_array.\"values\"[76], actuals_array.\"values\"[77], actuals_array.\"values\"[78], actuals_array.\"values\"[79], actuals_array.\"values\"[80], actuals_array.\"values\"[81], actuals_array.\"values\"[82], actuals_array.\"values\"[83], actuals_array.\"values\"[84], actuals_array.\"values\"[85], actuals_array.\"values\"[86], actuals_array.\"values\"[87], actuals_array.\"values\"[88], actuals_array.\"values\"[89], actuals_array.\"values\"[90], actuals_array.\"values\"[91], actuals_array.\"values\"[92], actuals_array.\"values\"[93], actuals_array.\"values\"[94], actuals_array.\"values\"[95], actuals_array.\"values\"[96], actuals_array.\"values\"[97], actuals_array.\"values\"[98], actuals_array.\"values\"[99], actuals_array.\"values\"[100], actuals_array.asset_id, actuals_array.created\n", " Buffers: shared hit=6168432 read=13522\n", "Planning Time: 0.416 ms\n", "JIT:\n", " Functions: 10\n", " Options: Inlining true, Optimization true, Expressions true, Deforming true\n", " Timing: Generation 5.049 ms, Inlining 9.983 ms, Optimization 635.689 ms, Emission 849.568 ms, Total 1500.288 ms\n", "Execution Time: 56305.184 ms`,\n", " query: `\n", " EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)\n", "\n", " SELECT\n", " day,\n", " COUNT(*) as count_asset,\n", " SUM(t00), SUM(t01), SUM(t02), SUM(t03), SUM(t04), SUM(t05), SUM(t06), SUM(t07), SUM(t08), SUM(t09), SUM(t10), SUM(t11), SUM(t12), SUM(t13), SUM(t14), SUM(t15), SUM(t16), SUM(t17), SUM(t18), SUM(t19), SUM(t20), SUM(t21), SUM(t22), SUM(t23), SUM(t24), SUM(t25), SUM(t26), SUM(t27), SUM(t28), SUM(t29), SUM(t30), SUM(t31), SUM(t32), SUM(t33), SUM(t34), SUM(t35), SUM(t36), SUM(t37), SUM(t38), SUM(t39), SUM(t40), SUM(t41), SUM(t42), SUM(t43), SUM(t44), SUM(t45), SUM(t46), SUM(t47), SUM(t48), SUM(t49), SUM(t50), SUM(t51), SUM(t52), SUM(t53), SUM(t54), SUM(t55), SUM(t56), SUM(t57), SUM(t58), SUM(t59), SUM(t60), SUM(t61), SUM(t62), SUM(t63), SUM(t64), SUM(t65), SUM(t66), SUM(t67), SUM(t68), SUM(t69), SUM(t70), SUM(t71), SUM(t72), SUM(t73), SUM(t74), SUM(t75), SUM(t76), SUM(t77), SUM(t78), SUM(t79), SUM(t80), SUM(t81), SUM(t82), SUM(t83), SUM(t84), SUM(t85), SUM(t86), SUM(t87), SUM(t88), SUM(t89), SUM(t90), SUM(t91), SUM(t92), SUM(t93), SUM(t94), SUM(t95), SUM(t96), SUM(t97), SUM(t98), SUM(t99)\n", "\n", " FROM (\n", " -- Select only latest rows\n", " SELECT DISTINCT ON (asset_id, day)\n", " day,\n", " values[1] as t00, values[2] as t01, values[3] as t02, values[4] as t03, values[5] as t04, values[6] as t05, values[7] as t06, values[8] as t07, values[9] as t08, values[10] as t09, values[11] as t10, values[12] as t11, values[13] as t12, values[14] as t13, values[15] as t14, values[16] as t15, values[17] as t16, values[18] as t17, values[19] as t18, values[20] as t19, values[21] as t20, values[22] as t21, values[23] as t22, values[24] as t23, values[25] as t24, values[26] as t25, values[27] as t26, values[28] as t27, values[29] as t28, values[30] as t29, values[31] as t30, values[32] as t31, values[33] as t32, values[34] as t33, values[35] as t34, values[36] as t35, values[37] as t36, values[38] as t37, values[39] as t38, values[40] as t39, values[41] as t40, values[42] as t41, values[43] as t42, values[44] as t43, values[45] as t44, values[46] as t45, values[47] as t46, values[48] as t47, values[49] as t48, values[50] as t49, values[51] as t50, values[52] as t51, values[53] as t52, values[54] as t53, values[55] as t54, values[56] as t55, values[57] as t56, values[58] as t57, values[59] as t58, values[60] as t59, values[61] as t60, values[62] as t61, values[63] as t62, values[64] as t63, values[65] as t64, values[66] as t65, values[67] as t66, values[68] as t67, values[69] as t68, values[70] as t69, values[71] as t70, values[72] as t71, values[73] as t72, values[74] as t73, values[75] as t74, values[76] as t75, values[77] as t76, values[78] as t77, values[79] as t78, values[80] as t79, values[81] as t80, values[82] as t81, values[83] as t82, values[84] as t83, values[85] as t84, values[86] as t85, values[87] as t86, values[88] as t87, values[89] as t88, values[90] as t89, values[91] as t90, values[92] as t91, values[93] as t92, values[94] as t93, values[95] as t94, values[96] as t95, values[97] as t96, values[98] as t97, values[99] as t98, values[100] as t99\n", " FROM actuals_array\n", " ORDER BY asset_id, day, created DESC NULLS LAST\n", " ) AS t\n", "\n", " GROUP BY day;\n", " `,\n", " }\n", " },\n", " template: \"\"\n", " });\n", " " ], "text/plain": [ "" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def query_array_subscript(table='actuals_array'):\n", " with conn.cursor() as curs:\n", " # Maximum amount of columns\n", " n_cols = 100\n", " array_index_str = \", \".join(f'values[{i+1}] as t{i:02d}' for i in range(n_cols))\n", " array_sum_str = \", \".join(f\"SUM(t{i:02d})\" for i in range(n_cols))\n", " query = f\"\"\"\n", " EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)\n", "\n", " SELECT\n", " day,\n", " COUNT(*) as count_asset,\n", " {array_sum_str}\n", "\n", " FROM (\n", " -- Select only latest rows\n", " SELECT DISTINCT ON (asset_id, day)\n", " day,\n", " {array_index_str}\n", " FROM {table}\n", " ORDER BY asset_id, day, created DESC NULLS LAST\n", " ) AS t\n", "\n", " GROUP BY day;\n", " \"\"\"\n", " curs.execute(query)\n", " plan = \"\\n\".join([l[0] for l in curs.fetchall()])\n", " \n", " return query, plan\n", "\n", "pev2(*query_array_subscript())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Query 3: Pre-made columns\n", "\n", "That went a lot better! 3x speedup, that's not bad. Now we have to try the obvious next step: pre-create these columns and query that." ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "application/javascript": [ "\n", " var el = document.createElement('div');\n", " element.append(el);\n", "\n", " var v = new Vue({\n", " el: el,\n", " components: {\n", " 'pev2': window.pev2\n", " },\n", " data: function() {\n", " return {\n", " plan: `HashAggregate (cost=1836983.95..1836985.95 rows=200 width=412) (actual time=26341.127..26342.245 rows=1095 loops=1)\n", " Output: actuals_cols.day, count(*), sum(actuals_cols.pte00), sum(actuals_cols.pte01), sum(actuals_cols.pte02), sum(actuals_cols.pte03), sum(actuals_cols.pte04), sum(actuals_cols.pte05), sum(actuals_cols.pte06), sum(actuals_cols.pte07), sum(actuals_cols.pte08), sum(actuals_cols.pte09), sum(actuals_cols.pte10), sum(actuals_cols.pte11), sum(actuals_cols.pte12), sum(actuals_cols.pte13), sum(actuals_cols.pte14), sum(actuals_cols.pte15), sum(actuals_cols.pte16), sum(actuals_cols.pte17), sum(actuals_cols.pte18), sum(actuals_cols.pte19), sum(actuals_cols.pte20), sum(actuals_cols.pte21), sum(actuals_cols.pte22), sum(actuals_cols.pte23), sum(actuals_cols.pte24), sum(actuals_cols.pte25), sum(actuals_cols.pte26), sum(actuals_cols.pte27), sum(actuals_cols.pte28), sum(actuals_cols.pte29), sum(actuals_cols.pte30), sum(actuals_cols.pte31), sum(actuals_cols.pte32), sum(actuals_cols.pte33), sum(actuals_cols.pte34), sum(actuals_cols.pte35), sum(actuals_cols.pte36), sum(actuals_cols.pte37), sum(actuals_cols.pte38), sum(actuals_cols.pte39), sum(actuals_cols.pte40), sum(actuals_cols.pte41), sum(actuals_cols.pte42), sum(actuals_cols.pte43), sum(actuals_cols.pte44), sum(actuals_cols.pte45), sum(actuals_cols.pte46), sum(actuals_cols.pte47), sum(actuals_cols.pte48), sum(actuals_cols.pte49), sum(actuals_cols.pte50), sum(actuals_cols.pte51), sum(actuals_cols.pte52), sum(actuals_cols.pte53), sum(actuals_cols.pte54), sum(actuals_cols.pte55), sum(actuals_cols.pte56), sum(actuals_cols.pte57), sum(actuals_cols.pte58), sum(actuals_cols.pte59), sum(actuals_cols.pte60), sum(actuals_cols.pte61), sum(actuals_cols.pte62), sum(actuals_cols.pte63), sum(actuals_cols.pte64), sum(actuals_cols.pte65), sum(actuals_cols.pte66), sum(actuals_cols.pte67), sum(actuals_cols.pte68), sum(actuals_cols.pte69), sum(actuals_cols.pte70), sum(actuals_cols.pte71), sum(actuals_cols.pte72), sum(actuals_cols.pte73), sum(actuals_cols.pte74), sum(actuals_cols.pte75), sum(actuals_cols.pte76), sum(actuals_cols.pte77), sum(actuals_cols.pte78), sum(actuals_cols.pte79), sum(actuals_cols.pte80), sum(actuals_cols.pte81), sum(actuals_cols.pte82), sum(actuals_cols.pte83), sum(actuals_cols.pte84), sum(actuals_cols.pte85), sum(actuals_cols.pte86), sum(actuals_cols.pte87), sum(actuals_cols.pte88), sum(actuals_cols.pte89), sum(actuals_cols.pte90), sum(actuals_cols.pte91), sum(actuals_cols.pte92), sum(actuals_cols.pte93), sum(actuals_cols.pte94), sum(actuals_cols.pte95), sum(actuals_cols.pte96), sum(actuals_cols.pte97), sum(actuals_cols.pte98), sum(actuals_cols.pte99)\n", " Group Key: actuals_cols.day\n", " Buffers: shared hit=4561770 read=976164\n", " -> Result (cost=0.43..1401721.45 rows=1642500 width=416) (actual time=1136.308..20291.101 rows=5475000 loops=1)\n", " Output: actuals_cols.day, actuals_cols.pte00, actuals_cols.pte01, actuals_cols.pte02, actuals_cols.pte03, actuals_cols.pte04, actuals_cols.pte05, actuals_cols.pte06, actuals_cols.pte07, actuals_cols.pte08, actuals_cols.pte09, actuals_cols.pte10, actuals_cols.pte11, actuals_cols.pte12, actuals_cols.pte13, actuals_cols.pte14, actuals_cols.pte15, actuals_cols.pte16, actuals_cols.pte17, actuals_cols.pte18, actuals_cols.pte19, actuals_cols.pte20, actuals_cols.pte21, actuals_cols.pte22, actuals_cols.pte23, actuals_cols.pte24, actuals_cols.pte25, actuals_cols.pte26, actuals_cols.pte27, actuals_cols.pte28, actuals_cols.pte29, actuals_cols.pte30, actuals_cols.pte31, actuals_cols.pte32, actuals_cols.pte33, actuals_cols.pte34, actuals_cols.pte35, actuals_cols.pte36, actuals_cols.pte37, actuals_cols.pte38, actuals_cols.pte39, actuals_cols.pte40, actuals_cols.pte41, actuals_cols.pte42, actuals_cols.pte43, actuals_cols.pte44, actuals_cols.pte45, actuals_cols.pte46, actuals_cols.pte47, actuals_cols.pte48, actuals_cols.pte49, actuals_cols.pte50, actuals_cols.pte51, actuals_cols.pte52, actuals_cols.pte53, actuals_cols.pte54, actuals_cols.pte55, actuals_cols.pte56, actuals_cols.pte57, actuals_cols.pte58, actuals_cols.pte59, actuals_cols.pte60, actuals_cols.pte61, actuals_cols.pte62, actuals_cols.pte63, actuals_cols.pte64, actuals_cols.pte65, actuals_cols.pte66, actuals_cols.pte67, actuals_cols.pte68, actuals_cols.pte69, actuals_cols.pte70, actuals_cols.pte71, actuals_cols.pte72, actuals_cols.pte73, actuals_cols.pte74, actuals_cols.pte75, actuals_cols.pte76, actuals_cols.pte77, actuals_cols.pte78, actuals_cols.pte79, actuals_cols.pte80, actuals_cols.pte81, actuals_cols.pte82, actuals_cols.pte83, actuals_cols.pte84, actuals_cols.pte85, actuals_cols.pte86, actuals_cols.pte87, actuals_cols.pte88, actuals_cols.pte89, actuals_cols.pte90, actuals_cols.pte91, actuals_cols.pte92, actuals_cols.pte93, actuals_cols.pte94, actuals_cols.pte95, actuals_cols.pte96, actuals_cols.pte97, actuals_cols.pte98, actuals_cols.pte99, actuals_cols.asset_id, actuals_cols.created\n", " Buffers: shared hit=4561770 read=976164\n", " -> Unique (cost=0.43..1401721.45 rows=1642500 width=416) (actual time=0.982..17231.784 rows=5475000 loops=1)\n", " Output: actuals_cols.id, actuals_cols.asset_id, actuals_cols.day, actuals_cols.created, actuals_cols.pte00, actuals_cols.pte01, actuals_cols.pte02, actuals_cols.pte03, actuals_cols.pte04, actuals_cols.pte05, actuals_cols.pte06, actuals_cols.pte07, actuals_cols.pte08, actuals_cols.pte09, actuals_cols.pte10, actuals_cols.pte11, actuals_cols.pte12, actuals_cols.pte13, actuals_cols.pte14, actuals_cols.pte15, actuals_cols.pte16, actuals_cols.pte17, actuals_cols.pte18, actuals_cols.pte19, actuals_cols.pte20, actuals_cols.pte21, actuals_cols.pte22, actuals_cols.pte23, actuals_cols.pte24, actuals_cols.pte25, actuals_cols.pte26, actuals_cols.pte27, actuals_cols.pte28, actuals_cols.pte29, actuals_cols.pte30, actuals_cols.pte31, actuals_cols.pte32, actuals_cols.pte33, actuals_cols.pte34, actuals_cols.pte35, actuals_cols.pte36, actuals_cols.pte37, actuals_cols.pte38, actuals_cols.pte39, actuals_cols.pte40, actuals_cols.pte41, actuals_cols.pte42, actuals_cols.pte43, actuals_cols.pte44, actuals_cols.pte45, actuals_cols.pte46, actuals_cols.pte47, actuals_cols.pte48, actuals_cols.pte49, actuals_cols.pte50, actuals_cols.pte51, actuals_cols.pte52, actuals_cols.pte53, actuals_cols.pte54, actuals_cols.pte55, actuals_cols.pte56, actuals_cols.pte57, actuals_cols.pte58, actuals_cols.pte59, actuals_cols.pte60, actuals_cols.pte61, actuals_cols.pte62, actuals_cols.pte63, actuals_cols.pte64, actuals_cols.pte65, actuals_cols.pte66, actuals_cols.pte67, actuals_cols.pte68, actuals_cols.pte69, actuals_cols.pte70, actuals_cols.pte71, actuals_cols.pte72, actuals_cols.pte73, actuals_cols.pte74, actuals_cols.pte75, actuals_cols.pte76, actuals_cols.pte77, actuals_cols.pte78, actuals_cols.pte79, actuals_cols.pte80, actuals_cols.pte81, actuals_cols.pte82, actuals_cols.pte83, actuals_cols.pte84, actuals_cols.pte85, actuals_cols.pte86, actuals_cols.pte87, actuals_cols.pte88, actuals_cols.pte89, actuals_cols.pte90, actuals_cols.pte91, actuals_cols.pte92, actuals_cols.pte93, actuals_cols.pte94, actuals_cols.pte95, actuals_cols.pte96, actuals_cols.pte97, actuals_cols.pte98, actuals_cols.pte99\n", " Buffers: shared hit=4561770 read=976164\n", " -> Index Scan using actuals_cols_asset_id_day_created on public.actuals_cols (cost=0.43..1319596.45 rows=16425000 width=416) (actual time=0.980..15243.832 rows=16425000 loops=1)\n", " Output: actuals_cols.id, actuals_cols.asset_id, actuals_cols.day, actuals_cols.created, actuals_cols.pte00, actuals_cols.pte01, actuals_cols.pte02, actuals_cols.pte03, actuals_cols.pte04, actuals_cols.pte05, actuals_cols.pte06, actuals_cols.pte07, actuals_cols.pte08, actuals_cols.pte09, actuals_cols.pte10, actuals_cols.pte11, actuals_cols.pte12, actuals_cols.pte13, actuals_cols.pte14, actuals_cols.pte15, actuals_cols.pte16, actuals_cols.pte17, actuals_cols.pte18, actuals_cols.pte19, actuals_cols.pte20, actuals_cols.pte21, actuals_cols.pte22, actuals_cols.pte23, actuals_cols.pte24, actuals_cols.pte25, actuals_cols.pte26, actuals_cols.pte27, actuals_cols.pte28, actuals_cols.pte29, actuals_cols.pte30, actuals_cols.pte31, actuals_cols.pte32, actuals_cols.pte33, actuals_cols.pte34, actuals_cols.pte35, actuals_cols.pte36, actuals_cols.pte37, actuals_cols.pte38, actuals_cols.pte39, actuals_cols.pte40, actuals_cols.pte41, actuals_cols.pte42, actuals_cols.pte43, actuals_cols.pte44, actuals_cols.pte45, actuals_cols.pte46, actuals_cols.pte47, actuals_cols.pte48, actuals_cols.pte49, actuals_cols.pte50, actuals_cols.pte51, actuals_cols.pte52, actuals_cols.pte53, actuals_cols.pte54, actuals_cols.pte55, actuals_cols.pte56, actuals_cols.pte57, actuals_cols.pte58, actuals_cols.pte59, actuals_cols.pte60, actuals_cols.pte61, actuals_cols.pte62, actuals_cols.pte63, actuals_cols.pte64, actuals_cols.pte65, actuals_cols.pte66, actuals_cols.pte67, actuals_cols.pte68, actuals_cols.pte69, actuals_cols.pte70, actuals_cols.pte71, actuals_cols.pte72, actuals_cols.pte73, actuals_cols.pte74, actuals_cols.pte75, actuals_cols.pte76, actuals_cols.pte77, actuals_cols.pte78, actuals_cols.pte79, actuals_cols.pte80, actuals_cols.pte81, actuals_cols.pte82, actuals_cols.pte83, actuals_cols.pte84, actuals_cols.pte85, actuals_cols.pte86, actuals_cols.pte87, actuals_cols.pte88, actuals_cols.pte89, actuals_cols.pte90, actuals_cols.pte91, actuals_cols.pte92, actuals_cols.pte93, actuals_cols.pte94, actuals_cols.pte95, actuals_cols.pte96, actuals_cols.pte97, actuals_cols.pte98, actuals_cols.pte99\n", " Buffers: shared hit=4561770 read=976164\n", "Planning Time: 1.375 ms\n", "JIT:\n", " Functions: 8\n", " Options: Inlining true, Optimization true, Expressions true, Deforming true\n", " Timing: Generation 4.799 ms, Inlining 8.034 ms, Optimization 425.374 ms, Emission 700.315 ms, Total 1138.522 ms\n", "Execution Time: 26347.366 ms`,\n", " query: `\n", " EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)\n", "\n", " SELECT\n", " day,\n", " COUNT(*) as count_asset,\n", " SUM(pte00), SUM(pte01), SUM(pte02), SUM(pte03), SUM(pte04), SUM(pte05), SUM(pte06), SUM(pte07), SUM(pte08), SUM(pte09), SUM(pte10), SUM(pte11), SUM(pte12), SUM(pte13), SUM(pte14), SUM(pte15), SUM(pte16), SUM(pte17), SUM(pte18), SUM(pte19), SUM(pte20), SUM(pte21), SUM(pte22), SUM(pte23), SUM(pte24), SUM(pte25), SUM(pte26), SUM(pte27), SUM(pte28), SUM(pte29), SUM(pte30), SUM(pte31), SUM(pte32), SUM(pte33), SUM(pte34), SUM(pte35), SUM(pte36), SUM(pte37), SUM(pte38), SUM(pte39), SUM(pte40), SUM(pte41), SUM(pte42), SUM(pte43), SUM(pte44), SUM(pte45), SUM(pte46), SUM(pte47), SUM(pte48), SUM(pte49), SUM(pte50), SUM(pte51), SUM(pte52), SUM(pte53), SUM(pte54), SUM(pte55), SUM(pte56), SUM(pte57), SUM(pte58), SUM(pte59), SUM(pte60), SUM(pte61), SUM(pte62), SUM(pte63), SUM(pte64), SUM(pte65), SUM(pte66), SUM(pte67), SUM(pte68), SUM(pte69), SUM(pte70), SUM(pte71), SUM(pte72), SUM(pte73), SUM(pte74), SUM(pte75), SUM(pte76), SUM(pte77), SUM(pte78), SUM(pte79), SUM(pte80), SUM(pte81), SUM(pte82), SUM(pte83), SUM(pte84), SUM(pte85), SUM(pte86), SUM(pte87), SUM(pte88), SUM(pte89), SUM(pte90), SUM(pte91), SUM(pte92), SUM(pte93), SUM(pte94), SUM(pte95), SUM(pte96), SUM(pte97), SUM(pte98), SUM(pte99)\n", " FROM (\n", " -- Select only latest rows\n", " SELECT DISTINCT ON (asset_id, day)\n", " day,\n", " pte00, pte01, pte02, pte03, pte04, pte05, pte06, pte07, pte08, pte09, pte10, pte11, pte12, pte13, pte14, pte15, pte16, pte17, pte18, pte19, pte20, pte21, pte22, pte23, pte24, pte25, pte26, pte27, pte28, pte29, pte30, pte31, pte32, pte33, pte34, pte35, pte36, pte37, pte38, pte39, pte40, pte41, pte42, pte43, pte44, pte45, pte46, pte47, pte48, pte49, pte50, pte51, pte52, pte53, pte54, pte55, pte56, pte57, pte58, pte59, pte60, pte61, pte62, pte63, pte64, pte65, pte66, pte67, pte68, pte69, pte70, pte71, pte72, pte73, pte74, pte75, pte76, pte77, pte78, pte79, pte80, pte81, pte82, pte83, pte84, pte85, pte86, pte87, pte88, pte89, pte90, pte91, pte92, pte93, pte94, pte95, pte96, pte97, pte98, pte99\n", " FROM actuals_cols\n", " ORDER BY asset_id, day, created DESC NULLS LAST\n", " ) AS t\n", " GROUP BY day;\n", " `,\n", " }\n", " },\n", " template: \"\"\n", " });\n", " " ], "text/plain": [ "" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "with conn.cursor() as curs:\n", " table = 'actuals_cols'\n", " n_cols = 100\n", " col_str = \", \".join(f'pte{i:02d}' for i in range(n_cols))\n", " array_sum_str = \", \".join(f\"SUM(pte{i:02d})\" for i in range(n_cols))\n", " query = f\"\"\"\n", " EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)\n", "\n", " SELECT\n", " day,\n", " COUNT(*) as count_asset,\n", " {array_sum_str}\n", " FROM (\n", " -- Select only latest rows\n", " SELECT DISTINCT ON (asset_id, day)\n", " day,\n", " {col_str}\n", " FROM {table}\n", " ORDER BY asset_id, day, created DESC NULLS LAST\n", " ) AS t\n", " GROUP BY day;\n", " \"\"\"\n", " curs.execute(query)\n", " plan = \"\\n\".join([l[0] for l in curs.fetchall()])\n", "\n", "pev2(query, plan)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Woah! That went even better, 4x speedup (12x in total since our first try). That is pretty amazing. This format is more cumbersome to work with, though, than the nice arrays. However, with some code generation in Python it's not that bad. We do have to accomodate the maximum number of columns, which is 100 for 15-minute periods in localtime (usually 96 periods but once a year 92 and once 100 due to summertime)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Bonus query: DOUBLE PRECISION\n", "\n", "I'm interested to see what the difference is between querying the double and single precision numeric types, so I'll re-execute the array subscripting-based query from before and see if it makes a big difference." ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "application/javascript": [ "\n", " var el = document.createElement('div');\n", " element.append(el);\n", "\n", " var v = new Vue({\n", " el: el,\n", " components: {\n", " 'pev2': window.pev2\n", " },\n", " data: function() {\n", " return {\n", " plan: `HashAggregate (cost=2840179.84..2840181.84 rows=200 width=812) (actual time=75240.182..75241.432 rows=1095 loops=1)\n", " Output: actuals_array_double.day, count(*), sum((actuals_array_double.\"values\"[1])), sum((actuals_array_double.\"values\"[2])), sum((actuals_array_double.\"values\"[3])), sum((actuals_array_double.\"values\"[4])), sum((actuals_array_double.\"values\"[5])), sum((actuals_array_double.\"values\"[6])), sum((actuals_array_double.\"values\"[7])), sum((actuals_array_double.\"values\"[8])), sum((actuals_array_double.\"values\"[9])), sum((actuals_array_double.\"values\"[10])), sum((actuals_array_double.\"values\"[11])), sum((actuals_array_double.\"values\"[12])), sum((actuals_array_double.\"values\"[13])), sum((actuals_array_double.\"values\"[14])), sum((actuals_array_double.\"values\"[15])), sum((actuals_array_double.\"values\"[16])), sum((actuals_array_double.\"values\"[17])), sum((actuals_array_double.\"values\"[18])), sum((actuals_array_double.\"values\"[19])), sum((actuals_array_double.\"values\"[20])), sum((actuals_array_double.\"values\"[21])), sum((actuals_array_double.\"values\"[22])), sum((actuals_array_double.\"values\"[23])), sum((actuals_array_double.\"values\"[24])), sum((actuals_array_double.\"values\"[25])), sum((actuals_array_double.\"values\"[26])), sum((actuals_array_double.\"values\"[27])), sum((actuals_array_double.\"values\"[28])), sum((actuals_array_double.\"values\"[29])), sum((actuals_array_double.\"values\"[30])), sum((actuals_array_double.\"values\"[31])), sum((actuals_array_double.\"values\"[32])), sum((actuals_array_double.\"values\"[33])), sum((actuals_array_double.\"values\"[34])), sum((actuals_array_double.\"values\"[35])), sum((actuals_array_double.\"values\"[36])), sum((actuals_array_double.\"values\"[37])), sum((actuals_array_double.\"values\"[38])), sum((actuals_array_double.\"values\"[39])), sum((actuals_array_double.\"values\"[40])), sum((actuals_array_double.\"values\"[41])), sum((actuals_array_double.\"values\"[42])), sum((actuals_array_double.\"values\"[43])), sum((actuals_array_double.\"values\"[44])), sum((actuals_array_double.\"values\"[45])), sum((actuals_array_double.\"values\"[46])), sum((actuals_array_double.\"values\"[47])), sum((actuals_array_double.\"values\"[48])), sum((actuals_array_double.\"values\"[49])), sum((actuals_array_double.\"values\"[50])), sum((actuals_array_double.\"values\"[51])), sum((actuals_array_double.\"values\"[52])), sum((actuals_array_double.\"values\"[53])), sum((actuals_array_double.\"values\"[54])), sum((actuals_array_double.\"values\"[55])), sum((actuals_array_double.\"values\"[56])), sum((actuals_array_double.\"values\"[57])), sum((actuals_array_double.\"values\"[58])), sum((actuals_array_double.\"values\"[59])), sum((actuals_array_double.\"values\"[60])), sum((actuals_array_double.\"values\"[61])), sum((actuals_array_double.\"values\"[62])), sum((actuals_array_double.\"values\"[63])), sum((actuals_array_double.\"values\"[64])), sum((actuals_array_double.\"values\"[65])), sum((actuals_array_double.\"values\"[66])), sum((actuals_array_double.\"values\"[67])), sum((actuals_array_double.\"values\"[68])), sum((actuals_array_double.\"values\"[69])), sum((actuals_array_double.\"values\"[70])), sum((actuals_array_double.\"values\"[71])), sum((actuals_array_double.\"values\"[72])), sum((actuals_array_double.\"values\"[73])), sum((actuals_array_double.\"values\"[74])), sum((actuals_array_double.\"values\"[75])), sum((actuals_array_double.\"values\"[76])), sum((actuals_array_double.\"values\"[77])), sum((actuals_array_double.\"values\"[78])), sum((actuals_array_double.\"values\"[79])), sum((actuals_array_double.\"values\"[80])), sum((actuals_array_double.\"values\"[81])), sum((actuals_array_double.\"values\"[82])), sum((actuals_array_double.\"values\"[83])), sum((actuals_array_double.\"values\"[84])), sum((actuals_array_double.\"values\"[85])), sum((actuals_array_double.\"values\"[86])), sum((actuals_array_double.\"values\"[87])), sum((actuals_array_double.\"values\"[88])), sum((actuals_array_double.\"values\"[89])), sum((actuals_array_double.\"values\"[90])), sum((actuals_array_double.\"values\"[91])), sum((actuals_array_double.\"values\"[92])), sum((actuals_array_double.\"values\"[93])), sum((actuals_array_double.\"values\"[94])), sum((actuals_array_double.\"values\"[95])), sum((actuals_array_double.\"values\"[96])), sum((actuals_array_double.\"values\"[97])), sum((actuals_array_double.\"values\"[98])), sum((actuals_array_double.\"values\"[99])), sum((actuals_array_double.\"values\"[100]))\n", " Group Key: actuals_array_double.day\n", " Buffers: shared hit=3986038 read=1888230\n", " -> Unique (cost=0.43..2404920.25 rows=1642489 width=816) (actual time=1494.886..66328.123 rows=5475000 loops=1)\n", " Output: actuals_array_double.day, (actuals_array_double.\"values\"[1]), (actuals_array_double.\"values\"[2]), (actuals_array_double.\"values\"[3]), (actuals_array_double.\"values\"[4]), (actuals_array_double.\"values\"[5]), (actuals_array_double.\"values\"[6]), (actuals_array_double.\"values\"[7]), (actuals_array_double.\"values\"[8]), (actuals_array_double.\"values\"[9]), (actuals_array_double.\"values\"[10]), (actuals_array_double.\"values\"[11]), (actuals_array_double.\"values\"[12]), (actuals_array_double.\"values\"[13]), (actuals_array_double.\"values\"[14]), (actuals_array_double.\"values\"[15]), (actuals_array_double.\"values\"[16]), (actuals_array_double.\"values\"[17]), (actuals_array_double.\"values\"[18]), (actuals_array_double.\"values\"[19]), (actuals_array_double.\"values\"[20]), (actuals_array_double.\"values\"[21]), (actuals_array_double.\"values\"[22]), (actuals_array_double.\"values\"[23]), (actuals_array_double.\"values\"[24]), (actuals_array_double.\"values\"[25]), (actuals_array_double.\"values\"[26]), (actuals_array_double.\"values\"[27]), (actuals_array_double.\"values\"[28]), (actuals_array_double.\"values\"[29]), (actuals_array_double.\"values\"[30]), (actuals_array_double.\"values\"[31]), (actuals_array_double.\"values\"[32]), (actuals_array_double.\"values\"[33]), (actuals_array_double.\"values\"[34]), (actuals_array_double.\"values\"[35]), (actuals_array_double.\"values\"[36]), (actuals_array_double.\"values\"[37]), (actuals_array_double.\"values\"[38]), (actuals_array_double.\"values\"[39]), (actuals_array_double.\"values\"[40]), (actuals_array_double.\"values\"[41]), (actuals_array_double.\"values\"[42]), (actuals_array_double.\"values\"[43]), (actuals_array_double.\"values\"[44]), (actuals_array_double.\"values\"[45]), (actuals_array_double.\"values\"[46]), (actuals_array_double.\"values\"[47]), (actuals_array_double.\"values\"[48]), (actuals_array_double.\"values\"[49]), (actuals_array_double.\"values\"[50]), (actuals_array_double.\"values\"[51]), (actuals_array_double.\"values\"[52]), (actuals_array_double.\"values\"[53]), (actuals_array_double.\"values\"[54]), (actuals_array_double.\"values\"[55]), (actuals_array_double.\"values\"[56]), (actuals_array_double.\"values\"[57]), (actuals_array_double.\"values\"[58]), (actuals_array_double.\"values\"[59]), (actuals_array_double.\"values\"[60]), (actuals_array_double.\"values\"[61]), (actuals_array_double.\"values\"[62]), (actuals_array_double.\"values\"[63]), (actuals_array_double.\"values\"[64]), (actuals_array_double.\"values\"[65]), (actuals_array_double.\"values\"[66]), (actuals_array_double.\"values\"[67]), (actuals_array_double.\"values\"[68]), (actuals_array_double.\"values\"[69]), (actuals_array_double.\"values\"[70]), (actuals_array_double.\"values\"[71]), (actuals_array_double.\"values\"[72]), (actuals_array_double.\"values\"[73]), (actuals_array_double.\"values\"[74]), (actuals_array_double.\"values\"[75]), (actuals_array_double.\"values\"[76]), (actuals_array_double.\"values\"[77]), (actuals_array_double.\"values\"[78]), (actuals_array_double.\"values\"[79]), (actuals_array_double.\"values\"[80]), (actuals_array_double.\"values\"[81]), (actuals_array_double.\"values\"[82]), (actuals_array_double.\"values\"[83]), (actuals_array_double.\"values\"[84]), (actuals_array_double.\"values\"[85]), (actuals_array_double.\"values\"[86]), (actuals_array_double.\"values\"[87]), (actuals_array_double.\"values\"[88]), (actuals_array_double.\"values\"[89]), (actuals_array_double.\"values\"[90]), (actuals_array_double.\"values\"[91]), (actuals_array_double.\"values\"[92]), (actuals_array_double.\"values\"[93]), (actuals_array_double.\"values\"[94]), (actuals_array_double.\"values\"[95]), (actuals_array_double.\"values\"[96]), (actuals_array_double.\"values\"[97]), (actuals_array_double.\"values\"[98]), (actuals_array_double.\"values\"[99]), (actuals_array_double.\"values\"[100]), actuals_array_double.asset_id, actuals_array_double.created\n", " Buffers: shared hit=3986038 read=1888230\n", " -> Index Scan using actuals_array_double_asset_id_day_created on public.actuals_array_double (cost=0.43..2322795.81 rows=16424888 width=816) (actual time=1494.881..59227.131 rows=16425000 loops=1)\n", " Output: actuals_array_double.day, actuals_array_double.\"values\"[1], actuals_array_double.\"values\"[2], actuals_array_double.\"values\"[3], actuals_array_double.\"values\"[4], actuals_array_double.\"values\"[5], actuals_array_double.\"values\"[6], actuals_array_double.\"values\"[7], actuals_array_double.\"values\"[8], actuals_array_double.\"values\"[9], actuals_array_double.\"values\"[10], actuals_array_double.\"values\"[11], actuals_array_double.\"values\"[12], actuals_array_double.\"values\"[13], actuals_array_double.\"values\"[14], actuals_array_double.\"values\"[15], actuals_array_double.\"values\"[16], actuals_array_double.\"values\"[17], actuals_array_double.\"values\"[18], actuals_array_double.\"values\"[19], actuals_array_double.\"values\"[20], actuals_array_double.\"values\"[21], actuals_array_double.\"values\"[22], actuals_array_double.\"values\"[23], actuals_array_double.\"values\"[24], actuals_array_double.\"values\"[25], actuals_array_double.\"values\"[26], actuals_array_double.\"values\"[27], actuals_array_double.\"values\"[28], actuals_array_double.\"values\"[29], actuals_array_double.\"values\"[30], actuals_array_double.\"values\"[31], actuals_array_double.\"values\"[32], actuals_array_double.\"values\"[33], actuals_array_double.\"values\"[34], actuals_array_double.\"values\"[35], actuals_array_double.\"values\"[36], actuals_array_double.\"values\"[37], actuals_array_double.\"values\"[38], actuals_array_double.\"values\"[39], actuals_array_double.\"values\"[40], actuals_array_double.\"values\"[41], actuals_array_double.\"values\"[42], actuals_array_double.\"values\"[43], actuals_array_double.\"values\"[44], actuals_array_double.\"values\"[45], actuals_array_double.\"values\"[46], actuals_array_double.\"values\"[47], actuals_array_double.\"values\"[48], actuals_array_double.\"values\"[49], actuals_array_double.\"values\"[50], actuals_array_double.\"values\"[51], actuals_array_double.\"values\"[52], actuals_array_double.\"values\"[53], actuals_array_double.\"values\"[54], actuals_array_double.\"values\"[55], actuals_array_double.\"values\"[56], actuals_array_double.\"values\"[57], actuals_array_double.\"values\"[58], actuals_array_double.\"values\"[59], actuals_array_double.\"values\"[60], actuals_array_double.\"values\"[61], actuals_array_double.\"values\"[62], actuals_array_double.\"values\"[63], actuals_array_double.\"values\"[64], actuals_array_double.\"values\"[65], actuals_array_double.\"values\"[66], actuals_array_double.\"values\"[67], actuals_array_double.\"values\"[68], actuals_array_double.\"values\"[69], actuals_array_double.\"values\"[70], actuals_array_double.\"values\"[71], actuals_array_double.\"values\"[72], actuals_array_double.\"values\"[73], actuals_array_double.\"values\"[74], actuals_array_double.\"values\"[75], actuals_array_double.\"values\"[76], actuals_array_double.\"values\"[77], actuals_array_double.\"values\"[78], actuals_array_double.\"values\"[79], actuals_array_double.\"values\"[80], actuals_array_double.\"values\"[81], actuals_array_double.\"values\"[82], actuals_array_double.\"values\"[83], actuals_array_double.\"values\"[84], actuals_array_double.\"values\"[85], actuals_array_double.\"values\"[86], actuals_array_double.\"values\"[87], actuals_array_double.\"values\"[88], actuals_array_double.\"values\"[89], actuals_array_double.\"values\"[90], actuals_array_double.\"values\"[91], actuals_array_double.\"values\"[92], actuals_array_double.\"values\"[93], actuals_array_double.\"values\"[94], actuals_array_double.\"values\"[95], actuals_array_double.\"values\"[96], actuals_array_double.\"values\"[97], actuals_array_double.\"values\"[98], actuals_array_double.\"values\"[99], actuals_array_double.\"values\"[100], actuals_array_double.asset_id, actuals_array_double.created\n", " Buffers: shared hit=3986038 read=1888230\n", "Planning Time: 6.560 ms\n", "JIT:\n", " Functions: 10\n", " Options: Inlining true, Optimization true, Expressions true, Deforming true\n", " Timing: Generation 7.493 ms, Inlining 9.084 ms, Optimization 494.335 ms, Emission 988.614 ms, Total 1499.526 ms\n", "Execution Time: 75249.330 ms`,\n", " query: `\n", " EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)\n", "\n", " SELECT\n", " day,\n", " COUNT(*) as count_asset,\n", " SUM(t00), SUM(t01), SUM(t02), SUM(t03), SUM(t04), SUM(t05), SUM(t06), SUM(t07), SUM(t08), SUM(t09), SUM(t10), SUM(t11), SUM(t12), SUM(t13), SUM(t14), SUM(t15), SUM(t16), SUM(t17), SUM(t18), SUM(t19), SUM(t20), SUM(t21), SUM(t22), SUM(t23), SUM(t24), SUM(t25), SUM(t26), SUM(t27), SUM(t28), SUM(t29), SUM(t30), SUM(t31), SUM(t32), SUM(t33), SUM(t34), SUM(t35), SUM(t36), SUM(t37), SUM(t38), SUM(t39), SUM(t40), SUM(t41), SUM(t42), SUM(t43), SUM(t44), SUM(t45), SUM(t46), SUM(t47), SUM(t48), SUM(t49), SUM(t50), SUM(t51), SUM(t52), SUM(t53), SUM(t54), SUM(t55), SUM(t56), SUM(t57), SUM(t58), SUM(t59), SUM(t60), SUM(t61), SUM(t62), SUM(t63), SUM(t64), SUM(t65), SUM(t66), SUM(t67), SUM(t68), SUM(t69), SUM(t70), SUM(t71), SUM(t72), SUM(t73), SUM(t74), SUM(t75), SUM(t76), SUM(t77), SUM(t78), SUM(t79), SUM(t80), SUM(t81), SUM(t82), SUM(t83), SUM(t84), SUM(t85), SUM(t86), SUM(t87), SUM(t88), SUM(t89), SUM(t90), SUM(t91), SUM(t92), SUM(t93), SUM(t94), SUM(t95), SUM(t96), SUM(t97), SUM(t98), SUM(t99)\n", "\n", " FROM (\n", " -- Select only latest rows\n", " SELECT DISTINCT ON (asset_id, day)\n", " day,\n", " values[1] as t00, values[2] as t01, values[3] as t02, values[4] as t03, values[5] as t04, values[6] as t05, values[7] as t06, values[8] as t07, values[9] as t08, values[10] as t09, values[11] as t10, values[12] as t11, values[13] as t12, values[14] as t13, values[15] as t14, values[16] as t15, values[17] as t16, values[18] as t17, values[19] as t18, values[20] as t19, values[21] as t20, values[22] as t21, values[23] as t22, values[24] as t23, values[25] as t24, values[26] as t25, values[27] as t26, values[28] as t27, values[29] as t28, values[30] as t29, values[31] as t30, values[32] as t31, values[33] as t32, values[34] as t33, values[35] as t34, values[36] as t35, values[37] as t36, values[38] as t37, values[39] as t38, values[40] as t39, values[41] as t40, values[42] as t41, values[43] as t42, values[44] as t43, values[45] as t44, values[46] as t45, values[47] as t46, values[48] as t47, values[49] as t48, values[50] as t49, values[51] as t50, values[52] as t51, values[53] as t52, values[54] as t53, values[55] as t54, values[56] as t55, values[57] as t56, values[58] as t57, values[59] as t58, values[60] as t59, values[61] as t60, values[62] as t61, values[63] as t62, values[64] as t63, values[65] as t64, values[66] as t65, values[67] as t66, values[68] as t67, values[69] as t68, values[70] as t69, values[71] as t70, values[72] as t71, values[73] as t72, values[74] as t73, values[75] as t74, values[76] as t75, values[77] as t76, values[78] as t77, values[79] as t78, values[80] as t79, values[81] as t80, values[82] as t81, values[83] as t82, values[84] as t83, values[85] as t84, values[86] as t85, values[87] as t86, values[88] as t87, values[89] as t88, values[90] as t89, values[91] as t90, values[92] as t91, values[93] as t92, values[94] as t93, values[95] as t94, values[96] as t95, values[97] as t96, values[98] as t97, values[99] as t98, values[100] as t99\n", " FROM actuals_array_double\n", " ORDER BY asset_id, day, created DESC NULLS LAST\n", " ) AS t\n", "\n", " GROUP BY day;\n", " `,\n", " }\n", " },\n", " template: \"\"\n", " });\n", " " ], "text/plain": [ "" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pev2(*query_array_subscript('actuals_array_double'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Interesting: very little difference, while we read much more data (14 vs 8GB). Apparently reading the actual data is not the problem, the bottleneck seems to be looking through the index." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Visualizing the query result\n", "\n", "We've been doing the same query over and over, so now let's see what the data actually looks like. (Just the last 2 days, to avoid a huge plot.)" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", " \n", " \n", " \n", "
\n", " \n", "
\n", "\n", "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "query, result = query_array_unnest(explain=False)\n", "\n", "dt = []\n", "val = []\n", "\n", "for date, n_assets, values in result[:2]:\n", " for i, v in enumerate(values):\n", " dt.append(datetime.combine(date, time()) + timedelta(minutes=i*15))\n", " val.append(v)\n", "\n", "fig = px.line(x=dt, y=val, title=\"Last two days of query result\")\n", "# Need to include it like this because plotly does not include its javascript by default any more\n", "display(HTML(fig.to_html(\"plot.html\", include_plotlyjs='cdn')))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Scores\n", "So to conclude, here are the scores (lowest seen over a few runs):\n", "\n", "| Query | Approx. duration |\n", "|-------------------------|------------------------|\n", "| 1a) UNNEST w/o index | 4.5min |\n", "| 1b) UNNEST with index | 3.5min |\n", "| 2a) Subscript | 1min |\n", "| 2b) Double-precision | 1min |\n", "| 3) Pre-made columns | 20s |\n", "\n", "Lessons learned:\n", "- Subqueries are ok but [CTEs](https://www.postgresql.org/docs/current/queries-with.html) are materialized and thus can be a barrier to optimizations like predicate push down (edited the first query after learning this)\n", "- UNNEST() used on large arrays creates lots of rows which might slow down subsequent aggregation steps\n", "- Creating columns by subscripting an array directly is faster for aggregations with large arrays (and reasonably constant array size)\n", "- Some postprocessing is needed with all queries except query 1 (which makes no assumptions about the number of columns)\n", "- Columns win over arrays for aggregations\n", "\n", "## Conclusion\n", "It was quite cool to dive so deep into Postgres datastructures and indexes, and benchmarking various methods with representative data. Further optimization steps could be to materialize the result (although the GROUP BY is then not customizable any more) or cache it somewhere else. Also [TimescaleDB](https://docs.timescale.com/) looks interesting, although it is very realtime-timeseries optimized, meaning that it assumes timeseries are always append-only while this usecase might have random INSERTs for data in the past. Other NoSQL databases might also be interesting to try, although they will require much more thought on row keys, sorting and the type of queries that need to be performed." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" }, "nikola": { "date": "2020-09-05 12:00:00 UTC", "slug": "postgres-timeseries", "title": "Benchmarking Postgres timeseries queries" } }, "nbformat": 4, "nbformat_minor": 4 }