Can at least some parts of pgRouting run in parallel with parallel query?

Question

Objectives

I would like to run pgr_dijkstra for each pair of vertices in a routable graph. I have a simple script that takes each pair of vertices and launches a pgr_dijkstra against the database.

It would be nice to use PostgreSQL parallel query capability to speed it up.

I would like to achieve an "easy" speed up without a need to implement advanced code like contractions.

Problem

Although I can see that for some of my tables explain analyze shows a plan that involves Gather and Parallel Seq Scan, trying to run pgr_dijkstra with "parallel" parameters enabled and with the PARALLEL SAFE keywords actually makes it slower.

Input data

I have a PostGIS-enabled database with some OSM data imported with osm2pgsql:

Data	Size
Nodes (vertices) table	730 vertices
Nodes (vertices) table	160 kB
Edges table	1100 vertices
Edges table	656 kB

PostgreSQL and system configuration

CPU: 11th Gen Intel Core i7-1165G7

PostgreSQL parameter	Value
`min_parallel_table_scan_size`	8kB
`force_parallel_mode`	off
`parallel_setup_cost`	10
`parallel_tuple_cost`	0.001
`max_parallel_workers_per_gather`	8
`max_parallel_workers`	8
`max_worker_processes`	8

I have set force_parallel_mode to off as per this recommendation, but even when on it does not make any difference.

Changing parallel_tuple_cost from the default value to 0.001 does not seem to help.

Queries and results

I have set up a very simple pgr_dijkstra call with and without SAFE, with origin/destination vertices' ids hardcoded:

CREATE OR REPLACE 
FUNCTION pgr_parallel() 
RETURNS VOID 
AS $$
    BEGIN
    PERFORM pgr_dijkstra(
    '
      SELECT gid AS id,
        source,
        target,
        length_m AS cost
      FROM edges_table 
      WHERE
                        NOT source IS NULL
                        AND
                        NOT target IS NULL
                        AND
                        NOT the_geom IS NULL
                        AND NOT length_m IS NULL
    ',
    ARRAY[161]
    ,
    ARRAY[639]
   ,
    directed := false
    );
    END;
$$ LANGUAGE 'plpgsql'
PARALLEL SAFE;

I have a simple script that calls this function N times, and it does not seem that it runs in a parallel way:

Type of function	Number (N) of pairs of vertices	Execution time
undefined (not parallel)	50,000	67 seconds
`PARALLEL SAFE`	50,000	89 seconds

As an additional observation, I run htop and see that in the beginning, when the (parallel) script runs, all 8 physical cores are being used. But when the script finishes firing "tasks" to the database, htop shows that only one core gets used by the postgres process.

Questions

If I understand correctly, a query can be 'parallelized' if there is some significant independent work to be done by each worker. In case of pgr_dijkstra and any other shortest path-finding algorithm, isn't this the case? I would suppose that pgr_dijkstra fetches vertices and edges and then runs an independent computation on them.

Is it possible to use the parallel query capability of PostgreSQL to make at least some parts of pgr_dijkstra to run in parallel? If yes, how?

Note: In this post by Crunchy Data and Paul Ramsey, PARALLEL SAFE is used for a function that contains pgr_dijkstra, but there is no detail if it runs faster thanks to that.

score 1 · Accepted Answer · answered May 16 '23 at 20:39

1

Running \df+ pgr_dijkstra;, we see that the function is parallel unsafe, so no, you won't be able to parallelize it within a single query.

However you could try to run multiple queries in parallel (i.e. between different pairs of vertices), using multiple connections. Bear in mind that they will be competing for the same resources though.

answered May 16 '23 at 20:39

JGH

15,928
4
31
48

Thanks for pointing me to function properties of `pgr_dijkstra`, it is indeed `parallel unsafe`. I wonder why then Crunchy Data [wraps](https://www.crunchydata.com/blog/routing-with-postgresql-and-crunchy-spatial) `pgr_dijkstra` in a higher level function and marks it as `parallel safe`. >> However you could try to run multiple queries in parallel (i.e. between different pairs of vertices), using multiple connections This is exactly what I do. In the beginning all CPU threads are busy (probably the script polling the database via multiple connections), but then only 1 thread is used. – theTruthIsOutThere May 17 '23 at 08:08