We put the latest Apache Syncope and PostgreSQL on the bench to push them to the limit: here are the results
Lately, there has been some activity on the Apache Syncope project aimed at improving overall performance, with the specific target of managing millions of identities.
While building a brand new persistence layer based on NoSQL repositories is certainly possible (though a non-trivial effort), immediate action was taken to let the existing JPA-based persistence layer use the latest features provided by the supported DBMSes; PostgreSQL was the first one.
With a long history in the Open Source world, PostgreSQL has become quite a hot topic in recent years, and for several good reasons: compliance with SQL standards, strong performance, PL/pgSQL stored procedures, extensibility, replication, an active community, and enterprise support available from several trusted providers all around the world.
Among the features provided by PostgreSQL, the JSONB data type was identified as the basis for improving Apache Syncope's performance.
The test code was developed as an Apache Maven project and made available on GitHub. The project accepts several parameters to adjust its behavior, so that the tests are fully replicable.
An Apache JMeter suite runs concurrent operations against the Apache Syncope deployment: user create, read, search, update, and delete.
The suite was configured for 10 concurrent threads, each running all the operations for 30 loops, with a ramp-up time of 10 seconds, for a total duration of 20 minutes.
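To make the shape of this workload concrete, here is a minimal Python sketch of the thread group settings described above. It is not part of the actual test project: the LoadPlan class and its method names are hypothetical, and it assumes that each loop issues exactly one request per listed operation and that threads are started at even intervals across the ramp-up period, which is how a JMeter thread group normally behaves. The real suite may issue more requests per loop.

```python
from dataclasses import dataclass

# Operations named in the article; one pass over all of them counts as one loop.
OPERATIONS = ("create", "read", "search", "update", "delete")


@dataclass(frozen=True)
class LoadPlan:
    """Hypothetical model of the thread group settings quoted in the article."""
    threads: int = 10
    loops: int = 30
    ramp_up_seconds: float = 10.0

    def start_offsets(self):
        # Assumption: threads start at even intervals across the ramp-up period.
        step = self.ramp_up_seconds / self.threads
        return [i * step for i in range(self.threads)]

    def total_requests(self, operations=OPERATIONS):
        # Assumption: exactly one request per operation in every loop.
        return self.threads * self.loops * len(operations)


plan = LoadPlan()
offsets = plan.start_offsets()
total = plan.total_requests()
```

With the values from the article, the threads start one second apart and, under the one-request-per-operation assumption, the run issues 1,500 requests in total.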
Please note that no tuning or optimization was performed: the Docker images for PostgreSQL and Elasticsearch are vanilla from Docker Hub.
The test suite was run under different conditions, and the results are now public at https://tirasa.github.io/syncoperf/:
PostgreSQL 11.1, standard JPA persistence
PostgreSQL 11.1, JPA persistence with JSONB support
Thanks to PostgreSQL and JSONB, Apache Syncope is able to gracefully manage millions of identities, with reasonable response times and throughput.
For smaller sets, standard persistence works quite well, even slightly better than JSONB; for larger sets there is simply no comparison: JSONB is the clear winner.
Given the high response times with standard JPA persistence for 100,000 identities, there was not enough reason to test 1,000,000.
There is absolutely no reason not to leverage Elasticsearch when the number of identities goes above 10,000: compare "10k nodyn" and "100k nodyn es" in both figures above, keeping in mind that the latter case involves ten times as many identities as the former.
Dynamic memberships are extremely resource-demanding and should be used with great care.