• Skip to primary navigation
  • Skip to main content
  • Skip to footer
ecm experts in australia

Seed Information Management

Enterprise Content Management Consultants ECM Experts

  • Services
    • Content Management
    • Collaboration
    • Business Process Management
    • Migration Services
    • Integration Services
    • Support & Maintenance
    • Pricing
  • Technology
    • Alfresco Digital Business Platform
    • Alfresco Application Development Framework
    • Cloud (PaaS)
    • Activiti
    • Flowable
    • Hyland RPA
  • Alfresco Addons
  • Case Studies
    • Industries
  • Blog
  • Contact Us
    • About

Tune Alfresco Solr for fast Indexing

May 10, 2024 By Seed IM

This blog discusses an option to improve Solr indexing performance.  To improve the speed at which indexing occurs by solr it is important that Solr can quickly retrieve a list of documents that require indexing.  When a document or folder is added or updated in Alfresco a transaction is written to the DB.  Using a rest request Solr asynchronously requests the latest set of transactions that have been created to determine what documents need to be indexed.  It then requests the documents for each transaction and adds their metadata and content to the index.   A key driver of the speed for which this occurs is that solr can get the list of transactions in a timely manner.  This is especially important if you are re-indexing your Repository shards.  We ran into a problem for one of our customers where the request to get the latest transactions was taking > 2 minutes and therefore timing out.  The advice we got from Alfresco was to increase the timeout time on the DB.  While that prevented timeouts it did nothing to improve the speed at which the index was building which resulted in the index lagging under even normal load.  We also found that the DB was using > 50% cpu as it tried to fullfill queries to the transaction tables that were taking forever.  This occurred in a shared solr instance with >100,000,000 transactions in the Repo.

How Indexing Works

Solr makes a number of requests to Alfresco to get new transactions. A transaction is created whenever a node or its permissions have changed (CUD). It is important that these requests return quickly or else indexing can slow as solr waits for the list of transactions to be returned.

The below request is made by Solr to Alfresco to get new transactions within DB_ID_Range 140-190 million transactions.

time curl -H "X-Alfresco-Search-Secret: <secret password>" -u {username:password} -X GET "http://<domain>:<port>/alfresco/service/api/solr/transactionInterval?fromNodeId=140000000&toNodeId=190000000"

In a sharded solution each shard is given a DB_ID_Range, the range being used to query the transaction table for changes. The solution below adds an index to the transaction table to match the db_id_range of transactions that a shard is configured to query. This massively decreases the response time for the query.

The rest request above results in Alfresco calling the following query (embedded in a larger query) to retrieve new transactions based on the shard db_id_range values.

select min(transaction_id)
 from alf_node
     
where id >= <db id range start value>
and id <= <db id range end value>;

 Looking at the query plan showed that it is using a btree index and that all of the slowness is in execution not the parsing of the query.  The index itself looked healthy but was slow to respond when above 70000000.

Alfresco by default includes an index on the transaction table called idx_alf_node_txn. However as this grows it becomes less efficient. Adding specific indexes to match the db_id_range ensures each shard query will use its specific index and therefore improve query performance.

Solution

Create a separate index for each shard with the index including the the same range as configured for the db_id_range of the shard (shard2 and shard3 in example below).

Eamples  below create DB indexes for 2 shards, shard1 having a db_id_range of 70,000,000 to 140,000,000 and shard2 with a range of 140,000,000 to 190,000,000

CREATE INDEX idx_alf_node_txn_shard2 ON public.alf_node USING btree (transaction_id)
 where id >= 70000000
 and id <= 140000000;

CREATE INDEX idx_alf_node_txn_shard3 ON public.alf_node USING btree (transaction_id)
 where id >= 140000000
and id <= 190000000;

By creating a separate index for shards 1 and 2 these will be used when those shards query the transaction table for more transactions within their specific db_id_range.

This means that the query  execution is now using the separate indexes for each shard when executing the query and returns very fast (15ms) compared to 160,000 ms before we tuned it.  One interesting thing to note here is that the db indexes have to be exactly the same range as the shard db_id_range or else it is not used when the shard requests changes from the index.

Testing

Run the query below before adding the index and note the time:

EXPLAIN ANALYZE select min(transaction_id)from alf_node where id >= db id range start value and id <= db id range end value;

Once new index has been created then rerun the query to verify that the query returns quickly.

Also,you can run the curl request made by Solr to Alfresco to verify the time it taked to get back the results.

Verify that solr index is not lagging.

Conclusion

For the customer we implemented this for it had a very positive impact.  Not only did the index stop lagging, the DB also used considerably less cpu and therefore could be downsized saving considerable cost.  I hope this helps anyone who is tuning Alfresco solr to avoid many hours of frustration when managing Alfresco large repositories.

Footer


Seed IM is a leading ECM consulting company providing powerful solutions to businesses and organisations of all sizes

Contact Us

  • Seed Information Management Pty Ltd
    90 Maribyrnong Street
    Footscray VIC 3011
  • 03 9021 0837
  • info@seedim.com.au

Articles

Semantic Content Management for Alfresco
Using Multiple Taxonomies To Browse Your Content
Records Management Using Alfresco One

Copyright © 2025 Seed Information Management Pty Ltd · Contact Us · Privacy Policy · Our Digital Media Agency is Marmoset Digital