Home > References

Performance Guidelines

Usual causes of poor performance

While EBX.Platform is designed to support large volumes of data, several common factors can lead to poor performance.

Insufficient memory

The EBX.Platform memory cache gives much faster access to data that is already loaded in the cache. If there is not enough space for working data, swaps between the Java heap space and the underlying database will sharply degrade overall performance.

The maximum size of the memory allocation pool is usually specified by the Java command-line option -Xmx. As with any memory-intensive process, it is important that the size specified by this option does not exceed the available physical RAM, so that the Java process does not swap to disk at the operating-system level.

Indications of EBX.Platform swapping are provided by monitoring of the underlying database, and also by lines such as the following in the kernel category of the EBX.Platform log:

Cache #16954e1 monitoring:
  Since 2008-04-08T15:06:35.281 (10 sec)
  1012, 31, 8: max memory, total memory, free memory approx. (Mb)
  blocs:    26, 0, 26, 5969 (first load, reload, total, time spent (ms)).
  idxMaps:  0, 0, 0, 0 (first built, rebuilt, total, time spent (ms).
  idxTrees: 0, 0, 0, 0 (first built, rebuilt, total, time spent (ms)).

If the numbers of reloads or rebuilds are high, this indicates that EBX.Platform is swapping.
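
The three memory figures in the log extract above resemble the values that any JVM exposes through java.lang.Runtime. As a minimal sketch, assuming only the standard Java API (the class name HeapReport and the output format are illustrative and are not part of EBX.Platform), the same figures can be printed from any Java process to check what the -Xmx setting actually provides:

```java
// Illustrative helper, not an EBX.Platform class: prints JVM heap figures in megabytes.
final class HeapReport {

    private static final long BYTES_PER_MB = 1024L * 1024L;

    /** Converts a byte count to whole megabytes (rounded down). */
    static long toMb(long bytes) {
        return bytes / BYTES_PER_MB;
    }

    /** Formats max, total and free heap memory, in that order. */
    static String report(Runtime runtime) {
        return String.format(
                "%d, %d, %d: max memory, total memory, free memory approx. (Mb)",
                toMb(runtime.maxMemory()),    // upper bound, driven by -Xmx
                toMb(runtime.totalMemory()),  // heap currently reserved by the JVM
                toMb(runtime.freeMemory()));  // unused part of the reserved heap
    }

    public static void main(String[] args) {
        System.out.println(report(Runtime.getRuntime()));
    }
}
```

A free memory value that stays close to zero while total memory has already reached max memory suggests that the heap is too small for the working data.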

Expensive programmatic extensions

The table below details, for each extensible use case, which programmatic extensions can be implemented.

Use case Programmatic extensions that can be involved
Validation
Table access
EBX.Manager content display
Data update

For large volumes of data, inefficient algorithms have a serious effect on performance. For example, suppose that a constraint algorithm's complexity is O(n^2): if the size is 100, the resulting cost is about 10,000 (which generally produces an immediate result); but if the size is 10,000, the resulting cost will be on the order of 100,000,000.
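
To make the difference concrete, here is a minimal sketch of a uniqueness check written in two ways. It uses only the standard Java collections; the class UniquenessCheck and its methods are hypothetical and do not correspond to any EBX.Platform constraint API. The first version compares every pair of values and is therefore quadratic, while the second performs a single pass with a hash set.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical example class: two ways to detect a duplicate value (values assumed non-null).
final class UniquenessCheck {

    /** Compares every pair of values: about n * n / 2 comparisons in the worst case. */
    static boolean hasDuplicateQuadratic(List<String> values) {
        for (int i = 0; i < values.size(); i++) {
            for (int j = i + 1; j < values.size(); j++) {
                if (values.get(i).equals(values.get(j))) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Single pass with a hash set: about n insertions. */
    static boolean hasDuplicateLinear(List<String> values) {
        Set<String> seen = new HashSet<>();
        for (String value : values) {
            // Set.add returns false when the value is already present.
            if (!seen.add(value)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> codes = Arrays.asList("FR", "DE", "IT", "FR");
        System.out.println(hasDuplicateQuadratic(codes));
        System.out.println(hasDuplicateLinear(codes));
    }
}
```

Both methods return the same answer; only the number of operations differs, which is what matters once the table reaches tens of thousands of records.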

Another source of slowness is calls to external resources. Local caching usually solves this type of problem.
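
As an illustration of local caching, the sketch below wraps an expensive lookup so that each key is fetched from the external resource only once. It relies on the standard ConcurrentHashMap.computeIfAbsent method; the class CachedLookup and the lookup function used in main are assumptions made for the example and stand in for whatever external call the extension performs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical wrapper: remembers the result of an expensive call per key.
final class CachedLookup<K, V> {

    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> externalCall;

    CachedLookup(Function<K, V> externalCall) {
        this.externalCall = externalCall;
    }

    /** Returns the cached value, calling the external resource only on a miss. */
    V get(K key) {
        // Note: a null result is not stored, so it is looked up again next time.
        return cache.computeIfAbsent(key, externalCall);
    }

    /** Drops all cached values, for example when the external data is known to have changed. */
    void invalidate() {
        cache.clear();
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Stand-in for a slow external call (assumption for the example).
        CachedLookup<String, String> labels = new CachedLookup<>(code -> {
            calls.incrementAndGet();
            return "label-for-" + code;
        });
        labels.get("EUR");
        labels.get("EUR");
        System.out.println(calls.get()); // prints 1: the second access is served from the cache
    }
}
```

A cache of this kind trades freshness for speed, so it needs an invalidation rule that matches how often the external data changes.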

If one of the use cases above regularly shows poor performance, it is advised to track down the problem either through code analysis or by means of a Java profiling tool.

Cache preparation

If the application server is restarted, it is advised to preload the cache so that a minimal level of preparation is ensured. The property ebx.repository.preload performs the preload and validation of the Reference branch in the background, so that the branch is immediately available.

Directory integration

Authentication and permissions management involve the directory.

If a specific directory implementation is deployed and calls an external directory, it can be useful to provide local caching. In particular, one of the most frequently called methods is Directory.isUserInRole.
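
Because role membership can change in the external directory, a cache for this kind of call usually needs an expiry time. The following is a minimal sketch under stated assumptions: the class RoleCache, the BiPredicate standing in for the external directory and the injected clock are all hypothetical, and the actual signature of Directory.isUserInRole is not reproduced here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiPredicate;
import java.util.function.LongSupplier;

// Hypothetical time-bounded cache in front of an external role lookup.
final class RoleCache {

    private static final class Entry {
        final boolean inRole;
        final long expiresAtMillis;

        Entry(boolean inRole, long expiresAtMillis) {
            this.inRole = inRole;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final BiPredicate<String, String> externalDirectory; // (user, role) -> membership
    private final long ttlMillis;
    private final LongSupplier clock; // injected so that expiry can be tested

    RoleCache(BiPredicate<String, String> externalDirectory, long ttlMillis, LongSupplier clock) {
        this.externalDirectory = externalDirectory;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Answers from the cache while the entry is fresh, otherwise asks the external directory. */
    boolean isUserInRole(String user, String role) {
        long now = clock.getAsLong();
        String key = user + "\n" + role; // assumes names never contain a line break
        Entry cached = entries.get(key);
        if (cached != null && now < cached.expiresAtMillis) {
            return cached.inRole;
        }
        boolean inRole = externalDirectory.test(user, role);
        entries.put(key, new Entry(inRole, now + ttlMillis));
        return inRole;
    }

    public static void main(String[] args) {
        // Stand-in directory: only "alice" is an administrator (assumption for the example).
        RoleCache cache = new RoleCache(
                (user, role) -> user.equals("alice") && role.equals("admin"),
                60_000L,
                System::currentTimeMillis);
        System.out.println(cache.isUserInRole("alice", "admin"));
        System.out.println(cache.isUserInRole("bob", "admin"));
    }
}
```

The expiry time bounds how long a revoked permission can remain visible, so it should be chosen according to the security requirements of the repository.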

Validation

As noted above, it is advised to leave the property ebx.repository.preload activated, so that the Reference branch is prevalidated at server startup.

The internal incremental validation framework optimizes the work needed when updates occur. However, there is an irreducible part that is systematically revalidated (even if no updates have occurred since the last validation): the nodes with unknown dependencies. A node has unknown dependencies if:

Consequently, on very large tables (beyond the order of 10^5), it is recommended to avoid nodes with unknown dependencies (or at least to minimize the number of such nodes). For constraints, the developer can specify two alternative modes that drastically reduce the cost of incremental validation: local dependency mode and explicit dependencies. For more information, see the Dependencies and Validation section.

Massive Updates

Massive updates can involve several hundred thousand insertions, modifications or deletions. Such updates are normally infrequent (usually initial data imports) or are performed non-interactively (usually nightly batches), hence their performance is less critical than that of frequent, interactive operations. However, as with classic batch processing, they raise some specific issues.

Transaction boundaries

It is generally not advised to use a single transaction when the number of atomic updates in the transaction is beyond the order of 10^4.

The main reason is that large transactions require a lot of resources (in particular memory) on EBX.Platform and on the underlying database.

To reduce the size of transactions, it is possible to:

On the other hand, specifying a very small transaction size will also hurt performance, because of the persistence tasks performed at each commit.

Note. If intermediate commits are a problem because transactional atomicity is no longer guaranteed, it is recommended to execute the massive update inside a dedicated branch, created just before the massive update. If the update does not complete successfully, the branch simply has to be closed; if it succeeds, the branch can be merged safely.
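
The trade-off on transaction size described above can be sketched as a loop that splits a list of updates into batches and commits each batch separately. This is an illustration only: the class BatchedUpdates and the commitBatch callback are hypothetical placeholders for whatever transaction mechanism is actually used, and the batch size of 10,000 is just an example value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: applies a large list of updates as a series of smaller commits.
final class BatchedUpdates {

    /** Passes consecutive slices of at most batchSize updates to commitBatch; returns the commit count. */
    static <T> int applyInBatches(List<T> updates, int batchSize, Consumer<List<T>> commitBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int commits = 0;
        int start = 0;
        while (start < updates.size()) {
            int end = start + Math.min(batchSize, updates.size() - start);
            // Placeholder for "run these updates in one transaction and commit".
            commitBatch.accept(updates.subList(start, end));
            commits++;
            start = end;
        }
        return commits;
    }

    public static void main(String[] args) {
        List<Integer> updates = new ArrayList<>();
        for (int i = 0; i < 25_000; i++) {
            updates.add(i);
        }
        int commits = applyInBatches(updates, 10_000,
                batch -> System.out.println("commit of " + batch.size() + " updates"));
        System.out.println(commits + " commits"); // prints "3 commits"
    }
}
```

With 25,000 updates and a batch size of 10,000, the callback runs three times (10,000, 10,000 and 5,000 updates), which keeps each transaction within the recommended order of magnitude without paying the commit overhead for every single update.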

Triggers

If needed, triggers can be deactivated by means of the method setTriggerActivation.

Access to tables

Functionalities

Tables are commonly accessed through EBX.Manager and also through the Request API and data services. This access involves a single set of functions, including a dynamic resolution process, which are detailed below for a better understanding of their performance implications.

Inheritance:
Inheritance in the adaptation tree requires taking into account records and values that are defined by the parent instance, through a recursive process. Also, in a root instance, a record can inherit some of its values from the schema default values (xs:default attribute).
Value computation:
A node declared as osd:function is always computed on the fly, when the value is accessed.
Filtering:
An XPath predicate, a programmatic filter, or a record-level permission rule implies a selection of records.
Sort:
Finally, the result can be sorted.

Architecture and design

To improve the speed of operations on tables, indexes are managed by the EBX.Platform engine.

Advanced EBX.Platform features, such as the advanced life cycle (versions and branches), instance inheritance and flexible XML Schema modeling, have led to a particular design of the index mechanisms. This design can be summarized as follows:

Performance considerations

If indexes are already built, the impacts on performance are the following:

  1. Access to a table without a specific filter or sort is almost immediate.
  2. If the user has applied a specific filter, or if table access depends on a programmatic rule, access to the table should be quick. More precisely, it depends on the cost of the specific filtering algorithm, which is executed on a fetch of at least 2,000 occurrences.
  3. Both cases above guarantee an access time that is independent of the size of the table and provide a view sorted in ascending primary key order. If a specific sort is applied to the table, the first access time depends on the table size according to an N log(N) function (where N is the number of records in the resolved view).

If indexes are not yet built, additional time will be needed:

  1. If the level 1 index is ready but the level 2 index is not yet loaded or has been unloaded, the build time is BL2 = O(N log(N)).
  2. If level 1 indexes are already available in the persistent cache but are not yet loaded, the time to load them is LL1 = O(N).
  3. If level 1 indexes have not yet been written to the persistent cache, the time to build them is BL1 = O(N). This is the worst case because it requires loading all blocs.

Other operations on tables

The creation of new occurrences (record inserts) depends on the level 1 index. Hence, a creation is almost immediate if the level 1 index is loaded.

Conclusion about tables

Faster access to tables is ensured if indexes are ready and kept in the memory cache. As mentioned above, it is important that the Java Virtual Machine has enough space allocated, so that it does not release indexes too quickly.

 

 
