
Performance Guidelines


Performance Basic Check-List

While EBX.Platform is designed to support large volumes of data, several common factors can lead to poor performance. Addressing the key points discussed in this section resolves the most common performance bottlenecks.

Insufficient memory

The EBX.Platform memory cache provides much faster access to data that is already loaded in the cache. If there is not enough space for working data, swapping between the Java heap space and the underlying database can heavily degrade overall performance.

These aspects are discussed in the Memory Management section below.

Expensive programmatic extensions

The table below details, for each extensible use case, which programmatic extensions can be implemented.

Use case

Programmatic extensions that can be involved

Validation

Table access

EBX.Manager content display

Data update

For large volumes of data, costly algorithms have a serious impact on performance. For example, suppose a constraint algorithm's complexity is O(n^2): if the size is 100, the resulting cost is about 10,000 (this generally produces an immediate result); but if the size is 10,000, the resulting cost is on the order of 100,000,000.
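To make the difference concrete, here is a minimal sketch of a uniqueness check written two ways. It is an illustration only: the class and method names are hypothetical and are not part of the EBX.Platform API. The first version compares every pair of values and is quadratic; the second uses a hash set and makes a single pass.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not an EBX.Platform class.
final class UniquenessCheck {

    // Quadratic version: compares every pair, about n*n/2 comparisons.
    static boolean hasDuplicateQuadratic(List<String> values) {
        for (int i = 0; i < values.size(); i++) {
            for (int j = i + 1; j < values.size(); j++) {
                if (values.get(i).equals(values.get(j))) {
                    return true;
                }
            }
        }
        return false;
    }

    // Linear version: one pass, each value is checked against a hash set.
    static boolean hasDuplicateLinear(List<String> values) {
        Set<String> seen = new HashSet<>();
        for (String value : values) {
            // add() returns false when the value was already present
            if (!seen.add(value)) {
                return true;
            }
        }
        return false;
    }
}
```

Both methods return the same answer; only the cost differs, and the gap widens quickly as the number of values grows.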

Another source of slowness is the call to external resources. Local caching usually solves this type of problem.

If one of the use cases above regularly shows poor performance, it is advised to track down the problem either through code analysis or by means of a Java profiling tool.

Directory integration

Authentication and permission management involve the directory.

If a specific directory implementation is deployed and accesses an external directory, it can be useful to ensure local caching. In particular, one of the most frequently called methods is Directory.isUserInRole.
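As a rough sketch of such local caching, the class below remembers the answer of a slow role lookup per (user, role) pair. Everything here is an assumption made for illustration: CachedRoleLookup and its constructor argument are hypothetical names, and the code does not reproduce the actual signature of Directory.isUserInRole. Only standard Java classes are used.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiPredicate;

// Hypothetical wrapper for illustration; not the EBX.Platform Directory API.
final class CachedRoleLookup {
    // Slow call to the external directory: (user, role) -> membership.
    private final BiPredicate<String, String> remoteLookup;
    // Cached answers, keyed by the immutable pair [user, role].
    private final Map<List<String>, Boolean> cache = new ConcurrentHashMap<>();

    CachedRoleLookup(BiPredicate<String, String> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    boolean isUserInRole(String user, String role) {
        // The remote lookup runs only when the pair is not cached yet.
        return cache.computeIfAbsent(List.of(user, role),
                key -> remoteLookup.test(user, role));
    }

    // Drops all cached answers, for example after roles change externally.
    void invalidate() {
        cache.clear();
    }
}
```

A cache of this kind trades freshness for speed: a change made in the external directory is not seen until the cached entry is invalidated, so the invalidation policy has to match how quickly permission changes must take effect.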

Aggregated lists

In a schema, when an element's cardinality constraint maxOccurs is greater than 1 and no osd:table is declared on this element, it is implemented as a Java List. We call this an aggregated list, as opposed to a table.

It is important to consider that no particular optimizations are applied to accessing aggregated lists (for example: iterations, GUI display, etc.). Additionally, and apart from performance concerns, aggregated lists lack many functionalities that are supported by tables (see tables introduction for a list of those features).

Hence aggregated lists should be used only for small volumes of simple data (one or two dozen occurrences), with no advanced requirements for their identification, lookups, permissions, etc. For larger volumes of data (or more advanced functionalities), it is recommended to use osd:table declarations.

Memory Management

Loading strategy

The administrator can specify the loading strategy of a branch or version in its information pane. The default strategy is to load and unload the resources on demand. For homes that are heavily used, a "forced load" strategy is usually recommended.

"Load and unload on demand" mode

In this default mode, each resource in a home is loaded or built only when it is needed. Moreover the resources of the home are "softly" referenced by means of the standard Java SoftReference class; this implies that each resource can be unloaded "at the discretion of the garbage collector in response to memory demand".

The advantage of the default mode is thus the ability to free memory when needed. The counterpart is a load/build cost whenever an accessed resource has not yet been loaded since server startup, or has been unloaded since.
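The trade-off can be illustrated with a small sketch built on the standard java.lang.ref.SoftReference class. This is not the EBX.Platform implementation: SoftlyCachedResource, its loader, and the simulateGcClear method (which stands in for the garbage collector reclaiming the resource) are hypothetical names introduced for the example.

```java
import java.lang.ref.SoftReference;
import java.util.function.Supplier;

// Hypothetical illustration of on-demand loading with soft references.
final class SoftlyCachedResource<T> {
    private final Supplier<T> loader;   // builds the resource; must not return null
    private SoftReference<T> ref = new SoftReference<>(null);
    private int loadCount = 0;          // how many times the loader has run

    SoftlyCachedResource(Supplier<T> loader) {
        this.loader = loader;
    }

    synchronized T get() {
        T value = ref.get();
        if (value == null) {
            // Never loaded, or cleared since the last access: pay the load cost.
            value = loader.get();
            loadCount++;
            ref = new SoftReference<>(value);
        }
        return value;
    }

    // Stands in for the garbage collector clearing the reference under memory demand.
    synchronized void simulateGcClear() {
        ref.clear();
    }

    synchronized int loadCount() {
        return loadCount;
    }
}
```

Each access after a clear triggers a new load, which is the cost described above; when clears are frequent, the same resource is rebuilt again and again.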

"Forced load" mode

If the "Forced load" mode is enabled for a home, its resources are loaded asynchronously at server startup. Moreover, each resource of the home is kept in memory until the server is shut down or the home is closed.

This mode is particularly recommended for long-lived homes and/or those that are heavily used, namely any home that serves as a reference.

Monitoring

Indications of EBX.Platform load activity are provided by the underlying database monitoring, and also by the 'monitoring' log category.

If the numbers of clears and builds remain high for a long time, this is an indication that EBX.Platform is swapping.

Tuning the memory

The maximum size of the memory allocation pool is usually specified by the Java command-line option -Xmx. As is the case for any intensive process, it is important that the size specified by this option does not exceed the available physical RAM, so that the Java process does not swap to disk at operating-system level.

Tuning the garbage collector can also benefit overall performance. This tuning must be adapted to the use cases and is specific to the Java Runtime Environment.
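As a quick way to check which heap limit is actually in effect, the sketch below reads the values reported by the JVM itself. HeapReport is a hypothetical class name; Runtime.maxMemory(), totalMemory() and freeMemory() are standard Java methods. The heap size in the launch comment is only an example value.

```java
// Hypothetical utility for illustration; not part of EBX.Platform.
final class HeapReport {
    private static final long MIB = 1024L * 1024L;

    // Upper bound the JVM will try to use for the heap (driven by -Xmx).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    // Heap currently occupied: allocated heap minus its free part.
    static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MIB;
    }

    // Example launch (illustrative value): java -Xmx2g HeapReport
    public static void main(String[] args) {
        System.out.println("Max heap (MiB):  " + maxHeapMiB());
        System.out.println("Used heap (MiB): " + usedHeapMiB());
    }
}
```

Comparing the reported maximum with the physical RAM of the host is a simple sanity check before looking at finer garbage collector settings.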

Validation

The internal incremental validation framework will optimize the work needed when some updates occur. The incremental validation process runs as follows:

However, there is an incompressible part that is systematically revalidated (even if no updates have occurred since the last validation): these are nodes with unknown dependencies. A node has unknown dependencies if:

Consequently, on large tables (beyond the order of 10^5 records), it is recommended to avoid nodes with unknown dependencies (or at least to minimize the number of such nodes). For constraints, the developer is able to specify two alternative modes that drastically reduce incremental validation cost: local dependency mode and explicit dependencies. For more information, see the Dependencies and Validation section.

Note: a user with the Administrator role can manually reset the validation report of an adaptation. This option is available from the validation report section in EBX.Manager.

Massive Updates

Massive updates can involve several hundreds of thousands of insertions, modifications or deletions. Those updates are normally infrequent (usually initial data imports) or they are performed in a non-interactive way (usually nightly batches), hence performance is less critical than for frequent and interactive operations. However, like any classic batch processing, massive updates raise some specific issues.

Transaction boundaries

It is generally not advised to use a single transaction when the number of atomic updates in the transaction is beyond the order of 10^4.

The main reason is that large transactions require a lot of resources (more particularly memory) on EBX.Platform and on the underlying database.

To reduce the size of transactions, it is possible to:

On the other hand, specifying a very small transaction size is also counterproductive, because of the specific persistence tasks performed for each commit.

Note: If intermediate commits are a problem because transactional atomicity is no longer guaranteed, it is recommended to execute the massive update inside a dedicated branch, created just before the massive update. If the update does not complete successfully, the branch can simply be closed; if it succeeds, the branch can be merged safely.
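The idea of bounded transactions can be sketched generically as follows. This is an assumption-laden illustration, not EBX.Platform code: BatchedUpdate, its run method, and the commitBatch callback (which stands for "apply these updates and commit") are hypothetical names, and the batch size is left to the caller.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper for illustration; not an EBX.Platform class.
final class BatchedUpdate {

    // Applies the updates in chunks of at most batchSize, one commit per chunk.
    // Returns the number of commits performed.
    static <T> int run(List<T> updates, int batchSize, Consumer<List<T>> commitBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int commits = 0;
        for (int from = 0; from < updates.size(); from += batchSize) {
            int to = from + Math.min(batchSize, updates.size() - from);
            // Each call stands for one transaction that is committed on return.
            commitBatch.accept(updates.subList(from, to));
            commits++;
        }
        return commits;
    }
}
```

With 25 updates and a batch size of 10, for instance, the callback receives chunks of 10, 10 and 5 updates. The batch size is the tuning knob: large enough to amortize the per-commit cost, small enough to keep each transaction's memory footprint reasonable.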

Triggers

If needed, triggers can be deactivated by means of the method setTriggerActivation.

Access to tables

Functionalities

Tables are commonly accessed through EBX.Manager and also through Request API and data services. This access involves a unique set of functions, including a dynamic resolution process, that we detail for a better understanding of their performance implications.

Architecture and design

In order to improve the speed of operations in tables, indexes are managed by the EBX.Platform engine.

EBX.Platform advanced features, such as the advanced life-cycle (versions and branches), instance inheritance and flexible XML Schema modeling, have led to a particular design of the indexing mechanisms. This design can be summarized as follows:

Performance considerations

The impacts on performance are the following, if indexes are already built:

  1. The access to a table without specific filter and sort is almost immediate.

  2. If the user has applied a specific filter or if the table access depends on a programmatic rule, the access to a table should be quick. More precisely, it depends on the cost of the specific filtering algorithm that is executed when fetching at least 2,000 occurrences.

  3. Both cases above guarantee an access time that is independent of the size of the table, and provide a view sorted in ascending primary key order. If the table is specifically sorted, then the first access time depends on the table size according to an N log(N) function (where N is the number of records in the resolved view).
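The cost profile of a specifically sorted view can be illustrated with the sketch below, in which the sort is paid once on first access and later page requests reuse the result. This is only a model of the behaviour described above, not the EBX.Platform indexing design: SortedViewCache and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model for illustration; not the EBX.Platform index implementation.
final class SortedViewCache<T> {
    private final List<T> records;
    private final Comparator<T> order;
    private List<T> sortedView;   // built lazily on first access
    private int sortCount = 0;    // how many times the full sort has run

    SortedViewCache(List<T> records, Comparator<T> order) {
        this.records = records;
        this.order = order;
    }

    // Returns up to 'size' records starting at 'offset' in the sorted view.
    synchronized List<T> page(int offset, int size) {
        if (offset < 0 || size < 0) {
            throw new IllegalArgumentException("offset and size must be non-negative");
        }
        if (sortedView == null) {
            // First access: full sort, O(N log N) in the number of records.
            List<T> copy = new ArrayList<>(records);
            copy.sort(order);
            sortedView = copy;
            sortCount++;
        }
        // Later accesses only slice the already sorted view.
        int from = Math.min(offset, sortedView.size());
        int to = from + Math.min(size, sortedView.size() - from);
        return new ArrayList<>(sortedView.subList(from, to));
    }

    synchronized int sortCount() {
        return sortCount;
    }
}
```

In this model only the first page request depends on N; if the sorted view is discarded (for example under memory pressure), the next access pays the sort again.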

If indexes are not yet built, additional time will be needed:

  1. If the level 1 index is ready but the level 2 index is not yet loaded or has been unloaded, the build time is BL2 = O(N log(N)).

  2. If level 1 indexes are already available in the persistent cache but are not yet loaded, the time to load them is LL1 = O(N).

  3. If level 1 indexes have not yet been written to the persistent cache, the time to build them is BL1 = O(N). This is the worst case because it requires loading all blocks.

Other operations on tables

The creation of new occurrences (record inserts) depends on the level 1 index. Hence, a creation is almost immediate if the level 1 index is loaded.

Conclusion about tables

Faster access to tables is ensured if indexes are ready and maintained in memory cache. As mentioned above, it is important that the Java Virtual Machine has enough space allocated, so that it does not release indexes too quickly.
