Performance Guidelines
Usual causes of poor performance
Although EBX.Platform is designed to support large volumes of data, several common factors can lead to poor performance.
Insufficient memory
The EBX.Platform memory cache provides much faster access to data that is already loaded in the cache. If there is not enough space for the working data, swaps between the Java heap space and the underlying database will sharply slow overall performance.
The maximum size of the memory allocation pool is usually specified by the Java command-line option -Xmx. As with any intensive process, it is important that the size specified by this option does not exceed the available physical RAM, so that the Java process does not swap to disk at the operating-system level.
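To check what the -Xmx setting actually gives the running process, the standard `java.lang.Runtime` API can be queried. The sketch below is only an illustration: the class name `HeapReport` is hypothetical and not part of EBX.Platform. It prints the heap figures in megabytes, in the same order as the monitoring log described in this section.

```java
/** Hypothetical helper: reports JVM heap figures in megabytes. */
class HeapReport {

    private static final long BYTES_PER_MB = 1024L * 1024L;

    /** Converts a byte count to whole megabytes (rounded down). */
    static long toMegabytes(long bytes) {
        return bytes / BYTES_PER_MB;
    }

    /** Returns the max, total and free heap sizes of the current JVM. */
    static String describe() {
        Runtime rt = Runtime.getRuntime();
        return toMegabytes(rt.maxMemory()) + ", "
                + toMegabytes(rt.totalMemory()) + ", "
                + toMegabytes(rt.freeMemory())
                + ": max memory, total memory, free memory approx. (Mb)";
    }

    public static void main(String[] args) {
        // The first figure reflects the -Xmx limit seen by the JVM.
        System.out.println(describe());
    }
}
```

Comparing the first figure with the physical RAM available on the host gives a quick sanity check of the -Xmx value.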
Indications of EBX.Platform swapping are provided by monitoring of the underlying database, and also by lines such as the following in the kernel category of the EBX.Platform log:
Cache #16954e1 monitoring: Since 2008-04-08T15:06:35.281 (10 sec) 1012, 31, 8: max memory, total memory, free memory approx. (Mb) blocs: 26, 0, 26, 5969 (first load, reload, total, time spent (ms)). idxMaps: 0, 0, 0, 0 (first built, rebuilt, total, time spent (ms). idxTrees: 0, 0, 0, 0 (first built, rebuilt, total, time spent (ms)).
If the numbers of reloads or rebuilds are high, this indicates that EBX.Platform is swapping.
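The check described above can be automated by extracting the reload and rebuilt counters from a monitoring line. The following is a minimal sketch that assumes the log format shown in the sample line; the class `CacheLogCheck` and the threshold value are hypothetical and must be tuned to the actual installation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: reads reload/rebuilt counters from a cache monitoring line. */
class CacheLogCheck {

    private static final String[] SECTIONS = {"blocs", "idxMaps", "idxTrees"};

    /** Builds a pattern matching e.g. "blocs: 26, 0, 26, 5969" with four captured counters. */
    private static Pattern counters(String section) {
        return Pattern.compile(
                Pattern.quote(section) + ":\\s*(\\d+),\\s*(\\d+),\\s*(\\d+),\\s*(\\d+)");
    }

    /** Returns the second counter (reload or rebuilt) of a section, or -1 if the section is absent. */
    static long secondCounter(String logLine, String section) {
        Matcher m = counters(section).matcher(logLine);
        return m.find() ? Long.parseLong(m.group(2)) : -1L;
    }

    /** True if any reload or rebuilt counter exceeds the given (assumed) threshold. */
    static boolean looksLikeSwapping(String logLine, long threshold) {
        for (String section : SECTIONS) {
            if (secondCounter(logLine, section) > threshold) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String line = "blocs: 26, 0, 26, 5969 (first load, reload, total, time spent (ms)). "
                + "idxMaps: 0, 0, 0, 0 idxTrees: 0, 0, 0, 0";
        System.out.println(looksLikeSwapping(line, 10));
    }
}
```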
Expensive programmatic extensions
The table below details, for each extensible use case, which programmatic extensions can be implemented.
| Use case | Programmatic extensions that can be involved |
| Validation | |
| Table access | |
| EBX.Manager content display | |
| Data update | |
For large volumes of data, costly algorithms have a serious effect on performance. For example, if a constraint algorithm's complexity is O(n²), then for a size of 100 the resulting cost is about 10,000 (which generally produces an immediate result); but for a size of 10,000 the resulting cost is on the order of 100,000,000.
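To make the cost difference concrete, consider a uniqueness check, used here purely as an illustration and not taken from the EBX.Platform API. The sketch below contrasts a pairwise O(n²) comparison with a hash-based check that runs in expected O(n) time; the class and method names are hypothetical, and values are assumed to be non-null.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical example: two ways to detect duplicate values. */
class UniquenessCheck {

    /** O(n^2): compares every pair of values. */
    static boolean hasDuplicateQuadratic(List<String> values) {
        for (int i = 0; i < values.size(); i++) {
            for (int j = i + 1; j < values.size(); j++) {
                if (values.get(i).equals(values.get(j))) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Expected O(n): remembers values already seen in a hash set. */
    static boolean hasDuplicateLinear(List<String> values) {
        Set<String> seen = new HashSet<>();
        for (String value : values) {
            // Set.add returns false when the value was already present.
            if (!seen.add(value)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> codes = List.of("FR", "DE", "IT", "FR");
        System.out.println(hasDuplicateQuadratic(codes));
        System.out.println(hasDuplicateLinear(codes));
    }
}
```

Both methods return the same answer; only the number of comparisons differs, which is what matters once a table holds tens of thousands of records.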
Another source of slowness is calls to external resources. Local caching usually solves this type of problem.
If one of the use cases above regularly shows poor performance, it is advisable to track down the problem either through code analysis or with a Java profiling tool.
Cache preparation
If the application server is restarted, it is advisable to preload the cache so that a minimal level of preparation is ensured. The property ebx.repository.preload performs the preload and validation of the Reference branch in the background, so that it is immediately available.
Directory integration
Authentication and permission management involve the directory.
If a specific directory implementation is deployed and calls an external directory, it can be useful to add local caching. In particular, one of the most frequently called methods is Directory.isUserInRole.
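One possible shape for such a local cache is sketched below. It is an illustration under stated assumptions, not the EBX.Platform directory API: the `RoleSource` interface stands in for the call to the external directory, and the class name, the time-to-live policy and the injected clock are all hypothetical choices.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical time-limited cache in front of an external role lookup. */
class RoleCache {

    /** Stand-in for the external directory call (assumption, not an EBX.Platform type). */
    interface RoleSource {
        boolean isUserInRole(String user, String role);
    }

    /** A cached answer together with its expiry time in milliseconds. */
    private static final class CachedAnswer {
        final boolean value;
        final long expiresAt;

        CachedAnswer(boolean value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, CachedAnswer> answers = new ConcurrentHashMap<>();
    private final RoleSource source;
    private final long ttlMillis;
    private final LongSupplier clock;

    RoleCache(RoleSource source, long ttlMillis, LongSupplier clock) {
        this.source = source;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Answers from the cache while the entry is fresh, otherwise asks the source again. */
    boolean isUserInRole(String user, String role) {
        String key = user + '\u0000' + role;
        long now = clock.getAsLong();
        CachedAnswer answer = answers.get(key);
        if (answer == null || now >= answer.expiresAt) {
            answer = new CachedAnswer(source.isUserInRole(user, role), now + ttlMillis);
            answers.put(key, answer);
        }
        return answer.value;
    }

    public static void main(String[] args) {
        RoleCache cache = new RoleCache(
                (user, role) -> "admin".equals(role), 60_000L, System::currentTimeMillis);
        System.out.println(cache.isUserInRole("alice", "admin"));
    }
}
```

The time-to-live bounds how long a permission change in the external directory can go unnoticed, so it should be chosen according to how quickly such changes must take effect.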
Validation
As noted above, it is advisable to keep the property ebx.repository.preload activated, so that the Reference branch is prevalidated at server startup.
The internal incremental validation framework optimizes the work needed when updates occur. However, there is an incompressible part that is systematically revalidated (even if no updates have occurred since the last validation): nodes with unknown dependencies. A node has unknown dependencies if:
- it possesses a programmatic constraint in default unknown dependencies mode;
- it declares a computed value;
- it declares a dynamic facet that depends on a node that is itself a computed value.
Consequently, on very large tables (beyond the order of 10⁵), it is recommended to avoid nodes with unknown dependencies (or at least to minimize the number of such nodes). For constraints, the developer can specify two alternative modes that drastically reduce the cost of incremental validation: local dependency mode and explicit dependencies. For more information, see the Dependencies and Validation section.
Massive Updates
Massive updates can involve several hundred thousand insertions, modifications or deletions. Such updates are normally infrequent (usually initial data imports) or are performed non-interactively (usually nightly batches), hence performance is less critical than for frequent and interactive operations. However, as with classic batch processing, massive updates raise some specific issues.
Transaction boundaries
It is generally not advisable to use a single transaction when the number of atomic updates in the transaction is beyond the order of 10⁴.
The main reason is that large transactions require a lot of resources (memory in particular) on EBX.Platform and on the underlying database.
To reduce transaction size, it is possible to:
- specify the property ebx.manager.import.commit.threshold; however, this property is used only for archive imports done interactively, in the context of EBX.Manager;
- explicitly specify a commit threshold inside the batch procedure;
- structurally limit the transaction scope by implementing Procedure for a part of the task and executing it as many times as needed.
Conversely, specifying a very small transaction size is also counterproductive, because of the persistence tasks performed at each commit.
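The idea of bounding transaction size can be sketched independently of the EBX.Platform API. In the example below, the class `BatchRunner` and the `commitBatch` callback are hypothetical: the callback stands for whatever executes one transaction (for instance one execution of a procedure) on a slice of the updates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: runs a large update as a series of bounded batches. */
class BatchRunner {

    /**
     * Splits the items into consecutive chunks of at most `threshold` elements
     * and hands each chunk to `commitBatch`. Returns the number of batches run.
     */
    static <T> int runInBatches(List<T> items, int threshold, Consumer<List<T>> commitBatch) {
        if (threshold <= 0) {
            throw new IllegalArgumentException("threshold must be positive");
        }
        int commits = 0;
        for (int from = 0; from < items.size(); from += threshold) {
            int to = Math.min(from + threshold, items.size());
            // Each chunk is meant to be processed and committed as one transaction.
            commitBatch.accept(items.subList(from, to));
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        List<Integer> updates = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            updates.add(i);
        }
        int commits = runInBatches(updates, 10,
                batch -> System.out.println("committing " + batch.size() + " updates"));
        System.out.println(commits + " commits");
    }
}
```

The threshold is the tuning knob discussed above: too large and each transaction consumes a lot of memory, too small and the fixed per-commit cost dominates.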
Note. If intermediate commits are a problem because transactional atomicity is no longer guaranteed, it is recommended to execute the massive update inside a dedicated branch, created just before the massive update. If the update does not complete successfully, the branch simply has to be closed; if it succeeds, the branch can be merged safely.
Triggers
If needed, triggers can be deactivated by means of the method setTriggerActivation.
Access to tables
Functionalities
Tables are commonly accessed through EBX.Manager, and also through the Request API and data services. This access involves a single set of functions, including a dynamic resolution process, which are detailed below to clarify their performance implications.
- Inheritance:
- Inheritance in the adaptation tree requires taking into account records and values that are defined by the parent instance, through a recursive process. Also, in a root instance, a record can inherit some of its values from schema default values (xs:default attribute).
- Value computation:
- A node declared as osd:function is always computed on the fly, when the value is accessed.
- Filtering:
- An XPath predicate, a programmatic filter, or a record-level permission rule implies a selection on records.
- Sort:
- Last but not least, the result can be sorted.
Architecture and design
In order to improve the speed of operations on tables, indexes are managed by the EBX.Platform engine.
EBX.Platform advanced features, such as advanced life-cycle (versions and branches), instance inheritance and flexible XML Schema modeling, have led to a particular design of the indexing mechanisms. This design can be summarized as follows:
- Level 1 indexes maintain a data structure on raw table blocs. Raw table blocs are basic persistence units that can be shared by multiple versions and branches.
- Level 2 indexes maintain a data structure on a full table as it is defined in a branch or in a version; however, they do not take instance inheritance into account. Level 2 aggregates level 1 indexes.
- Final access to tables, whose functionalities are described above, performs a dynamic resolution based on level 2 indexes.
Performance considerations
The impacts on performance are the following, if indexes are already built:
- The access to a table without specific filter and sort is almost immediate.
- If the user has applied a specific filter or if the table access depends on a programmatic rule, the access to a table should be quick. More precisely, it depends on the cost of the specific filtering algorithm that is executed on a fetch of at least 2,000 occurrences.
- Both cases above guarantee an access time that is independent of the size of the table and provide a view sorted in ascending primary key order. If the table is specifically sorted, then the first access time depends on the table size according to an N log(N) function (where N is the number of records in the resolved view).
If indexes are not yet built, additional time will be needed:
- If level 1 is ready but the level 2 index is not yet loaded or has been unloaded, the build time is BL2 = o(N log(N)).
- If level 1 indexes are already available in the persistent cache but are not yet loaded, the time to load them is LL1 = o(N).
- If level 1 indexes have not yet been written to the persistent cache, the time to build them is BL1 = o(N). This is the worst case because it requires loading all blocs.
Other operations on tables
The creation of new occurrences (record insertion) depends on the level 1 index. Hence, a creation is almost immediate if the level 1 index is loaded.
Conclusion about tables
Faster access to tables is ensured if indexes are ready and maintained in the memory cache. As mentioned above, it is important that the Java Virtual Machine has enough space allocated, so that it does not release indexes too quickly.