Performance Guidelines
Performance Basic Check-List
While EBX.Platform is designed to support large volumes of data, several common factors can lead to poor performance. Addressing the key points discussed in this section resolves the most common performance bottlenecks.
Insufficient memory
The EBX.Platform memory cache provides much more efficient access to data that is already loaded in cache. If there is not enough space for working data, swaps between the Java heap space and the underlying database can heavily degrade overall performance.
These aspects are covered in the section Memory Management below.
Expensive programmatic extensions
The table below details, for each extensible use case, which programmatic extensions can be implemented.
| Use case | Programmatic extensions that can be involved |
| Validation | |
| Table access | |
| EBX.Manager content display | |
| Data update | |
With large volumes of data, costly algorithms have serious effects on performance. For example, if a constraint algorithm's complexity is O(n^2), a size of 100 results in a cost of about 10,000 (this generally produces an immediate result); but if the size is 10,000, the resulting cost is in the order of 100,000,000.
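As an illustration of the difference, the following minimal sketch compares a quadratic and a linear way of checking that a list of values contains no duplicates. The class and method names are hypothetical and are not part of the EBX.Platform API; the sketch only shows how the choice of algorithm changes the cost.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not an EBX.Platform class.
final class UniquenessCheck {

    // Quadratic: compares every pair of values, about n*n/2 comparisons.
    static boolean hasDuplicateQuadratic(List<String> values) {
        for (int i = 0; i < values.size(); i++) {
            for (int j = i + 1; j < values.size(); j++) {
                if (values.get(i).equals(values.get(j))) {
                    return true;
                }
            }
        }
        return false;
    }

    // Linear on average: one hash-set insertion per value.
    static boolean hasDuplicateLinear(List<String> values) {
        Set<String> seen = new HashSet<>();
        for (String value : values) {
            if (!seen.add(value)) {
                return true; // add() returns false when the value was already present
            }
        }
        return false;
    }
}
```

Both methods return the same answer; for 10,000 values the first performs up to about 50 million comparisons, while the second performs 10,000 set insertions.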
Another source of slowness is the call to external resources. Local caching usually solves this type of problem.
If one of the specific use cases above usually shows poor performance, it is advised to track the problem either through code analysis or by means of a Java profiling tool.
Directory integration
Authentication and permissions management involve the directory.
If a specific directory implementation is deployed and accesses an external directory, it can be useful to ensure local caching. In particular, one of the most frequently called methods is Directory.isUserInRole.
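One possible way to add such local caching is sketched below. It is a sketch under stated assumptions: the class CachedRoleLookup, the (user, role) string signature, and the time-to-live parameter are all hypothetical and do not reproduce the actual Directory API. The external directory call is represented by a plain BiPredicate.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiPredicate;

// Hypothetical wrapper, not an EBX.Platform class.
final class CachedRoleLookup {

    private static final class Entry {
        final boolean inRole;
        final long expiresAtMillis;

        Entry(boolean inRole, long expiresAtMillis) {
            this.inRole = inRole;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    // Assumed stand-in for the slow call to the external directory.
    private final BiPredicate<String, String> externalLookup;
    private final long ttlMillis;
    private final Map<String, Entry> cache = new ConcurrentHashMap<>();

    CachedRoleLookup(BiPredicate<String, String> externalLookup, long ttlMillis) {
        this.externalLookup = externalLookup;
        this.ttlMillis = ttlMillis;
    }

    // Answers from the cache while the entry is fresh, otherwise asks the external directory.
    boolean isUserInRole(String user, String role) {
        String key = user + '\u0000' + role;
        long now = System.currentTimeMillis();
        Entry entry = cache.get(key);
        if (entry == null || entry.expiresAtMillis <= now) {
            entry = new Entry(externalLookup.test(user, role), now + ttlMillis);
            cache.put(key, entry);
        }
        return entry.inRole;
    }
}
```

The time-to-live bounds how long a change made in the external directory can go unnoticed; a shorter value trades performance for freshness.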
Aggregated lists
In a schema, when an element's cardinality constraint maxOccurs is greater than 1 and no osd:table is declared on this element, it is implemented as a Java List. We call this an aggregated list, as opposed to a table.
It is important to note that no particular optimizations are made for access to aggregated lists (for example: iterations, GUI display, etc.). Beyond performance concerns, aggregated lists also lack many functionalities that are supported by tables (see tables introduction for a list of those features).
Hence aggregated lists should be used only for small volumes of simple data (one or two dozen occurrences), with no advanced requirements for their identification, lookups, permissions, etc. For larger volumes of data (or more advanced functionalities), it is recommended to use osd:table declarations.
Memory Management
Loading strategy
The administrator can specify the loading strategy of a branch or version in its information pane. The default strategy is to load and unload the resources on demand. For homes that are heavily used, a "forced load" strategy is usually recommended.
"Load and unload on demand" mode
In this default mode, each resource in a home is loaded or built only when it is needed. Moreover, the resources of the home are "softly" referenced by means of the standard Java SoftReference class; this implies that each resource can be unloaded "at the discretion of the garbage collector in response to memory demand".
The advantage of the default mode is therefore the ability to free memory when needed. In return, this implies a load/build cost when an accessed resource has not yet been loaded since server startup, or if it has been unloaded since.
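The general pattern of a softly referenced, lazily built resource can be sketched as follows. This is not EBX.Platform's internal implementation; the class SoftlyCached and its methods are hypothetical, and simulateUnload only mimics what the garbage collector may do under memory pressure.

```java
import java.lang.ref.SoftReference;
import java.util.function.Supplier;

// Hypothetical holder, not an EBX.Platform class.
final class SoftlyCached<T> {

    private final Supplier<T> loader;
    private SoftReference<T> ref; // null until the first load
    private int loadCount;

    SoftlyCached(Supplier<T> loader) {
        this.loader = loader;
    }

    // Returns the resource, loading it again if it was never loaded or has been cleared.
    synchronized T get() {
        T value = (ref == null) ? null : ref.get();
        if (value == null) {
            value = loader.get();
            loadCount++;
            ref = new SoftReference<>(value);
        }
        return value;
    }

    // Clears the reference explicitly, as the garbage collector might under memory demand.
    synchronized void simulateUnload() {
        if (ref != null) {
            ref.clear();
        }
    }

    // Number of times the loader has run; each run is the load/build cost described above.
    synchronized int loadCount() {
        return loadCount;
    }
}
```

Each increment of loadCount corresponds to a reload that a "forced load" strategy would avoid by keeping a strong reference to the resource.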
"Forced load" mode
If the "Forced load" mode is enabled for a home, its resources are loaded asynchronously at server startup. Moreover, each resource of the home is kept in memory until the server is shut down or the home is closed.
This mode is particularly recommended for long-lived homes and/or those that are heavily used, namely any home that serves as a reference.
Monitoring
Indications of EBX.Platform load activity are provided by the monitoring of the underlying database, and also by the 'monitoring' log category.
If the numbers of cleared and built resources remain high for a long time, this is an indication that EBX.Platform is swapping.
Tuning the memory
The maximum size of the memory allocation pool is usually specified by the Java command-line option -Xmx. As is the case for any intensive process, it is important that the size specified by this option does not exceed the available physical RAM, so that the Java process does not swap to disk at operating-system level.
Tuning the garbage collector can also benefit overall performance. This tuning must be adapted to the use cases and is specific to the Java Runtime Environment.
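To check which limit is actually in effect, the standard java.lang.Runtime API can be queried from inside the JVM. The helper class below is a hypothetical sketch, not an EBX.Platform facility; it only reports the maximum heap (the value governed by -Xmx) and the heap currently in use.

```java
// Hypothetical helper, not an EBX.Platform class.
final class HeapReport {

    // Maximum heap the JVM will attempt to use; this is the limit governed by -Xmx.
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // Heap currently occupied: allocated heap minus the free part of it.
    static long usedHeapBytes() {
        Runtime runtime = Runtime.getRuntime();
        return runtime.totalMemory() - runtime.freeMemory();
    }

    // Human-readable summary in mebibytes.
    static String describe() {
        long mib = 1024L * 1024L;
        return String.format("heap used %d MiB of max %d MiB",
                usedHeapBytes() / mib, maxHeapBytes() / mib);
    }
}
```

Comparing the reported maximum with the physical RAM of the host is a quick way to verify that the -Xmx setting leaves room for the operating system and other processes.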
Validation
The internal incremental validation framework will optimize the work needed when some updates occur. The incremental validation process runs as follows:
- A first call to a validation report performs the full validation of an adaptation instance. Note that the loading strategy can also specify that a branch is prevalidated at server startup.
- Then, data updates transparently and asynchronously maintain the validation report, insofar as the updated nodes specify explicit dependencies. Note: standard and static facets, foreign key constraints, dynamic facets and selection nodes specify explicit dependencies.
- If a mass update is executed or if there are too many validation messages, the incremental validation process is stopped. The next call to the validation report will hence trigger a full validation.
- Also, if a transaction is canceled, the validation state of the updated adaptation instances is reset. The next call to the validation report will trigger a full validation as well.
However, there is an irreducible part that is systematically revalidated (even if no updates have occurred since the last validation): the nodes with unknown dependencies. A node has unknown dependencies if:
- it possesses a programmatic constraint in the default unknown dependencies mode;
- it declares a computed value;
- it declares a dynamic facet that depends on a node that is itself a computed value.
Consequently, on large tables (beyond the order of 10^5 records), it is recommended to avoid nodes with unknown dependencies (or at least to minimize the number of such nodes). For constraints, the developer is able to specify two alternative modes that drastically reduce incremental validation cost: local dependency mode and explicit dependencies. For more information, see the Dependencies and Validation section.
Note: a user who has been granted the Administrator role can manually reset the validation report of an adaptation. This option is available from the validation report section in EBX.Manager.
Massive Updates
Massive updates can involve several hundreds of thousands of insertions, modifications or deletions. Those updates are normally infrequent (usually initial data imports) or are performed in a non-interactive way (usually nightly batches), hence performance is less critical than for frequent and interactive operations. However, like classic batch processing, they raise some specific issues.
Transaction boundaries
It is generally not advised to use a single transaction when the number of atomic updates in the transaction is beyond the order of 10^4.
The main reason is that large transactions require a lot of resources (in particular memory) on EBX.Platform and on the underlying database.
To reduce the size of transactions, it is possible to:
- specify the property ebx.manager.import.commit.threshold; however, this property is used only for archive imports done interactively, in the context of EBX.Manager;
- explicitly specify a commit threshold inside the batch procedure;
- structurally limit the transaction scope by implementing Procedure on a part of the task and executing it as many times as needed.
On the other hand, specifying a very small transaction size is also counterproductive, because of the specific persistence tasks performed for each commit.
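The idea of limiting the transaction scope can be sketched generically as follows. The class ChunkedUpdate is hypothetical and uses no EBX.Platform API: the transaction callback stands for whatever executes one bounded transaction (for example one execution of a procedure), and commitThreshold is an assumed parameter name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not an EBX.Platform class.
final class ChunkedUpdate {

    // Splits the updates into chunks of at most commitThreshold items and runs
    // one transaction per chunk. Returns the number of transactions executed.
    static <T> int runInChunks(List<T> updates, int commitThreshold, Consumer<List<T>> transaction) {
        if (commitThreshold <= 0) {
            throw new IllegalArgumentException("commitThreshold must be positive");
        }
        int transactions = 0;
        for (int start = 0; start < updates.size(); start += commitThreshold) {
            int end = Math.min(start + commitThreshold, updates.size());
            // Copy the chunk so the callback does not depend on the source list.
            transaction.accept(new ArrayList<>(updates.subList(start, end)));
            transactions++;
        }
        return transactions;
    }
}
```

With 25 updates and a threshold of 10, the callback runs three times, on chunks of 10, 10 and 5 items. The threshold should stay large enough that the fixed cost of each commit remains negligible.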
Note: If intermediate commits are a problem because transactional atomicity is no longer guaranteed, it is recommended to execute the massive update inside a dedicated branch, created just before the massive update. If the update does not complete successfully, the branch simply has to be closed; if it succeeds, the branch can be merged safely.
Triggers
If needed, triggers can be deactivated by means of the method setTriggerActivation.
Access to tables
Functionalities
Tables are commonly accessed through EBX.Manager and also through Request API and data services. This access involves a unique set of functions, including a dynamic resolution process, that we detail for a better understanding of their performance implications.
- Inheritance: Inheritance in the adaptation tree requires taking into account records and values that are defined by the parent instance, through a recursive process. Also, in a root instance, a record can inherit some of its values from schema default values (xs:default attribute).
- Value computation: A node declared as osd:function is always computed on the fly, when the value is accessed.
- Filtering: An XPath predicate, a programmatic filter, or a record-level permission rule implies a selection on records.
- Sort: Last but not least, the result can be sorted.
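The steps above can be pictured with a deliberately simplified sketch. It is not how EBX.Platform resolves tables internally: records are reduced to a primary key mapped to a single integer value, inheritance is limited to one parent level, value computation is omitted, and the class ResolvedView is hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical illustration, not an EBX.Platform class.
final class ResolvedView {

    static List<Map.Entry<String, Integer>> resolve(
            Map<String, Integer> parentRecords,
            Map<String, Integer> childRecords,
            Predicate<Map.Entry<String, Integer>> filter,
            Comparator<Map.Entry<String, Integer>> order) {
        // Inheritance: start from the parent's records, then let the child override by primary key.
        // A TreeMap keeps the merged records in ascending primary key order.
        Map<String, Integer> merged = new TreeMap<>(parentRecords);
        merged.putAll(childRecords);

        // Filtering: keep only the records accepted by the predicate.
        List<Map.Entry<String, Integer>> result = new ArrayList<>();
        for (Map.Entry<String, Integer> record : merged.entrySet()) {
            if (filter.test(record)) {
                result.add(new AbstractMap.SimpleImmutableEntry<>(record));
            }
        }

        // Sort: optional; without it the view stays in primary key order.
        if (order != null) {
            result.sort(order);
        }
        return result;
    }
}
```

Even in this reduced form, every access pays for the merge and the filter, and a specific sort adds a pass over the whole filtered result.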
Architecture and design
In order to improve the speed of operations in tables, indexes are managed by the EBX.Platform engine.
EBX.Platform advanced features such as advanced life-cycle (versions and branches), instances inheritance and flexible XML Schema modeling, have led to a particular design on indexing mechanisms. This design can be summarized as follows:
- Level 1 indexes maintain a data structure on raw table blocks. Raw table blocks are basic persistence units that can be shared by multiple versions and branches.
- Level 2 indexes maintain a data structure on a full table as it is defined in a branch or in a version; however, they do not take instance inheritance into account. Level 2 aggregates level 1 indexes.
- Final access to tables, whose functionalities are described above, performs a dynamic resolution based on level 2 indexes.
Performance considerations
The impacts on performance are the following, if indexes are already built:
- Access to a table without a specific filter or sort is almost immediate.
- If the user has applied a specific filter or if the table access depends on a programmatic rule, access to the table should be quick. More precisely, it depends on the cost of the specific filtering algorithm that is executed when fetching at least 2,000 occurrences.
- Both cases above guarantee an access time that is independent of the size of the table, and provide a view sorted in ascending primary key order. If the table is specifically sorted, then the first access time depends on the table size according to an N log(N) function (where N is the number of records in the resolved view).
If indexes are not yet built, additional time will be needed:
- If level 1 is ready but the level 2 index is not yet loaded or has been unloaded, the build time is BL2 = O(N log(N)).
- If level 1 indexes are already available in persistent cache but are not yet loaded, the time to load them is LL1 = O(N).
- If level 1 indexes have not yet been written in persistent cache, the time to build them is BL1 = O(N). This is the worst case because it requires loading all blocks.
Other operations on tables
The creation of new occurrences (record inserts) depends on the level 1 index. Hence, a creation becomes almost immediate if the level 1 index is loaded.
Conclusion about tables
Faster access to tables is ensured if indexes are ready and kept in the memory cache. As mentioned above, it is important that the Java Virtual Machine has enough memory allocated, so that it does not release indexes too quickly.