Included in the Proceedings of the 16th Symposium and Summer School On Service-Oriented Computing.
PrePrint @ ResearchGate
SourceCode @ GitHub: https://github.com/fullben/java-persistence-benchmark
This documentation includes two parts. The first is an extension of additional plots and data which is not available in the paper due to space constraints. The second is a summary of our performance investigation for a single transaction and the implication of different service implementations and database queries on the execution time on the server.
All CSV files for data evaluation are included in the latest release of our WSS application:
To setup the database connection, copy .env-example to
.env and modify the environment variables. We are hosting
the data from our experiment on a machine called h13 in our own private
VM cluster. The connection infos follow:
For example (to use our pre-configured database that stores the raw data) you can use the following configuration:
MINION_MONGODB_HOST=h13.pi.uni-bamberg.de
MINION_MONGODB_PORT=27016
MINION_MONGODB_USER=guest
MINION_MONGODB_PASSWORD=guest
MINION_MONGODB_SSL=TRUEThen, run the following chunk the set the environmental variables:
load_dot_env(file = ".env")Afterwards, we can load all database-related functions into the project:
source("src/db.R")Connect to the database (database netdata, collection
netdata_metrics). Please make sure you are adjusting these
values according to your database setup.
# database
database = "netdata"
# collection
collection = "netdata"
# connect
netdata_coll <- db.connect(database, collection)Make sure that you disconnect after finishing your work:
db.disconnect(netdata_coll)During the execution of this R code, we will create temporary files and use the JMETER files generated during benchmarks. By default, this files will be stored in BASE_FOLDER.
BASE_FOLDER = "./raw/"Finally, we can load all necessary functions for the later data processing and plotting.
source("src/processing.R")
source("src/ggplot2.R")
In contrast to the Bookstore application, we also include some plots for all WSS implementations: JPA, MS-JACIS and MS-Sync




The two figures have the same content, once the Request-Response time in the first Figure and the processing time on the server. Interesting here is the overhead of middleware routing, transmission etc. which sums up to ~80ms on average for a single request for all transactions and implementations.
As already stated in the paper, there was an anomaly when executing the delivery transaction for our JPA solution, depicted on the left in green.
 After the
first version of the paper, we made different investigations.
After the
first version of the paper, we made different investigations.
public DeliveryResponse process(DeliveryRequest req) {
  // Find districts and carrier
  List<DistrictEntity> districts = districtRepository.findByWarehouseId(req.getWarehouseId());
  CarrierEntity carrier = carrierRepository.getById(req.getCarrierId());
  for (DistrictEntity district : districts) {
  OrderEntity order = orderRepository.findOldestUnfulfilledOrderOfDistrict(district.getId()).orElse(null);
   // …
   // Find all order items, set delivery date to now and sum amount
   for (OrderItemEntity orderItem : order.getItems()) {
    // …
   }
   // Save order and items
   order = orderRepository.save(order);
   // Update customer balance and delivery count
   CustomerEntity customer = order.getCustomer();
   // …
   customerRepository.save(customer);
  }
  return new DeliveryResponse(req);
 }All the *Respositories in the source code snippet are generated
SpringBoot Data JpaRepositories. What is noteworthy here
are the many database queries made by our method, especially within the
for loop. We call it the object oriented way of
programming here - we are still in Java and that is fine :)
Another important aspect is our custom query
findOldestUnfulfilledOrderOfDistrict. Since we need some
special SQL features we implemented a native query. The SQL
statement is as follows:
SELECT * 
FROM orders 
WHERE fulfilled = false AND district_id = :districtId 
ORDER BY entrydate ASC 
LIMIT 1It took a while for the first assumption that the ordering of the result set (which is increasing linearly over time - as the execution time of the query) and the number of database queries might be the cause for our performance issues.
Based on these assumptions, we changed the DeliveryService and the native SQL Query.
public DeliveryResponse process(DeliveryRequest req) {
  // Find districts and carrier
  List<DistrictEntity> districts = districtRepository.findByWarehouseId(req.getWarehouseId());
  CarrierEntity carrier = carrierRepository.getById(req.getCarrierId());
 
  List<String> districtIds = …;
  List<OrderEntity> orders = orderRepository.findOldestUnfulfilledOrderOfDistricts(districtIds);
  Map<String, CustomerEntity> customers = new HashMap<>();
 
  for (OrderEntity order : orders) {
 
   // … 
   // Update customer balance and delivery count
   CustomerEntity customer = order.getCustomer();
   //…
   customers.putIfAbsent(customer.getId(), customer);
  }
 
  // save - batch processing
  orderRepository.saveAll(orders);
  customerRepository.saveAll(customers.values());
  return new DeliveryResponse(req);
}We call this the batch oriented way . The idea here is to reduce the IO time when requesting data from the database.
We additionally adapted the SQL query
findOldestUnfulfilledOrderOfDistricts:
 SELECT * 
 FROM orders 
 WHERE(district_id, entrydate) IN 
  (
   SELECT district_id, MIN(entrydate) as entrydate 
   FROM orders 
   WHERE district_id IN (:districtIds) AND fulfilled = false 
   GROUP BY district_id
  )Only one query checking for the oldest order of each district by using subqueries and aggregate functions from the SQL standard.
The result was interesting:

We cannot definitely state that the ORDER BY SQL
statement caused this linear increase issue, but it is highly likely.
Databases are good in filtering (can be parallelized well) but bad in
sorting. Additionally in this case we reduced the number of
queries (we executed our benchmark with 10 districts)
from 32 to 5 :)
In a last step, we wanted to check which effect additional queries have on the application performance. So we combined the object-oriented and the batch-oriented way to some extent.
Situation for DeliveryService III
1 Query for getting all districts by a corresponding warehouse
1 Query for getting the carrier to deliver the order
    N Queries to get the oldest order for each district
1 Query to update all orders
1 Query to update all involved customers of the oldest ordersAnd the corresponding SQL query:
 SELECT * 
 FROM orders 
 WHERE(district_id, entrydate) IN 
  (
   SELECT district_id, MIN(entrydate) as entrydate 
   FROM orders 
   WHERE district_id = :districtId AND fulfilled = false 
   GROUP BY district_id
  )And the result:

What we see is that the additional 9 queries in our case lead to an average increase of 13.5 ms or to say it relatively 22%. So reducing the number of queries despite their higher complexity is worth the effort! :)