Oracle and async I/O... A world of difference

What a difference enabling async I/O in Oracle makes…

While running a load test against an API product that I have to deal with in my other day-to-day job, I noticed something in both the results and at the OS level (… on the DB server) that didn’t make much sense.

Odd results

The results of the performance test were somewhat OK but kinda unstable (… some strange variances in the response times). This graph tells the story better than 1000s of words.

Note the green line (transactions per second) going all over the place:

At first I suspected some sort of issue with the application server (WebLogic) and the DB connection pool. But all looked good there…

Then I cast an eye on Oracle Enterprise Manager and noticed that most of the DB waits were related to I/O, although the storage of this particular test DB is located on a reasonably fast NVMe SSD.

So I started looking at I/O stats on the Oracle Linux server hosting this DB. Being a lab DB, it’s more or less a standard Oracle install with not much performance tuning applied. Nor am I an Oracle expert who knows all the secrets of the trade…

Anyways, there was one thing that somehow didn’t stack up: at the OS level, the percentage of CPU time spent in iowait was sporadically incredibly high (… 70%+ or so), with the CPU idle time plunging to less than 10%.
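
For reference, here is a minimal Python sketch of how an iowait percentage like the one above can be derived on Linux from two samples of the first line of /proc/stat. The function names are made up for illustration, and this is not necessarily how iostat or any monitoring tool computes the figure internally.

```python
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def read_cpu_counters(path="/proc/stat"):
    """Read the aggregate CPU tick counters from the first line of /proc/stat."""
    with open(path) as fh:
        # First line looks like: "cpu  user nice system idle iowait irq softirq steal ..."
        parts = fh.readline().split()
    return [int(value) for value in parts[1:1 + len(FIELDS)]]


def cpu_percentages(before, after):
    """Return the share of CPU time spent in each state between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total <= 0:
        raise ValueError("samples must be taken at different points in time")
    return {name: 100.0 * delta / total for name, delta in zip(FIELDS, deltas)}
```

Taking two samples a few seconds apart with read_cpu_counters() and passing them to cpu_percentages() should give iowait and idle figures in the same ballpark as those reported by iostat or top.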

After reading various online articles about this, most of which suggested beefier HW or rewriting the app so that it would be more efficient with commits, it dawned on me that perhaps Oracle wasn’t using async I/O when writing to disk, causing these high iowait stats.

I finally bumped into a few articles talking about async I/O settings in Oracle and found a few useful SQL queries…

This one will assist in figuring out whether async I/O is enabled on your Oracle DB files:

COL NAME FORMAT A50
SELECT NAME,ASYNCH_IO FROM V$DATAFILE F,V$IOSTAT_FILE I
WHERE  F.FILE#=I.FILE_NO
AND    FILETYPE_NAME='Data File';

… leading to a result like this. Note that for all files async IO is disabled…

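So I decided to enable async I/O with these few SQL commands. Note that SETALL enables both asynchronous and direct I/O (ASYNCH would enable asynchronous I/O only), and that the parameter is static, hence SCOPE=SPFILE and the restart: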

ALTER SYSTEM SET FILESYSTEMIO_OPTIONS=SETALL SCOPE=SPFILE;
SHUTDOWN IMMEDIATE;
STARTUP;

… and then checking again. As you can see, async I/O is enabled now:

NAME                                               ASYNCH_IO
-------------------------------------------------- ---------
/opt/oracle/oradata/AAOP74/datafile/o1_mf_system_j ASYNC_ON
zml41fy_.dbf

/opt/oracle/oradata/AAOP74/itblspc01.dbf           ASYNC_ON
/opt/oracle/oradata/AAOP74/datafile/o1_mf_sysaux_j ASYNC_ON
zml5rwh_.dbf

/opt/oracle/oradata/AAOP74/datafile/o1_mf_undotbs1 ASYNC_ON
_jzml6l1k_.dbf

/opt/oracle/oradata/AAOP74/dtblspc01.dbf           ASYNC_ON

NAME                                               ASYNCH_IO
-------------------------------------------------- ---------
/opt/oracle/oradata/AAOP74/datafile/o1_mf_users_jz ASYNC_ON
ml6m5f_.dbf

/opt/oracle/oradata/AAOP74/btblspc.dbf             ASYNC_ON
/opt/oracle/oradata/AAOP74/cm.dbf                  ASYNC_ON
/opt/oracle/oradata/AAOP74/cm.idx                  ASYNC_ON
/opt/oracle/oradata/AAOP74/bodtblspc.dbf           ASYNC_ON
/opt/oracle/oradata/AAOP74/boitblspc.dbf           ASYNC_ON
/opt/oracle/oradata/AAOP74/DTBLSPC03.dbf           ASYNC_ON
/opt/oracle/oradata/AAOP74/ITBLSPC03.idx           ASYNC_ON
/opt/oracle/product/19c/dbhome_1/dbs/reportdt.dat  ASYNC_ON

14 rows selected.

Smooth sailing….

Time to re-run the load test with my preferred tool and the results look encouraging.

As you can see, the green results line is much more stable. Not only that, but the number of transactions per second (TPS) increased to approx. 136 from 101 in the previous run. Response times also went down somewhat, from 90 to 70ish msecs.
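
To put those numbers in relative terms, here is a quick back-of-the-envelope calculation in Python. It only uses the approximate figures quoted above (101 and 136 TPS, 90 and 70 ms), so treat the result as indicative.

```python
def percent_change(before, after):
    """Relative change from 'before' to 'after', in percent."""
    return 100.0 * (after - before) / before


tps_gain = percent_change(101, 136)        # throughput, transactions per second
latency_change = percent_change(90, 70)    # response time, milliseconds

print(f"TPS: {tps_gain:+.1f}%  response time: {latency_change:+.1f}%")
# prints "TPS: +34.7%  response time: -22.2%"
```

In other words, roughly a third more throughput and a bit more than a fifth off the response time, from a single parameter change.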

The CPU iowait stats also dramatically improved on the Oracle server:

To summarize, it makes sense to scratch beyond the surface of performance bottlenecks before investing in HW upgrades or so… Sometimes the solution is a low-hanging fruit waiting to be picked.

External references:

ORACLE-BASE - Direct and Asynchronous I/O

I/O Configuration and Design

Apache HTTPD on FreeBSD and Linux Load Test

Comparison of infrastructure resource usage between Linux and FreeBSD HTTPD instances

For various reasons, I’ve had to perform a series of tests to ensure our Measuring Agent can generate traffic from a large number of source IP addresses. Aside from validating that capability, the by-product of the test is a somewhat interesting comparison of a FreeBSD and a Linux based Apache HTTPD server.

Generating Load From Multiple IPs

First, a quick overview of what I wanted to prove: I needed to make sure that we can run a Load Test simulating a large number of source IP addresses. To validate this requirement, I’ve configured one of our Measuring Agents with approx. 12k IP addresses. I’ve used a bash script to do that, as otherwise it would take forever. All IPs are assigned as aliases to the NIC from where the load will be generated, and all IPs are within the same /16 subnet.
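
The bash script itself isn’t reproduced here. As a rough illustration of the idea, the following Python sketch generates the kind of `ip addr add` commands such a script might run. The 10.10.0.0/16 network, the eth0 device name and the number of skipped addresses are all made-up values, not the ones used in the test.

```python
import ipaddress
from itertools import islice


def alias_commands(network, count, device, skip=10):
    """Build 'ip addr add' commands for 'count' consecutive host addresses.

    The first 'skip' host addresses are left out (hypothetically reserved
    for gateways and other machines on the same subnet).
    """
    net = ipaddress.ip_network(network)
    hosts = islice(net.hosts(), skip, skip + count)
    return [f"ip addr add {ip}/{net.prefixlen} dev {device}" for ip in hosts]


# Hypothetical values: 12000 aliases on eth0 within 10.10.0.0/16
commands = alias_commands("10.10.0.0/16", 12000, "eth0")
print(commands[0])    # ip addr add 10.10.0.11/16 dev eth0
print(len(commands))  # 12000
```

The generated lines could be written to a file and executed as root. A /16 offers about 65k host addresses, so 12k aliases fit comfortably.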

Finally, I’ve configured my Real Load test script with two additional steps:

  1. Step 0 that selects a random IP address configured on the NIC and stores it in a variable.
  2. Step 2 that instructs the load test to use as src IP the address stored in the variable.
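
The two steps above are implemented inside the Real Load test script, which isn’t shown here. Purely to illustrate the underlying mechanism, this is a minimal Python sketch of picking a random local address and using it as the source IP of an outgoing TCP connection. The function names are hypothetical and have nothing to do with the Real Load API.

```python
import random
import socket


def pick_source_ip(addresses, rng=random):
    """Step one: choose one of the locally configured addresses at random."""
    return rng.choice(addresses)


def open_from(source_ip, host, port, timeout=5.0):
    """Step two: connect to host:port using source_ip as the local address.

    Source port 0 lets the OS pick a free ephemeral port.
    """
    return socket.create_connection(
        (host, port), timeout=timeout, source_address=(source_ip, 0)
    )
```

Binding only works if the chosen address is actually configured on a local interface, which is why the aliases have to be in place before the test starts.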

Infrastructure Details

The hypervisor is a Windows Server 2019 Standard edition machine, running Hyper-V and fitted with a somewhat old Xeon E5-2683v3 CPU. The measuring agent and the tested servers are connected to the same virtual switch.

The Linux and FreeBSD VMs are minimal installs of their distributions, onto which I’ve installed the latest Apache HTTPD build offered by the built-in software distribution mechanisms. That’s why the HTTPD versions are not identical.

In order for the results to be somewhat comparable, I’ve deployed the same set of static HTML pages on both servers. I’ve also aligned several key HTTPD config parameters on both systems, as shown in this table.

Parameter          Measuring Agent  FreeBSD HTTPD VM  Linux HTTPD VM
-----------------  ---------------  ----------------  ----------------
OS Version         RH 8.4           13.0              Oracle Linux 8.4
RAM                4 GB             4 GB              4 GB
vCPUs              10               4                 4
HTTPD Version      n/a              2.4.53            2.4.37
HTTPD MPM          n/a              event             event
ServerLimit        n/a              8192              8192
MaxRequestWorkers  n/a              2048              2048
ThreadsPerChild    n/a              25                25
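
A side note on how these three parameters relate. As far as I understand the event MPM, MaxRequestWorkers caps the total number of worker threads, each child process runs ThreadsPerChild threads, and a MaxRequestWorkers value that isn’t a multiple of ThreadsPerChild gets rounded down at startup. The sketch below works through that arithmetic for the values in the table. It is an approximation of the documented behaviour, not something verified against the two servers.

```python
def event_mpm_capacity(max_request_workers, threads_per_child, server_limit):
    """Approximate the number of child processes and worker threads in use.

    Assumes MaxRequestWorkers is rounded down to a multiple of
    ThreadsPerChild and that ServerLimit caps the number of processes.
    """
    processes = min(max_request_workers // threads_per_child, server_limit)
    return processes, processes * threads_per_child


processes, threads = event_mpm_capacity(2048, 25, 8192)
print(processes, threads)  # 81 2025
```

If that reading is right, both servers top out at around 81 child processes and 2025 worker threads, so the ServerLimit of 8192 is nowhere near being the limiting factor.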

See further down for other tuning parameters applied to the HTTPD VMs.

Load Test Execution and Result Metrics

I then executed a 20-minute, 1000 VUs load test, which is configured to maximize the number of HTTP requests generated. Apache is configured to serve some static HTML pages, made up of text and some images.

This table summarizes metrics observed once the max. load was reached, approx. 10 minutes into the test. The PDF reports give you a closer look at the test results.

Metric                  Linux HTTPD       FreeBSD HTTPD
----------------------  ----------------  ------------------
User CPU usage          21%               20%
System CPU usage        47%               70%
Avg reqs/s              8.8k              10.3k
Avg network throughput  1.1 Gbps          1.3 Gbps
Hyper-V CPU usage       10%               11%
Test report PDF         Linux Report PDF  FreeBSD Report PDF
Test progress screenshot
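
As a sanity check on the table, dividing the network throughput by the request rate gives the average number of bytes moved per request. If both servers really deliver the same pages, the two values should be close. The sketch assumes 1 Gbps = 10^9 bits/s and uses the rounded figures from the table, so expect some imprecision.

```python
def bytes_per_request(throughput_gbps, requests_per_second):
    """Average bytes on the wire per HTTP request (1 Gbps = 1e9 bits/s)."""
    return throughput_gbps * 1e9 / 8 / requests_per_second


linux = bytes_per_request(1.1, 8800)
freebsd = bytes_per_request(1.3, 10300)

print(f"Linux:   {linux / 1000:.1f} kB per request")
print(f"FreeBSD: {freebsd / 1000:.1f} kB per request")
```

Both come out at roughly 15.6 to 15.8 kB per request, which suggests the higher FreeBSD throughput is simply the result of serving more requests rather than of a different payload mix.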

Notes

  • CPU usage was measured with the “iostat 20” command.
  • Hyper-V CPU usage was taken from Windows Admin Center.

And the winner is…

… is difficult to pick, to be honest.

  • CPU usage, as measured by Hyper-V, was a little bit higher for FreeBSD. CPU metrics measured within the VMs seem to indicate an overall higher CPU usage by FreeBSD (… in particular System CPU). Perhaps the Linux NIC driver is better optimized for Hyper-V.
  • FreeBSD HTTPD seems to deliver a higher throughput (network and avg requests/s).
  • FreeBSD HTTPD also seems to offer a higher HTTP Keep-Alive efficiency, which might partially explain the higher throughput.
  • Observations (like CPU usage, etc…) were averaged by “eyeballing” metrics displayed on screen. Expect some rounding error…

Assuming I had time to spend to better tune and align the two platforms, I might have been able to squeeze out a bit more performance from each server, but I doubt that would have materially changed the result in favor of one OS or the other. Obviously I’m happy to be proven wrong…

Feel free to email us with your feedback. I’ll be more than happy to test any further tuning suggestions.

OS Tuning

Below is the OS-level tuning that was applied to the Linux and FreeBSD servers. I didn’t have time to fully research each of the parameters listed; they were mentioned in various other online sources and adopted from there. I’ve implemented the ones that seemed to make most sense…

Linux HTTPD (/etc/sysctl.conf)

The last 2 tunables were required to prevent the Linux server from no longer accepting connections for various reasons…

fs.file-max = 524288
net.ipv4.tcp_max_syn_backlog = 2048
net.ipv4.tcp_synack_retries = 3
net.ipv4.tcp_max_orphans = 65536
net.ipv4.tcp_fin_timeout = 30
net.ipv4.ip_local_port_range = 16384 60999
net.core.somaxconn = 256
net.core.rmem_max = 1048576
net.core.wmem_max = 1048576
net.core.message_cost=0
net.ipv4.neigh.default.gc_thresh3=64000
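
Since settings in /etc/sysctl.conf only take effect once they are loaded (e.g. with `sysctl -p` or at boot), it can be worth checking that the running kernel actually uses them. Here is a minimal Python sketch, assuming the usual Linux mapping of a key like net.core.somaxconn to the file /proc/sys/net/core/somaxconn. The function names are made up for illustration.

```python
from pathlib import Path


def parse_sysctl_conf(text):
    """Parse 'key = value' lines, ignoring blank lines and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, _, value = line.partition("=")
        # Normalize whitespace so "16384 60999" matches the tab-separated /proc form
        settings[key.strip()] = " ".join(value.split())
    return settings


def live_value(key, root="/proc/sys"):
    """Read the running value, e.g. net.core.somaxconn -> net/core/somaxconn."""
    return " ".join(Path(root, *key.split(".")).read_text().split())


def mismatches(settings, root="/proc/sys"):
    """Return {key: (configured, live)} for every setting that differs."""
    result = {}
    for key, wanted in settings.items():
        actual = live_value(key, root)
        if actual != wanted:
            result[key] = (wanted, actual)
    return result
```

Calling mismatches(parse_sysctl_conf(Path("/etc/sysctl.conf").read_text())) on the Linux VM should return an empty dict once all tunables are active.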

FreeBSD HTTPD (/etc/sysctl.conf)

kern.threads.max_threads_per_proc=4096
kern.ipc.somaxconn=4096
kern.ipc.maxsockets=204800
kern.ipc.nmbclusters=262144
kern.maxfiles=204800
kern.maxfilesperproc=200000
kern.maxvnodes=200000
net.inet.tcp.delayed_ack=0
net.inet.tcp.msl=5000
net.inet.tcp.maxtcptw=200000
net.inet.ip.intr_queue_maxlen=4096
net.inet.ip.dummynet.io_fast=1

Real Load Portal Generally Available!

The Real Load Portal portal.realload.com is open for public registration.

Feel free to register for an account and trial our product by following the registration instructions here. You’ll be given a two-week, 100 VUs demo license and no credit card is required to sign up!

If you’d like a one-on-one session to guide you through the first steps of how to use our product, please do not hesitate to contact us at sales@realload.com.

Happy testing!

Desktop Companion 0.24 Released

Quick update walkthrough video

The Desktop Companion is a Desktop GUI that allows you to manage several Real Load aspects directly from your desktop.

We’ve released the latest update and this 5-minute video illustrates the key changes.

Happy watching!

My system just got faster!

My system just got much faster… I wonder why.

Today I started executing a lengthy performance test against a SOAP API to seed the underlying DB. For various reasons, I need to replicate the daily DB volume increase of a production system in my own lab DB.

I’ve prepared a Real Load test script and started hammering a server in my lab environment. I’ve noticed the performance wasn’t particularly good but that didn’t matter, as I wasn’t actually executing a performance test.

I let the test run and went for lunch (… a sandwich). When I came back, I noticed that my system had somehow become much faster, rising from approx. 50 TPS to approx. 200 TPS. Each transaction represents a SOAP request…

See this graph from the real time monitoring window:

Knowing how this particular product works and knowing that typically the performance is limited by the performance of the underlying DB, I’ve started looking at various DB counters and one thing I’ve noticed is that the Response Time reported by MS SQL Studio on a particular DB file went down considerably (… from 100ms+ to 10-20ms).

That was curious… why would this happen? I then cast an eye on the metrics of my storage system (a self-built TrueNAS…) and noticed that the ZFS L2ARC read cache hit ratio improved noticeably around that time. Notice the orange line, close to a 0% hit ratio around 12PM and then rising to 90%+ after approx. 50 minutes.
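
For context, a hit ratio like the one in that graph is typically derived from two monotonically increasing counters, hits and misses, sampled at two points in time. Here is a minimal Python sketch. The counter names l2_hits and l2_misses follow the ZFS arcstats naming, but the sample values used to exercise it are invented, not taken from my TrueNAS box.

```python
def l2arc_hit_ratio(before, after):
    """L2ARC hit ratio (in percent) over the interval between two samples."""
    hits = after["l2_hits"] - before["l2_hits"]
    misses = after["l2_misses"] - before["l2_misses"]
    total = hits + misses
    # No L2ARC lookups at all in the interval: report 0% rather than dividing by zero
    return 100.0 * hits / total if total else 0.0


# Invented counter samples, for illustration only
cold = {"l2_hits": 1_000, "l2_misses": 9_000}
warm = {"l2_hits": 10_000, "l2_misses": 10_000}
print(f"{l2arc_hit_ratio(cold, warm):.0f}% of L2ARC lookups were hits")
```

A ratio computed per interval, rather than since boot, is what makes a warm-up like the one described here visible in a graph.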

Anyways… this just goes to show that having access to metrics of all infrastructure components during a load test is critical. But sometimes getting to these metrics can be really hard. You just need to persist to get to the bottom of things…

Desktop Companion Enhancements

Features added to make recording of HTTP sessions more user friendly

This is a short update video to illustrate enhancements in the last update of the Desktop Companion.

Enhancements were done to the Proxy Recorder tab:

  • Allow adding page breaks as you navigate from page to page while recording.
  • Added a real-time counter of the requests being recorded.
  • Added a button to force the Desktop Companion window on top of others, so it doesn’t get hidden by browser windows.

All of the above is illustrated in this short video…

Desktop Companion Released

Conveniently manage AWS Measuring Agents from your desktop and more…

The Desktop Companion is a Desktop GUI that allows you to manage several Real Load aspects directly from your desktop.

It was freshly released in the last few days, so please do not hesitate to try it out. We’ve put together a short video that shows how to:

  • Prepare a simple load test using the Recording Proxy on your Desktop.
  • Upload the load test to the Real Load Portal
  • Start an AWS EC2 Measuring Agent (Load Generator) from the Desktop Companion
  • Execute the load test script
  • Terminate the AWS EC2 Measuring Agent

All of this in 8 minutes…

It’s the first video I’ve ever had to publish, so I apologize in advance for the rather basic editing…

Real Load Plugins Introduction

Real Load plugins - Create, share or simply re-use.

A great new feature of the Real Load portal is that it allows you to share or simply consume plugins that have been prepared by others.

Plugins are written in Java. There are 3 types of plugin that are supported by the Real Load application:

  1. Session Element Plug-In - Typically used to generate custom data required by your load test script. For example:
     • Extract data from a DB.
     • Generate random data that follows a specific syntax.
     • Query an external webservice to obtain data to be injected in the load test.
  2. URL Plug-in - Allows you to modify request or response data:
     • Modify the HTTP request (…change the URL, etc…)
     • Modify response data.
  3. Java Source Code Modifier Plug-in - Allows you to automatically modify a test script’s Java source code.

One of the key features of the product is that plugins can optionally be published on the Real Load portal, for other users to consume. You can have a glimpse of available plugins here.

Interested in plugins but don’t know where to start? We’ll soon publish getting-started documentation on our website. In the meantime please reach out to us at support@realload.com

Real Load Demo Portal online

The Real Load Demo Portal demo2.realload.com is now up and running.

The demo portal demo2.realload.com is now available for selected customers who wish to evaluate the product functionality. You need an invitation code from us to sign up at the portal.

Quick Start Guide

  1. Navigate to demo2.realload.com and click on “Sign up”
  2. Enter your invitation code and follow the instructions
  3. Once you are signed in navigate to “Measuring Agents”
  4. Add the following Measuring Agent: agent2.realload.com port 8080
  5. Ping the Measuring Agent at application level
  6. Click on the “Wizards” icon and select “HTTP Test Wizard”
  7. Define your first HTTP/S test, debug the test, save the session, generate the code and run your test

Note: The Measuring Agent agent2.realload.com has the following restrictions:

  • Maximum number of users per test job: 500
  • Maximum test job duration: 5 minutes
