Oracle and async I/O... A world of a difference
While running a load test against an API product that I have to deal with in my other day to day job, I’ve noticed something in both the results and at the OS level (… on the DB server) that didn’t make much sense.
The results of the performance test where somewhat OK but kinda unstable (… some strange variances in the response times). This graph tells the story better than 1000s of words.
Note the green line (transactions per second) going all over the place:
At first I suspected some sort of issue with the application server (Weblogic) and the DB Connection Pool. But all looked good there…
Then I’ve cast an eye on Oracle Enterprise Manager and noticed that most of the DB waits were related to I/O, although the storage of this particular test DB is located on a reasonably fast NVME SSD.
So I started looking at I/O stats on the Oracle Linux server hosting this DB. Being a lab DB, it’s more or less a standard Oracle install with not much performance tuning applied. Nor am I an Oracle expert that knows all secrets of the trade…
Anyways, there was one thing that somehow didn’t stack up: At the OS level, the % spent by the CPU in iowait was sporadically incredibly high (… 70%+ or so) with the CPU idle time plunging to less than 10%:
After reading various online articles about this, most of which suggested beefier HW or rewrite the app the app so that it would be more efficient with commits, it dawned on my that perhaps Oracle wasn’t using async I/O when writing to disk causing these high waitio stats.
I finally bumped into a few articles talking about async I/O settings in Oracle and found a few useful SQL queries…
This one will assist in figuring out whether async I/O is enabled on your Oracle DB files:
COL NAME FORMAT A50 SELECT NAME,ASYNCH_IO FROM V$DATAFILE F,V$IOSTAT_FILE I WHERE F.FILE#=I.FILE_NO AND FILETYPE_NAME='Data File';
… leading to a result like this. Note that for all files async IO is disabled…
So I decided to enable async I/O with these few SQL commands:
ALTER SYSTEM SET FILESYSTEMIO_OPTIONS=SETALL SCOPE=SPFILE; SHUTDOWN IMMEDIATE; STARTUP;
… and then checking again. As you can see, async I/O is enabled now:
NAME ASYNCH_IO -------------------------------------------------- --------- /opt/oracle/oradata/AAOP74/datafile/o1_mf_system_j ASYNC_ON zml41fy_.dbf /opt/oracle/oradata/AAOP74/itblspc01.dbf ASYNC_ON /opt/oracle/oradata/AAOP74/datafile/o1_mf_sysaux_j ASYNC_ON zml5rwh_.dbf /opt/oracle/oradata/AAOP74/datafile/o1_mf_undotbs1 ASYNC_ON _jzml6l1k_.dbf /opt/oracle/oradata/AAOP74/dtblspc01.dbf ASYNC_ON NAME ASYNCH_IO -------------------------------------------------- --------- /opt/oracle/oradata/AAOP74/datafile/o1_mf_users_jz ASYNC_ON ml6m5f_.dbf /opt/oracle/oradata/AAOP74/btblspc.dbf ASYNC_ON /opt/oracle/oradata/AAOP74/cm.dbf ASYNC_ON /opt/oracle/oradata/AAOP74/cm.idx ASYNC_ON /opt/oracle/oradata/AAOP74/bodtblspc.dbf ASYNC_ON /opt/oracle/oradata/AAOP74/boitblspc.dbf ASYNC_ON /opt/oracle/oradata/AAOP74/DTBLSPC03.dbf ASYNC_ON /opt/oracle/oradata/AAOP74/ITBLSPC03.idx ASYNC_ON /opt/oracle/product/19c/dbhome_1/dbs/reportdt.dat ASYNC_ON 14 rows selected.
Time to re-run the load test with my preferred tool and the results look encouraging.
As you can see the green results line is much more stable. Not only that, but number of transactions per second (TPS) increased to approx. 136 from 101 in the previous run. Response times also went down somewhat, from 90 to 70ish msecs.
The CPU waitio stats also dramatically improved on the Oracle server:
To summarize, it makes sense to scratch beyond the surface of performance bottlenecks before investing in HW upgrades or so… Sometimes the solution is a low-hanging fruit waiting to be picked.