[jira] [Created] (IGNITE-12070) Document the new ability to track system/user time of transactions

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

[jira] [Created] (IGNITE-12070) Document the new ability to track system/user time of transactions

Anton Vinogradov (Jira)
Denis Chudov created IGNITE-12070:
-------------------------------------

             Summary: Document the new ability to track system/user time of transactions
                 Key: IGNITE-12070
                 URL: https://issues.apache.org/jira/browse/IGNITE-12070
             Project: Ignite
          Issue Type: Task
          Components: documentation
            Reporter: Denis Chudov


Now there is ability to track system/user time of transactions. System time is the time that is spent for system activities - i.e. time while aquiring locks, preparing, commiting, etc.User time is the time that is spent for user activities when client node runs some code while holding transaction.

We have ability to log info about transactions that exceed some threshold execution timeout, or some percentage of all transactions. Log record in case of long-running transactions looks like following:
{code:java}
[2019-08-09 13:39:49,130][WARN ][sys-stripe-1-#101%client%][root] Long transaction time dump [startTime=13:39:47.970, totalTime=1160, systemTime=157, userTime=1003, cacheOperationsTime=141, prepareTime=15, commitTime=0, tx=GridNearTxLocal [...]]
{code}
In case of sampling of all transactions:


{code:java}
[2019-08-09 13:39:54,079][INFO ][sys-stripe-2-#102%client%][root] Transaction time dump [startTime=13:39:54.063, totalTime=15, systemTime=6, userTime=9, cacheOperationsTime=2, prepareTime=3, commitTime=0, tx=GridNearTxLocal [...]]
{code}
Also some of transactions can be skipped to not overflow the log, information about this log throttling looks like this:
{code:java}
[2019-08-09 13:39:55,109][INFO ][sys-stripe-0-#100%client%][root] Transaction time dumps skipped because of log throttling: 2
{code}
There are JMX parameters and JVM options to control this behavior:
1)
JVM option: IGNITE_LONG_TRANSACTION_TIME_DUMP_THRESHOLD
JMX parameter: TransactionsMXBean.longTransactionTimeDumpThreshold
Threshold timeout in milliseconds for long transactions, if transaction exceeds it, it will be dumped in log with information about how much time did it spent in system time and user time. Default value is 0. No info about system/user time of long transactions is dumped in log if this parameter is not set.
2)
JVM option: IGNITE_TRANSACTION_TIME_DUMP_SAMPLES_COEFFICIENT
JMX parameter: TransactionsMXBean.transactionTimeDumpSamplesCoefficient
The coefficient for samples of completed transactions that will be dumped in log. Must be float value between 0.0 and 1.0 inclusive. Default value is 0.0.
3)
JVM option: IGNITE_TRANSACTION_TIME_DUMP_SAMPLES_PER_SECOND_LIMIT
JMX parameter: TransactionsMXBean.transactionTimeDumpSamplesPerSecondLimit
The limit of samples of completed transactions that will be dumped in log per second, if IGNITE_TRANSACTION_TIME_DUMP_SAMPLES_COEFFICIENT is above 0.0. Must be integer value greater than 0. Default value is 5.

For the existing long running transaction warning was added information about current system and user time of transaction:
{code:java}
[2019-08-09 14:10:31,835][WARN ][grid-timeout-worker-#122%client%][root] First 10 long running transactions [total=1]
[2019-08-09 14:10:31,835][WARN ][grid-timeout-worker-#122%client%][root] >>> Transaction [startTime=14:10:31.170, curTime=14:10:31.750, systemTime=32, userTime=548, tx=GridNearTxLocal [...]]
{code}

Also added following metrics to monitor system and user time for single node:
diagnostic.transactions.totalNodeSystemTime - Total transactions system time on node.
diagnostic.transactions.totalNodeUserTime - Total transactions user time on node.
diagnostic.transactions.nodeSystemTimeHistogram - Transactions system times on node represented as histogram.
diagnostic.transactions.nodeUserTimeHistogram - Transactions user times on node represented as histogram.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)