💰 DCU calculation

DCU (DataFlint Compute Unit) is a unit of measurement for Spark usage, similar in concept to the DBU (Databricks Unit).

The formula for DCU is: (Core/Hour usage * 0.05) + (GiB Memory/Hour usage * 0.005)

Core/Hour: the number of cores allocated to your app, measured in hours. For example, an application that allocates 1 core for one hour consumes 1 core/hour, the same as an application that allocates 6 cores for 10 minutes.

GiB Memory/Hour: the amount of memory, in GiB, allocated to your app, measured in hours. For example, an application that allocates 1 GiB for one hour consumes 1 GiB memory/hour, the same as an application that allocates 6 GiB for 10 minutes.
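As a minimal sketch of the two definitions above, the same conversion works for both cores and GiB of memory: multiply the allocated amount by the allocation time in hours. The function name `resource_hours` is hypothetical and is not part of DataFlint.

```python
def resource_hours(amount: float, minutes: float) -> float:
    """Convert an allocation (cores or GiB) held for `minutes` into core/hours or GiB memory/hours.

    Hypothetical helper for illustration only.
    """
    if amount < 0 or minutes < 0:
        raise ValueError("amount and minutes must be non-negative")
    return amount * minutes / 60.0


# 1 core for one hour and 6 cores for 10 minutes both consume 1 core/hour.
one_core_one_hour = resource_hours(1, 60)
six_cores_ten_minutes = resource_hours(6, 10)
print(one_core_one_hour, six_cores_ten_minutes)
```

The same call with GiB instead of cores gives GiB memory/hours, so 6 GiB held for 10 minutes is also 1 GiB memory/hour.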

Core/hour and GiB memory/hour ratios: because we want to look at an app's resource usage as one metric rather than two (core/hour and GiB memory/hour), we need a formula that converts these two metrics into one. The multipliers we chose are the EMR Serverless prices in US East for x86 servers, rounded to easier-to-understand numbers (0.052624 -> 0.05, 0.0057785 -> 0.005).

So 1 DCU costs roughly $1 when using EMR Serverless. On other platforms it is still a good base unit for comparing apps' performance, but it will not equal exactly $1.
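To see how close "roughly $1" is, the sketch below compares the rounded DCU multipliers with the unrounded EMR Serverless prices quoted above. The constant and function names are hypothetical, the workload figures are only illustrative, and the prices are taken from this page; actual AWS pricing may differ.

```python
# Unrounded EMR Serverless prices quoted on this page (US East, x86), in USD.
EMR_USD_PER_CORE_HOUR = 0.052624
EMR_USD_PER_GIB_HOUR = 0.0057785

# Rounded multipliers used by the DCU formula.
DCU_PER_CORE_HOUR = 0.05
DCU_PER_GIB_HOUR = 0.005


def dcu(core_hours: float, gib_hours: float) -> float:
    """DCU for a given usage, using the rounded multipliers."""
    return core_hours * DCU_PER_CORE_HOUR + gib_hours * DCU_PER_GIB_HOUR


def emr_cost_usd(core_hours: float, gib_hours: float) -> float:
    """Approximate EMR Serverless cost, using the unrounded prices above."""
    return core_hours * EMR_USD_PER_CORE_HOUR + gib_hours * EMR_USD_PER_GIB_HOUR


# Illustrative workload: 100 core/hours and 1000 GiB memory/hours.
usage_dcu = dcu(100, 1000)
usage_usd = emr_cost_usd(100, 1000)
print(f"{usage_dcu:.2f} DCU, about ${usage_usd:.2f}, ${usage_usd / usage_dcu:.2f} per DCU")
```

For this workload the unrounded prices give about $11.04 against 10 DCU, so 1 DCU here corresponds to roughly $1.10. The exact ratio depends on the mix of cores and memory.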

Example calculation: 100 core/hour usage and 1000 GiB memory/hour usage is (100 * 0.05) + (1000 * 0.005) = 10 DCU
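The example calculation can be reproduced with a short sketch. The function names `compute_dcu` and `app_dcu` are hypothetical, and `app_dcu` assumes a fixed allocation held for the app's whole run time, which is a simplification.

```python
CORE_HOUR_WEIGHT = 0.05          # multiplier for core/hour usage
GIB_MEMORY_HOUR_WEIGHT = 0.005   # multiplier for GiB memory/hour usage


def compute_dcu(core_hours: float, gib_memory_hours: float) -> float:
    """Apply the DCU formula to already-aggregated usage."""
    return core_hours * CORE_HOUR_WEIGHT + gib_memory_hours * GIB_MEMORY_HOUR_WEIGHT


def app_dcu(cores: float, memory_gib: float, minutes: float) -> float:
    """DCU for an app holding a fixed allocation for `minutes` (simplifying assumption)."""
    hours = minutes / 60.0
    return compute_dcu(cores * hours, memory_gib * hours)


# The example from the text: 100 core/hours and 1000 GiB memory/hours.
example_dcu = compute_dcu(100, 1000)
print(f"{example_dcu:.2f} DCU")

# A hypothetical app holding 8 cores and 32 GiB for 90 minutes:
# 12 core/hours and 48 GiB memory/hours.
small_app_dcu = app_dcu(cores=8, memory_gib=32, minutes=90)
print(f"{small_app_dcu:.2f} DCU")
```

The first call gives 10 DCU, matching the example; the second gives (12 * 0.05) + (48 * 0.005) = 0.84 DCU.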

Example from DataFlint UI:
