I am running an aggregate function at the column level, and the aggregated column value grows beyond the default 2 GB size limit.
The Spark job fails with an IllegalArgumentException: Cannot grow BufferHolder error:

java.lang.IllegalArgumentException: Cannot grow BufferHolder by size 95969 because the size after growing exceeds size limitation 2147483632
As is already known, BufferHolder has a maximum size of 2147483632 bytes (approximately 2 GB). If a column value exceeds this size, Spark throws the exception above.
I have removed all duplicate records, called repartition(), increased the default number of partitions, and increased all the memory parameters, but the job still fails with the above error. After applying the collect_set aggregation, we have a huge volume of data in a single column.
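For context, here is a minimal sketch of the pattern described above (DataFrame and column names are placeholders, not my actual schema):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.collect_set

val spark = SparkSession.builder()
  .appName("CollectSetExample")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical input: one row per (key, value) pair.
val df = Seq(
  ("k1", "v1"), ("k1", "v2"), ("k2", "v1")
).toDF("key", "value")

// collect_set gathers all distinct values for a key into a single
// array cell; for a hot key that array can exceed BufferHolder's
// ~2 GB limit and trigger the IllegalArgumentException above.
val aggregated = df.groupBy("key")
  .agg(collect_set("value").as("values"))
```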
Is there any way to increase the BufferHolder maximum size of 2 GB during processing? If not, can you please suggest a customisation or a user-defined function that works around this limit?
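For reference, one variant I am considering is salting the group-by key so that each key's set is split across several rows, keeping any single array well under 2 GB. This is only a sketch (it assumes the same hypothetical key/value schema as above, and it changes the output shape: downstream consumers must treat the N bucket rows per key as one logical set), and I am not sure it is the right approach:

```scala
import org.apache.spark.sql.functions.{abs, col, collect_set, hash, lit}

// Split each key's values into numBuckets buckets by hashing the
// value, so no single output row's array approaches the 2 GB
// BufferHolder limit.
val numBuckets = 64
val bucketed = df
  .withColumn("bucket", abs(hash(col("value"))) % lit(numBuckets))
  .groupBy(col("key"), col("bucket"))
  .agg(collect_set(col("value")).as("values_part"))
```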