지구정복
[Spark] org.apache.spark.SparkException: Cannot broadcast the table that is larger than 8.0 GiB: 11.3 GiB.
noohhee 2025. 7. 1. 11:11
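Spark 3.4.1
When Spark internally chooses a broadcast join and the table to be broadcast is too large, the error below is raised. The message reports the limit as 8.0 GiB; in this case the table was 11.3 GiB.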
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/my/current/spark3-client/python/pyspark/sql/session.py", line 1440, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery, litArgs), self)
  File "/usr/my/current/spark3-client/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1323, in __call__
  File "/usr/my/current/spark3-client/python/pyspark/errors/exceptions/captured.py", line 169, in deco
    return f(*a, **kw)
  File "/usr/my/current/spark3-client/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o84.sql.
: org.apache.spark.SparkException: Cannot broadcast the table that is larger than 8.0 GiB: 11.3 GiB.
    at org.apache.spark.sql.errors.QueryExecutionErrors$.cannotBroadcastTableOverMaxTableBytesError(QueryExecutionErrors.scala:2366)
    at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:163)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:217)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
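Tuning the join query is the best fix. If that is not practical, apply the setting below. Note that a value of -1 disables automatic broadcast joins, so Spark plans a different join strategy (typically a sort-merge join) instead of trying to broadcast the oversized table. An explicit broadcast hint in the query still forces a broadcast, so remove any such hint on the large table.
Most spark.sql.* settings, including this one, can be changed at runtime on an existing session.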
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)
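Since the setting can be changed at runtime, one option is to turn off automatic broadcast joins only around the failing query and then restore the previous value. The following is a minimal sketch rather than a definitive implementation: the helper name `auto_broadcast_disabled` and the table names in the usage comment are hypothetical, and it assumes `spark` is an already running SparkSession. It relies only on `spark.conf.get` and `spark.conf.set`.

```python
from contextlib import contextmanager

BROADCAST_KEY = "spark.sql.autoBroadcastJoinThreshold"


@contextmanager
def auto_broadcast_disabled(spark):
    """Temporarily disable automatic broadcast joins on an existing session.

    `spark` is assumed to be a pyspark.sql.SparkSession (hypothetical helper,
    not part of PySpark itself).
    """
    previous = spark.conf.get(BROADCAST_KEY)  # remember the current threshold
    spark.conf.set(BROADCAST_KEY, "-1")       # -1 disables automatic broadcast joins
    try:
        yield spark
    finally:
        spark.conf.set(BROADCAST_KEY, previous)  # restore the original value


# Usage sketch (table names are placeholders):
#
# with auto_broadcast_disabled(spark):
#     df = spark.sql("SELECT * FROM big_a a JOIN big_b b ON a.id = b.id")
#     df.explain()   # check that the plan no longer shows BroadcastHashJoin
#     df.count()     # run the action inside the block
```

Spark plans a query lazily, so the action (for example `count()` or a write) should run inside the `with` block. Otherwise the restored threshold may apply again by the time the query is actually planned.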