[Iceberg] org.apache.iceberg.exceptions.ValidationException: Manifest is missing

noohhee 2025. 7. 1. 11:09
Environment: Iceberg 1.3.1, Spark 3.4.1, Hive 3.1.3

While running the rewrite_manifests CALL procedure, the following error occurred.

25/07/01 00:09:18 INFO BaseMetastoreTableOperations: Refreshing table metadata from new version: hdfs://nameservice1/user/hive/warehouse/iceberg_test_db.db/my_test2/metadata/647318-e501c610-fe41-4a9a-bed0-7f10deec9c2b.metadata.json
Traceback (most recent call last):
File "/home/airflow_dags/src/pyspark/iceberg/IcebergMaintenance.py", line 93, in <module>
execute_all_commands(db_name, table_name)
File "/home/airflow_dags/src/pyspark/iceberg/IcebergMaintenance.py", line 7, in execute_all_commands
rewrite_manifests(db_name, table_name)
File "/home/airflow_dags/src/pyspark/iceberg/IcebergMaintenance.py", line 38, in rewrite_manifests
spark.sql(rewrite_manifests_sql).show()
File "/usr/my/current/spark3-client/python/lib/pyspark.zip/pyspark/sql/session.py", line 1440, in sql
File "/usr/my/current/spark3-client/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1323, in __call__
File "/usr/my/current/spark3-client/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py", line 169, in deco
File "/usr/my/current/spark3-client/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o83.sql.
: org.apache.iceberg.exceptions.ValidationException: Manifest is missing: hdfs://nameservice1/user/hive/warehouse/iceberg_test_db.db/my_test2/metadata/dd366b69-ce4c-4162-807e-5c7194357d74-m0.avro
at org.apache.iceberg.BaseRewriteManifests.lambda$validateDeletedManifests$7(BaseRewriteManifests.java:285)
at java.util.Optional.ifPresent(Optional.java:159)
at org.apache.iceberg.BaseRewriteManifests.validateDeletedManifests(BaseRewriteManifests.java:283)
at org.apache.iceberg.BaseRewriteManifests.apply(BaseRewriteManifests.java:181)
at org.apache.iceberg.SnapshotProducer.apply(SnapshotProducer.java:217)
at org.apache.iceberg.BaseRewriteManifests.apply(BaseRewriteManifests.java:50)
at org.apache.iceberg.SnapshotProducer.lambda$commit$2(SnapshotProducer.java:366)
at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:219)
at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:203)
at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196)
at org.apache.iceberg.SnapshotProducer.commit(SnapshotProducer.java:364)
at org.apache.iceberg.BaseRewriteManifests.commit(BaseRewriteManifests.java:50)
at org.apache.iceberg.spark.actions.BaseSnapshotUpdateSparkAction.commit(BaseSnapshotUpdateSparkAction.java:40)
at org.apache.iceberg.spark.actions.RewriteManifestsSparkAction.replaceManifests(RewriteManifestsSparkAction.java:338)
at org.apache.iceberg.spark.actions.RewriteManifestsSparkAction.doExecute(RewriteManifestsSparkAction.java:193)
at org.apache.iceberg.spark.actions.BaseSparkAction.withJobGroupInfo(BaseSparkAction.java:137)
at org.apache.iceberg.spark.actions.RewriteManifestsSparkAction.execute(RewriteManifestsSparkAction.java:148)
at org.apache.iceberg.spark.procedures.RewriteManifestsProcedure.lambda$call$0(RewriteManifestsProcedure.java:98)
at org.apache.iceberg.spark.procedures.BaseProcedure.execute(BaseProcedure.java:104)
at org.apache.iceberg.spark.procedures.BaseProcedure.modifyIcebergTable(BaseProcedure.java:85)
at org.apache.iceberg.spark.procedures.RewriteManifestsProcedure.call(RewriteManifestsProcedure.java:89)
at org.apache.spark.sql.execution.datasources.v2.CallExec.run(CallExec.scala:34)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:118)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:94)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:31)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:488)
at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:94)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:81)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:79)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:219)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:640)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:630)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:662)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.lang.Thread.run(Thread.java:748)
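
For context, the traceback points at a maintenance script along these lines. This is a minimal sketch reconstructed from the file and function names in the trace (IcebergMaintenance.py, execute_all_commands, rewrite_manifests); the surrounding logic and arguments are assumptions, not the original script.

# IcebergMaintenance.py -- minimal sketch reconstructed from the traceback,
# not the original script.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("IcebergMaintenance").getOrCreate()

def rewrite_manifests(db_name, table_name):
    # Compact the table's manifest files via the Iceberg Spark procedure.
    rewrite_manifests_sql = (
        f"CALL spark_catalog.system.rewrite_manifests('{db_name}.{table_name}')"
    )
    spark.sql(rewrite_manifests_sql).show()

def execute_all_commands(db_name, table_name):
    # Other maintenance steps (snapshot expiration, orphan file removal, ...)
    # would typically run here as well.
    rewrite_manifests(db_name, table_name)

if __name__ == "__main__":
    execute_all_commands("iceberg_test_db", "my_test2")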

It looks like a recent HDFS directory file-count limit error prevented the Iceberg metadata files from being written properly, which is why that manifest file is missing.
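
To verify, you can compare the manifest files the table metadata still references against what actually exists on HDFS. Iceberg's manifests metadata table exposes the referenced paths (the table name below matches this post; the check itself is a generic sketch):

# Manifest files referenced by the table; any path listed here that is
# absent on HDFS will trigger "Manifest is missing".
spark.sql(
    "SELECT path FROM spark_catalog.iceberg_test_db.my_test2.manifests"
).show(truncate=False)

If a directory item limit is the suspected cause, the file count of the metadata directory can also be checked; HDFS rejects new entries once a directory reaches dfs.namenode.fs-limits.max-directory-items (1048576 by default):

hdfs dfs -count hdfs://nameservice1/user/hive/warehouse/iceberg_test_db.db/my_test2/metadata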

Increasing the Spark driver and executor memory and rerunning the procedure resolved it.

q = "CALL spark_catalog.system.rewrite_manifests('iceberg_test_db.my_test2')"
spark.sql(q).show()
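
The driver memory has to be set before the JVM starts, so it is easiest to raise both values at submit time. The sizes below are placeholders, not the values actually used:

spark-submit \
  --driver-memory 8g \
  --executor-memory 8g \
  /home/airflow_dags/src/pyspark/iceberg/IcebergMaintenance.py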
