[Spark] Error When Dropping an Iceberg Table | [CANNOT_RECOGNIZE_HIVE_TYPE] Cannot recognize hive type string: "TIMESTAMP WITH LOCAL TIME ZONE"


지구정복


noohhee 2025. 4. 18. 14:45


- Environment

Spark 3.4.1

Iceberg 1.3.1

Hive 3.1.3

The Iceberg catalog is currently backed by the Hive metastore.

Creating an Iceberg table with spark-sql as shown below and then dropping it raised an error.

CREATE TABLE iceberg_test_db.test_tbl (
  data STRING,
  log_timestamp TIMESTAMP
)
USING iceberg
PARTITIONED BY (days(log_timestamp))
TBLPROPERTIES (
    'read.parquet.vectorization.enabled' = 'true',
    'write.metadata.delete-after-commit.enabled' = 'true',
    'write.metadata.previous-versions-max' = '5',
    'format-version' = '2',
    'format' = 'parquet'
);

drop table iceberg_test_db.test_tbl;

Error message

Error: [CANNOT_RECOGNIZE_HIVE_TYPE] Cannot recognize hive type string: "TIMESTAMP WITH LOCAL TIME ZONE", column: `log_timestamp`., db: iceberg_test_db, table: test_tbl

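When a column is declared as TIMESTAMP at table creation, it is stored as 'TIMESTAMP WITH LOCAL TIME ZONE' by default.

However, the catalog here is the Hive metastore, so the column type is recorded there as that string, and it is this string that the error above fails to recognize.

According to the table in the official documentation below, a column declared as timestamp when creating an Iceberg table from Spark is created as an Iceberg timestamp with timezone.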

https://iceberg.apache.org/docs/1.5.1/spark-getting-started/#spark-type-to-iceberg-type

spark	iceberg	notes
boolean	boolean
short	integer
byte	integer
integer	integer
long	long
float	float
double	double
date	date
timestamp	timestamp with timezone
timestamp_ntz	timestamp without timezone
char	string
varchar	string
string	string
binary	binary
decimal	decimal
struct	struct
array	list
map	map

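However, Hive's own timestamp type has no time zone. The zone-aware counterpart is timestamplocaltz, which exists in Hive 3 only, and the error above shows that its type string is not recognized when the table is dropped.

Also, even with iceberg.mr.schema.auto.conversion set to true in hive-site.xml, timestamp is not converted automatically, according to the Notes column of the table below.

The Hive-to-Iceberg type mapping is as follows.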

hive	iceberg	notes
boolean	boolean
short	integer	auto-conversion
byte	integer	auto-conversion
integer	integer
long	long
float	float
double	double
date	date
timestamp	timestamp without timezone
timestamplocaltz	timestamp with timezone	Hive 3 only
interval_year_month		not supported
interval_day_time		not supported
char	string	auto-conversion
varchar	string	auto-conversion
string	string
binary	binary
decimal	decimal
struct	struct
list	list
map	map
union		not supported
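The two tables can be chained to see which Spark column types end up as timestamplocaltz on the Hive side. The following is a minimal Python sketch of that lookup, not part of the original setup. The dictionaries are transcribed from the tables in this post. The reverse Iceberg-to-Hive direction picks the rows without the auto-conversion note, which is an assumption of this sketch, and the function names `hive_type_for` and `columns_stored_as_localtz` are hypothetical.

```python
# Spark -> Iceberg, transcribed from the first table in this post.
SPARK_TO_ICEBERG = {
    "boolean": "boolean", "short": "integer", "byte": "integer",
    "integer": "integer", "long": "long", "float": "float",
    "double": "double", "date": "date",
    "timestamp": "timestamp with timezone",
    "timestamp_ntz": "timestamp without timezone",
    "char": "string", "varchar": "string", "string": "string",
    "binary": "binary", "decimal": "decimal", "struct": "struct",
    "array": "list", "map": "map",
}

# Iceberg -> Hive, the second table read in reverse.
# Assumption: rows marked "auto-conversion" are skipped so the mapping is 1:1.
ICEBERG_TO_HIVE = {
    "boolean": "boolean", "integer": "integer", "long": "long",
    "float": "float", "double": "double", "date": "date",
    "timestamp without timezone": "timestamp",
    "timestamp with timezone": "timestamplocaltz",  # Hive 3 only
    "string": "string", "binary": "binary", "decimal": "decimal",
    "struct": "struct", "list": "list", "map": "map",
}


def hive_type_for(spark_type: str) -> str:
    """Return the Hive-side type name for a Spark column type."""
    # Drop parameters such as decimal(10,2) or array<string> before the lookup.
    base = spark_type.lower().split("(")[0].split("<")[0].strip()
    return ICEBERG_TO_HIVE[SPARK_TO_ICEBERG[base]]


def columns_stored_as_localtz(columns):
    """List the column names whose Hive-side type is timestamplocaltz."""
    return [name for name, spark_type in columns
            if hive_type_for(spark_type) == "timestamplocaltz"]


# The columns of the table that failed to drop.
columns = [("data", "STRING"), ("log_timestamp", "TIMESTAMP")]
print(columns_stored_as_localtz(columns))  # ['log_timestamp']
```

Running the same check with `log_timestamp` declared as `timestamp_ntz` returns an empty list, since that type maps to Hive's plain timestamp.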

Therefore, when the Iceberg catalog is backed by the Hive metastore and the Iceberg table is created from Spark in this environment, 'timestamp' cannot be used; use the timestamp_ntz type instead.

CREATE TABLE iceberg_test_db.test_paloalto (
  data STRING,
  log_timestamp timestamp_ntz
)
USING iceberg
PARTITIONED BY (days(log_timestamp))
TBLPROPERTIES (
    'read.parquet.vectorization.enabled' = 'true',
    'write.metadata.delete-after-commit.enabled' = 'true',
    'write.metadata.previous-versions-max' = '5',
    'format-version' = '2',
    'format' = 'parquet'
);
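When DDL is generated from a column list rather than written by hand, the same substitution can be applied automatically. The following is a rough Python sketch under that assumption. The helper name `create_table_ddl` and its parameters are hypothetical and only assemble a string; nothing here calls Spark or Iceberg.

```python
def create_table_ddl(table, columns, partition_by=None, properties=None):
    """Build a CREATE TABLE ... USING iceberg statement as a string.

    Columns declared as plain "timestamp" are rewritten to "timestamp_ntz",
    following the workaround described in this post.
    """
    col_lines = []
    for name, spark_type in columns:
        if spark_type.strip().lower() == "timestamp":
            spark_type = "timestamp_ntz"
        col_lines.append(f"  {name} {spark_type}")

    ddl = f"CREATE TABLE {table} (\n" + ",\n".join(col_lines) + "\n)\nUSING iceberg"
    if partition_by:
        ddl += f"\nPARTITIONED BY ({partition_by})"
    if properties:
        props = ",\n".join(f"    '{key}' = '{value}'" for key, value in properties.items())
        ddl += f"\nTBLPROPERTIES (\n{props}\n)"
    return ddl + ";"


print(create_table_ddl(
    "iceberg_test_db.test_paloalto",
    [("data", "STRING"), ("log_timestamp", "TIMESTAMP")],
    partition_by="days(log_timestamp)",
    properties={"format-version": "2"},
))
```

Note that timestamp_ntz stores values without a time zone, so whether the swap is acceptable depends on how the timestamps are produced and read.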


License: Attribution-ShareAlike
