Posts tagged 'apache iceberg'

List: apache iceberg (9)

지구정복

[Iceberg] Changing the partition of an already-created Iceberg table | How to change the partition on an existing Iceberg table

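Using Spark 3.4.1, Trino 402, and Iceberg 1.3.1. I'm currently ingesting streaming data, but the table is partitioned at second granularity, so far too many small files are piling up in HDFS. So I want to apply hours() to the partition so that data is grouped by the hour. Conclusion: I tried several approaches and they all failed.. there is no way to fully swap the table out. The reason is that if you do it the following way:
1) CREATE TABLE temp_table () USING iceberg PARTITIONED BY (hours(time_column))
2) INSERT INTO temp_table SELECT * FROM existing_table
3) ALTER TABLE existing_table RENAME TO existing_table_backup
4) Rename the existing table's directory in HDFS h..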

Mastering Data Engineering/Iceberg 2025. 5. 28. 15:48
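Why hours() helps with the small-file problem can be shown with a small, stdlib-only Python sketch. It mirrors Iceberg's hours transform (whole hours since 1970-01-01 00:00 UTC) and compares it with a second-level key on a simulated stream. The event rate, the timestamps, and the helper names `seconds_key` and `count_partitions` are hypothetical and only for illustration; the sketch does not touch Spark or Iceberg.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def hours_transform(ts: datetime) -> int:
    """Whole hours since 1970-01-01 00:00:00 UTC, like Iceberg's hours() transform.

    Expects a timezone-aware datetime.
    """
    # timedelta // timedelta yields an int, so no float rounding is involved.
    return (ts.astimezone(timezone.utc) - EPOCH) // timedelta(hours=1)


def seconds_key(ts: datetime) -> int:
    """Hypothetical second-granularity partition key: whole seconds since the epoch."""
    return (ts.astimezone(timezone.utc) - EPOCH) // timedelta(seconds=1)


def count_partitions(timestamps, key_fn) -> int:
    """Number of distinct partition values that key_fn produces for the timestamps."""
    return len({key_fn(ts) for ts in timestamps})


if __name__ == "__main__":
    start = datetime(2025, 5, 28, 0, 0, 0, tzinfo=timezone.utc)
    # Simulated stream: one event every 10 seconds for 3 hours.
    events = [start + timedelta(seconds=10 * i) for i in range(3 * 360)]
    print(count_partitions(events, seconds_key))      # 1080
    print(count_partitions(events, hours_transform))  # 3
```

Fewer distinct partition values means the same data lands in fewer partition directories, which is why a coarser transform such as hours() tends to produce fewer, larger files.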

[Iceberg] Table Maintenance | How to Maintain Tables

I'm currently using Iceberg 1.3.1, so I'm following the official documentation for 1.3.1 below.
https://web.archive.org/web/20240826175720/https://iceberg.apache.org/docs/latest/maintenance/
Spark is version 3.4.1 and Trino is version 402.
The files that need to be cleaned up are:
- Old Metadata Files
- Expired Snapshot Files
- Manifest Files
- Data Files
- Orphan Files
Users have no choice but to clean up these files manually.
1. Managing Metadata Files
Iceberg tracks changes to a table through metadata files in JSON format. Whenever any change is made to the table, metadata..

Mastering Data Engineering/Iceberg 2025. 5. 28. 13:12
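For the first two file types, the Iceberg maintenance docs describe retention knobs: the table properties `write.metadata.delete-after-commit.enabled` and `write.metadata.previous-versions-max` for old metadata files, and the Spark procedure `expire_snapshots` with its `older_than` and `retain_last` arguments for snapshots. The Python sketch below is only a simplified model of those retention rules, not a call into Spark or Iceberg. It assumes a linear snapshot history (no branches or tags), and the file names and timestamps are made up.

```python
from dataclasses import dataclass
from typing import List


def metadata_files_to_delete(metadata_log: List[str], previous_versions_max: int) -> List[str]:
    """Simplified model of write.metadata.previous-versions-max.

    metadata_log is ordered oldest -> newest; the last entry is the current
    metadata file and is never deleted. Only the newest
    `previous_versions_max` previous files are kept.
    """
    previous = metadata_log[:-1]
    excess = len(previous) - previous_versions_max
    return previous[:excess] if excess > 0 else []


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    timestamp_ms: int


def snapshots_to_expire(snapshots: List[Snapshot], older_than_ms: int, retain_last: int = 1) -> List[int]:
    """Simplified model of expire_snapshots(older_than, retain_last).

    Assumes a linear history, so the newest snapshot is the current one.
    Snapshots older than older_than_ms are expired, except that the
    `retain_last` newest snapshots are always kept.
    """
    if retain_last < 1:
        raise ValueError("retain_last must be at least 1")
    ordered = sorted(snapshots, key=lambda s: s.timestamp_ms)
    protected = {s.snapshot_id for s in ordered[-retain_last:]}
    return [
        s.snapshot_id
        for s in ordered
        if s.timestamp_ms < older_than_ms and s.snapshot_id not in protected
    ]


if __name__ == "__main__":
    log = [f"v{i}.metadata.json" for i in range(1, 6)]
    print(metadata_files_to_delete(log, previous_versions_max=2))
    # ['v1.metadata.json', 'v2.metadata.json']

    history = [Snapshot(i, i * 100) for i in range(1, 6)]
    print(snapshots_to_expire(history, older_than_ms=450, retain_last=2))
    # [1, 2, 3]
```

Note how `retain_last` wins over `older_than` in this model: snapshot 4 is older than the cutoff but survives because it is one of the two newest.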

[Iceberg] Iceberg Guide Book Summary | CHAPTER 6. Apache Spark

CHAPTER 6: Apache Spark. Configuration: Configuring Apache Iceberg and Spark. Configuring via the CLI: As a first step, you'll need to specify the required packages to be installed and used with the Spark session. To do so, Spark provides the --packages option, which allows Spark to easily download the specified Maven-based packages and their dependencies and add them to your application's classpath. ..

Mastering Data Engineering/Iceberg 2025. 3. 9. 21:12
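A sketch of the `--packages` approach: the Python helper below only assembles a `spark-sql` command line and does not launch Spark. The runtime coordinate assumes Spark 3.4 with Scala 2.12 and Iceberg 1.3.1 (the versions used in these posts). The catalog name `local`, the Hadoop catalog type, and the warehouse path are hypothetical choices for illustration.

```python
import shlex
from typing import List

# Assumed coordinate for Spark 3.4 / Scala 2.12 / Iceberg 1.3.1.
ICEBERG_RUNTIME = "org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.3.1"


def build_spark_sql_command(catalog_name: str, warehouse: str) -> List[str]:
    """Return argv for spark-sql with the Iceberg runtime and one catalog configured."""
    confs = {
        # Enables Iceberg's SQL extensions (e.g. CALL procedures).
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Registers a Spark catalog backed by Iceberg.
        f"spark.sql.catalog.{catalog_name}": "org.apache.iceberg.spark.SparkCatalog",
        f"spark.sql.catalog.{catalog_name}.type": "hadoop",
        f"spark.sql.catalog.{catalog_name}.warehouse": warehouse,
    }
    cmd = ["spark-sql", "--packages", ICEBERG_RUNTIME]
    for key, value in confs.items():
        cmd += ["--conf", f"{key}={value}"]
    return cmd


if __name__ == "__main__":
    # Hypothetical catalog name and warehouse path.
    print(shlex.join(build_spark_sql_command("local", "hdfs:///warehouse/iceberg")))
```

Building the arguments as a list and quoting them with `shlex.join` keeps the command copy-pasteable into a shell; the same `--packages` and `--conf` flags also work with `spark-shell` and `pyspark`.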

[Iceberg] Iceberg Guide Book Summary | CHAPTER 5. Iceberg Catalogs

CHAPTER 5: Iceberg Catalogs. Requirements of an Iceberg Catalog: Iceberg provides a catalog interface that requires the implementation of a set of functions, primarily ones to list existing tables, create tables, drop tables, check whether a table exists, and rename tables. Catalog implementations include Hive Metastore, AWS Glue, and a filesystem catalog (Hadoop). With a filesystem as the catalog, there's a file called version-hi..

Mastering Data Engineering/Iceberg 2025. 3. 9. 21:10
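To make the catalog interface concrete, here is a toy in-memory catalog in Python covering the five operations listed in the Chapter 5 summary. It is not Iceberg's Java `Catalog` API; the class, its method names, and the metadata path are hypothetical. It only illustrates the core idea that a catalog maps a table identifier to the location of that table's current metadata file.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InMemoryCatalog:
    """Toy catalog mapping 'namespace.table' to its current metadata file location.

    Hypothetical and single-level: namespaces are plain name prefixes.
    """

    tables: Dict[str, str] = field(default_factory=dict)

    def list_tables(self, namespace: str) -> List[str]:
        prefix = namespace + "."
        return sorted(name for name in self.tables if name.startswith(prefix))

    def table_exists(self, identifier: str) -> bool:
        return identifier in self.tables

    def create_table(self, identifier: str, metadata_location: str) -> None:
        if self.table_exists(identifier):
            raise ValueError(f"table already exists: {identifier}")
        self.tables[identifier] = metadata_location

    def drop_table(self, identifier: str) -> bool:
        # Returns True if a table was removed, False if it did not exist.
        if identifier not in self.tables:
            return False
        del self.tables[identifier]
        return True

    def rename_table(self, source: str, target: str) -> None:
        if not self.table_exists(source):
            raise KeyError(f"no such table: {source}")
        if self.table_exists(target):
            raise ValueError(f"table already exists: {target}")
        # Only the name changes; the metadata pointer moves with it.
        self.tables[target] = self.tables.pop(source)


if __name__ == "__main__":
    catalog = InMemoryCatalog()
    catalog.create_table("db.events", "hdfs:///warehouse/db/events/metadata/v1.metadata.json")
    catalog.rename_table("db.events", "db.events_backup")
    print(catalog.list_tables("db"))          # ['db.events_backup']
    print(catalog.table_exists("db.events"))  # False
```

In this model, renaming a table touches only the catalog entry, not the data or metadata files, which is also why a catalog-level rename does not move the table's directory on the filesystem.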
