지구정복
[NiFi] Descriptions of Commonly Used Processors
Let's study the processors I use at work.
I'm using NiFi 1.15.2.
- ListenUDP
Description:
Listens for Datagram Packets on a given port.
The default behavior produces a FlowFile per datagram,
however for higher throughput the Max Batch Size property may be increased to specify the number of datagrams to batch together in a single FlowFile.
This processor can be restricted to listening for datagrams from a specific remote host and port by specifying the Sending Host and Sending Host Port properties,
otherwise it will listen for datagrams from all hosts and ports.
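The sketch below is a small Python model of how Max Batch Size and the message delimiter turn queued datagrams into FlowFile content. It is a simplified illustration written for this post, not NiFi's source code. The function name `batch_datagrams` is hypothetical, and the sketch assumes the datagrams have already been decoded into strings.

```python
from typing import List


def batch_datagrams(datagrams: List[str], max_batch_size: int = 1,
                    delimiter: str = "\n") -> List[str]:
    """Group datagrams into FlowFile contents, at most `max_batch_size` per FlowFile.

    Hypothetical model of ListenUDP's Max Batch Size / Message Delimiter
    properties. The real processor only batches messages that are already
    waiting in its internal queue, so real batches can be smaller.
    """
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be >= 1")
    flowfiles = []
    for start in range(0, len(datagrams), max_batch_size):
        chunk = datagrams[start:start + max_batch_size]
        # Messages that end up in the same FlowFile are joined with the delimiter.
        flowfiles.append(delimiter.join(chunk))
    return flowfiles


if __name__ == "__main__":
    packets = ["msg-1", "msg-2", "msg-3", "msg-4", "msg-5"]
    print(batch_datagrams(packets))                    # default 1: one FlowFile per datagram
    print(batch_datagrams(packets, max_batch_size=2))  # ['msg-1\nmsg-2', 'msg-3\nmsg-4', 'msg-5']
```

With the default of 1, every datagram pays the full per-FlowFile overhead. Raising the batch size trades that overhead for content that has to be split again downstream.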
Display Name | API Name | Default Value | Allowable Values | Description | My Value / Notes |
---|---|---|---|---|---|
Local Network Interface | Local Network Interface | | | The name of a local network interface to be used to restrict listening to a specific LAN. Supports Expression Language: true (will be evaluated using variable registry only) | bond0 |
Port | Port | | | The port to listen on for communication. Supports Expression Language: true (will be evaluated using variable registry only) | 8888 |
Receive Buffer Size | Receive Buffer Size | 65507 B | | The size of each buffer used to receive messages. Adjust this value appropriately based on the expected size of the incoming messages. | 16777216 (kept at or below net.core.rmem_max on the NiFi server) |
Max Size of Message Queue | Max Size of Message Queue | 10000 | | The maximum size of the internal queue used to buffer messages being transferred from the underlying channel to the processor. Setting this value higher allows more messages to be buffered in memory during surges of incoming messages, but increases the total memory used by the processor. | 100000 |
Max Size of Socket Buffer | Max Size of Socket Buffer | 1 MB | | The maximum size of the socket buffer that should be used. This is a suggestion to the Operating System to indicate how big the socket buffer should be. If this value is set too low, the buffer may fill up before the data can be read, and incoming data will be dropped. | 1 GB |
Character Set | Character Set | UTF-8 | | Specifies the character set of the received data. | |
Max Batch Size | Max Batch Size | 1 | | The maximum number of messages to add to a single FlowFile. If multiple messages are available, they will be concatenated along with the `<Message Delimiter>` up to this configured maximum number of messages. The default value is 1, which means a message-per-flow-file. A single message per flow file is useful for downstream parsing and routing, but provides the worst performance scenario. Increasing the batch size will drastically reduce the amount of I/O operations performed, and will likely provide the greatest overall performance improvement. | Up to this many messages are combined into one FlowFile. |
Batching Message Delimiter | Message Delimiter | `\n` | | Specifies the delimiter to place between messages when multiple messages are bundled together (see `<Max Batch Size>` property). | |
Sending Host | Sending Host | | | IP, or name, of a remote host. Only Datagrams from the specified Sending Host Port and this host will be accepted. Improves Performance. May be a system property or an environment variable. Supports Expression Language: true (will be evaluated using variable registry only) | |
Sending Host Port | Sending Host Port | | | Port being used by remote host to send Datagrams. Only Datagrams from the specified Sending Host and this port will be accepted. Improves Performance. May be a system property or an environment variable. Supports Expression Language: true (will be evaluated using variable registry only) | |
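Max Size of Socket Buffer is only a hint to the operating system. On Linux, the kernel caps a socket's receive buffer request (`SO_RCVBUF`) at `net.core.rmem_max` and reports back double the stored value. A 1 GB setting is therefore silently reduced unless that sysctl is raised as well. The Python sketch below is an illustration, not part of NiFi, and its helper names are hypothetical. It reads the limit and shows what the OS actually grants for a given request.

```python
import socket
from typing import Optional


def read_rmem_max(path: str = "/proc/sys/net/core/rmem_max") -> Optional[int]:
    """Return net.core.rmem_max in bytes, or None if it cannot be read (e.g. non-Linux)."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def granted_receive_buffer(requested_bytes: int) -> int:
    """Request a UDP receive buffer of `requested_bytes` and return what the OS granted."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # SO_RCVBUF is only a request: Linux caps it at net.core.rmem_max and
        # getsockopt reports double the stored value (bookkeeping overhead).
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    finally:
        sock.close()


if __name__ == "__main__":
    requested = 1024 ** 3  # 1 GB, the Max Size of Socket Buffer value used above
    print("net.core.rmem_max:", read_rmem_max())
    try:
        print("requested:", requested, "granted:", granted_receive_buffer(requested))
    except OSError as exc:  # some systems reject oversized requests instead of capping them
        print("request rejected by the OS:", exc)
```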
- GenerateFlowFile
This processor creates FlowFiles with random data or custom content.
GenerateFlowFile is useful for load testing, configuration, and simulation.
Also see DuplicateFlowFile for additional load testing.
Properties:
The table below lists each property with its default value and allowable values; whether a property supports the NiFi Expression Language is noted in its description.
Display Name | API Name | Default Value | Allowable Values | Description |
---|---|---|---|---|
File Size | File Size | 0B | | The size of the file that will be used |
Batch Size | Batch Size | 1 | | The number of FlowFiles to be transferred in each invocation |
Data Format | Data Format | Text | Text, Binary | Specifies whether the data should be Text or Binary |
Unique FlowFiles | Unique FlowFiles | false | true, false | If true, each FlowFile that is generated will be unique. If false, a random value will be generated and all FlowFiles will get the same content, but this offers much higher throughput |
Custom Text | generate-ff-custom-text | | | If Data Format is text and if Unique FlowFiles is false, then this custom text will be used as content of the generated FlowFiles and the File Size will be ignored. Finally, if Expression Language is used, evaluation will be performed only once per batch of generated FlowFiles. Supports Expression Language: true (will be evaluated using variable registry only) |
Character Set | character-set | UTF-8 | | Specifies the character set to use when writing the bytes of Custom Text to a flow file. |
Mime Type | mime-type | | | Specifies the value to set for the "mime.type" attribute. |
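The trade-off behind Unique FlowFiles and Custom Text can be modeled in a few lines of Python. This is a hypothetical sketch of one batch (the name `generate_batch` is made up), not GenerateFlowFile's implementation. It ignores Data Format, returns raw random bytes for generated payloads, and treats Custom Text as plain text without Expression Language.

```python
import os
from typing import List, Optional


def generate_batch(batch_size: int, file_size: int, unique: bool = False,
                   custom_text: Optional[str] = None,
                   charset: str = "utf-8") -> List[bytes]:
    """Return the contents of one batch of generated FlowFiles (simplified model)."""
    if custom_text is not None and not unique:
        # Custom Text is used as the content and File Size is ignored.
        payload = custom_text.encode(charset)
        return [payload] * batch_size
    if unique:
        # Fresh random bytes for every FlowFile: unique content, lower throughput.
        return [os.urandom(file_size) for _ in range(batch_size)]
    # One random payload reused for every FlowFile: identical content, higher throughput.
    shared = os.urandom(file_size)
    return [shared] * batch_size


if __name__ == "__main__":
    same = generate_batch(batch_size=3, file_size=8)
    print(len(set(same)))      # 1: all three FlowFiles share the same content
    distinct = generate_batch(batch_size=3, file_size=8, unique=True)
    print(len(set(distinct)))  # almost certainly 3
```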
Dynamic Properties:
Supports Sensitive Dynamic Properties: No
Dynamic Properties allow the user to specify both the name and value of a property.
Name | Value | Description |
---|---|---|
Generated FlowFile attribute name | Generated FlowFile attribute value | Specifies an attribute on generated FlowFiles defined by the Dynamic Property's key and value. If Expression Language is used, evaluation will be performed only once per batch of generated FlowFiles. Supports Expression Language: true (will be evaluated using variable registry only) |
Relationships:
Name | Description |
---|---|
success | |
For example, to generate log lines like the one below, set Custom Text as follows:
`<test> ${now():format('MMM dd yyyy HH:mm:ss')} my-test-data-nifi : %FTD-4-419002: Duplicate TCP SYN from out-beeline:${random():toString()} to out-beeline:${random():toString()} with different initial sequence number`
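The Python sketch below shows roughly what one line from the Custom Text above looks like, without running NiFi. It approximates the two expressions:

- `now():format(...)` becomes `strftime` with an equivalent format.
- `random()` becomes a random non-negative 63-bit integer. That range is my assumption about what NiFi's `random()` returns.

The function name is hypothetical. NiFi evaluates this expression only once per batch, so all FlowFiles in one batch share the same timestamp and random values.

```python
import random
from datetime import datetime


def render_test_line(now: datetime, rng: random.Random) -> str:
    """Approximate one line produced by the Custom Text above (not real NiFi EL)."""
    # ${now():format('MMM dd yyyy HH:mm:ss')}: Java's MMM and Python's %b both
    # give an abbreviated month name ("May" in an English/C locale).
    timestamp = now.strftime("%b %d %Y %H:%M:%S")
    # ${random()}: assumed to be a non-negative long, so draw from [0, 2**63 - 1].
    src_value = rng.randint(0, 2**63 - 1)
    dst_value = rng.randint(0, 2**63 - 1)
    return (f"<test> {timestamp} my-test-data-nifi : %FTD-4-419002: "
            f"Duplicate TCP SYN from out-beeline:{src_value} to out-beeline:{dst_value} "
            "with different initial sequence number")


if __name__ == "__main__":
    print(render_test_line(datetime.now(), random.Random()))
```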
- ExtractGrok
Description:
Evaluates one or more Grok Expressions against the content of a FlowFile, adding the results as attributes or replacing the content of the FlowFile with a JSON notation of the matched content
In short: it applies one or more Grok patterns to the FlowFile content and either adds the matches as attributes or replaces the content with JSON.
Tags:
grok, log, text, parse, delimit, extract
Properties:
The table below lists each property with its default value and allowable values.
Display Name | API Name | Default Value | Allowable Values | Description |
---|---|---|---|---|
Grok Expression | Grok Expression | | | Grok expression. If other Grok expressions are referenced in this expression, they must be provided in the Grok Pattern File if set or exist in the default Grok patterns. Note: with `(?s)%{GREEDYDATA:data}`, the entire FlowFile content is captured into a single field named `data`. |
Grok Patterns | Grok Pattern file | | | Custom Grok pattern definitions. These definitions will be loaded after the default Grok patterns. The Grok Parser will use the default Grok patterns when this property is not configured. This property requires exactly one resource to be provided. That resource may be any of the following types: URL, file, text. |
Destination | Destination | flowfile-attribute | flowfile-attribute, flowfile-content | Controls whether the Grok output is written as new FlowFile attributes or to the FlowFile content. In attribute mode, each Grok identifier matched in the FlowFile is added as an attribute prefixed with "grok.". Writing to the FlowFile content will overwrite any existing FlowFile content. |
Character Set | Character Set | UTF-8 | | The Character Set in which the file is encoded |
Maximum Buffer Size | Maximum Buffer Size | 1 MB | | Specifies the maximum amount of data to buffer (per file) in order to apply the Grok expressions. Files larger than the specified maximum will not be fully evaluated. |
Named captures only | Named captures only | false | true, false | Only store named captures from grok |
Keep Empty Captures | Keep Empty Captures | true | true, false | If true, then empty capture values will be included in the returned capture map. |
Relationships:
Name | Description |
---|---|
unmatched | FlowFiles are routed to this relationship when no provided Grok Expression matches the content of the FlowFile |
matched | FlowFiles are routed to this relationship when the Grok Expression is successfully evaluated and the FlowFile is modified as a result |
Reads Attributes:
None specified.
Writes Attributes:
Name | Description |
---|---|
grok.XXX | When operating in flowfile-attribute mode, each of the Grok identifiers that is matched in the FlowFile will be added as an attribute, prefixed with "grok.". For example, if the grok identifier "timestamp" is matched, then the value will be added to an attribute named "grok.timestamp" |
State management:
This component does not store state.
Restricted:
Required Permission | Explanation |
---|---|
reference remote resources | Patterns can reference resources over HTTP |
Input requirement:
This component requires an incoming relationship.
System Resource Considerations:
None specified.
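The Python sketch below models what ExtractGrok does conceptually:

1. Expand `%{PATTERN:name}` references into a regular expression with named groups.
2. On a match, emit `grok.`-prefixed attributes, or replace the content with JSON.

This is not NiFi's Grok library. The `PATTERNS` table is a tiny, simplified subset I wrote for illustration, and nested pattern references are not supported. The function names are hypothetical.

```python
import json
import re
from typing import Dict, Optional

# A tiny, simplified subset of common Grok patterns, for illustration only.
PATTERNS: Dict[str, str] = {
    "GREEDYDATA": r".*",
    "DATA": r".*?",
    "WORD": r"\b\w+\b",
    "INT": r"[+-]?\d+",
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
}

_GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def grok_to_regex(expression: str) -> str:
    """Expand %{PATTERN} / %{PATTERN:name} references into a Python regex."""
    def expand(m: re.Match) -> str:
        body = PATTERNS[m.group(1)]  # KeyError for patterns not in the table
        name = m.group(2)
        # Named references become named groups; unnamed ones stay non-capturing.
        return f"(?P<{name}>{body})" if name else f"(?:{body})"
    return _GROK_REF.sub(expand, expression)


def extract_grok(content: str, expression: str,
                 destination: str = "flowfile-attribute") -> Optional[Dict[str, str]]:
    """Return new attributes (or {'content': json}) on a match, None when unmatched."""
    match = re.search(grok_to_regex(expression), content)
    if match is None:
        return None  # corresponds to the 'unmatched' relationship
    # Skip groups that did not participate in the match.
    captures = {k: v for k, v in match.groupdict().items() if v is not None}
    if destination == "flowfile-attribute":
        # Each named capture becomes an attribute prefixed with "grok."
        return {f"grok.{k}": v for k, v in captures.items()}
    # flowfile-content: the content is replaced with a JSON object of the captures.
    return {"content": json.dumps(captures)}


if __name__ == "__main__":
    print(extract_grok("10.0.0.1 GET 200", "%{IP:client} %{WORD:method} %{INT:status}"))
    # {'grok.client': '10.0.0.1', 'grok.method': 'GET', 'grok.status': '200'}
    print(extract_grok("line1\nline2", "(?s)%{GREEDYDATA:data}", "flowfile-content"))
```

The second call mirrors the `(?s)%{GREEDYDATA:data}` note in the table: `(?s)` lets `.` match newlines, so the whole multi-line content lands in `data`.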
- UpdateRecord
Description:
Updates the contents of a FlowFile that contains Record-oriented data (i.e., data that can be read via a RecordReader and written by a RecordWriter). This Processor requires that at least one user-defined Property be added. The name of the Property should indicate a RecordPath that determines the field that should be updated. The value of the Property is either a replacement value (optionally making use of the Expression Language) or is itself a RecordPath that extracts a value from the Record. Whether the Property value is determined to be a RecordPath or a literal value depends on the configuration of the <Replacement Value Strategy> Property.
Tags:
update, record, generic, schema, json, csv, avro, log, logs, freeform, text
Properties:
The table below lists each property with its default value and allowable values.
Display Name | API Name | Default Value | Allowable Values | Description |
---|---|---|---|---|
Record Reader | record-reader | | Controller Service API: RecordReaderFactory. Implementations: CEFReader, SyslogReader, ReaderLookup, ProtobufReader, Syslog5424Reader, CSVReader, GrokReader, WindowsEventLogReader, ScriptedReader, AvroReader, ParquetReader, JsonPathReader, ExcelReader, JsonTreeReader, YamlTreeReader, XMLReader | Specifies the Controller Service to use for reading incoming data |
Record Writer | record-writer | | Controller Service API: RecordSetWriterFactory. Implementations: JsonRecordSetWriter, RecordSetWriterLookup, AvroRecordSetWriter, XMLRecordSetWriter, FreeFormTextRecordSetWriter, CSVRecordSetWriter, ParquetRecordSetWriter, ScriptedRecordSetWriter | Specifies the Controller Service to use for writing out the records |
Replacement Value Strategy | replacement-value-strategy | Literal Value | Literal Value, Record Path Value | Specifies how to interpret the configured replacement values |
Dynamic Properties:
Supports Sensitive Dynamic Properties: No
Dynamic Properties allow the user to specify both the name and value of a property.
Name | Value | Description |
---|---|---|
A RecordPath. | The value to use to replace fields in the record that match the RecordPath | Allows users to specify values to use to replace fields in the record that match the RecordPath. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry) |
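The following is a rough Python model of the two Replacement Value Strategy modes. It is a hypothetical sketch, not NiFi code:

- Records are plain dicts.
- Path handling supports only simple child paths like `/address/city`. Real RecordPath also has filters, array indexes and functions.
- Literal values are plain strings, with no Expression Language.

```python
import copy
from typing import Any, Dict, List

Record = Dict[str, Any]


def _split(path: str) -> List[str]:
    # Only plain child paths such as "/address/city" are supported here.
    return [part for part in path.split("/") if part]


def get_value(record: Record, path: str) -> Any:
    value: Any = record
    for part in _split(path):
        value = value[part]
    return value


def set_value(record: Record, path: str, new_value: Any) -> None:
    *parents, leaf = _split(path)
    target: Any = record
    for part in parents:
        target = target[part]
    target[leaf] = new_value


def update_records(records: List[Record], updates: Dict[str, str],
                   strategy: str = "Literal Value") -> List[Record]:
    """Apply {RecordPath: replacement} updates, like UpdateRecord's dynamic properties."""
    updated = copy.deepcopy(records)  # leave the input records untouched
    for record in updated:
        for path, replacement in updates.items():
            if strategy == "Literal Value":
                set_value(record, path, replacement)
            elif strategy == "Record Path Value":
                # The property value is itself a path read from the same record.
                set_value(record, path, get_value(record, replacement))
            else:
                raise ValueError(f"unknown Replacement Value Strategy: {strategy}")
    return updated


if __name__ == "__main__":
    rows = [{"name": "kim", "alias": "k", "address": {"city": "Seoul"}}]
    print(update_records(rows, {"/address/city": "Busan"}))
    # [{'name': 'kim', 'alias': 'k', 'address': {'city': 'Busan'}}]
    print(update_records(rows, {"/name": "/alias"}, strategy="Record Path Value"))
    # [{'name': 'k', 'alias': 'k', 'address': {'city': 'Seoul'}}]
```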
Relationships:
Name | Description |
---|---|
success | FlowFiles that are successfully transformed will be routed to this relationship |
failure | If a FlowFile cannot be transformed from the configured input format to the configured output format, the unchanged FlowFile will be routed to this relationship |
Reads Attributes:
None specified.
Writes Attributes:
Name | Description |
---|---|
record.index | This attribute provides the current row index and is only available inside the literal value expression. |
record.error.message | This attribute provides on failure the error message encountered by the Reader or Writer. |
State management:
This component does not store state.
Restricted:
This component is not restricted.
Input requirement:
This component requires an incoming relationship.
System Resource Considerations:
None specified.