
지구정복

[NiFi] Commonly Used Processors Explained


noohhee 2025. 4. 16. 14:15

 

 

Let's study the processors I use at work.

 

I am using NiFi 1.15.2.

 

  • ListenUDP

Description:

Listens for Datagram Packets on a given port.

The default behavior produces a FlowFile per datagram,

however for higher throughput the Max Batch Size property may be increased to specify the number of datagrams to batch together in a single FlowFile.

This processor can be restricted to listening for datagrams from a specific remote host and port by specifying the Sending Host and Sending Host Port properties,

otherwise it will listen for datagrams from all hosts and ports.
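To check that the listener is receiving anything, a test datagram can be sent from any machine that can reach the NiFi host. The sketch below uses only Python's standard socket module; the function name send_datagram, the host 127.0.0.1, and the port 8888 (the value used later in this post) are assumptions to adjust for your environment.

```python
import socket


def send_datagram(message: str, host: str = "127.0.0.1", port: int = 8888) -> int:
    """Send one UTF-8 encoded datagram and return the number of bytes sent.

    host and port are placeholders; point them at the ListenUDP processor.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(message.encode("utf-8"), (host, port))


# Example call (hypothetical target):
# send_datagram("hello from a test sender", "127.0.0.1", 8888)
```

UDP is connectionless, so a successful send does not prove the processor accepted the datagram; check the processor's output queue to confirm.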

 

 

 

Properties (each entry lists the API Name, Default value, Allowable values, Description, and my setting, where present):

Local Network Interface
  • API Name: Local Network Interface
  • Description: The name of a local network interface to be used to restrict listening to a specific LAN. Supports Expression Language: true (will be evaluated using variable registry only)
  • My setting: bond0

Port
  • API Name: Port
  • Description: The port to listen on for communication. Supports Expression Language: true (will be evaluated using variable registry only)
  • My setting: 8888

Receive Buffer Size
  • API Name: Receive Buffer Size
  • Default value: 65507 B
  • Description: The size of each buffer used to receive messages. Adjust this value appropriately based on the expected size of the incoming messages.
  • My setting: 16777216 (set this at or below the net.core.rmem_max value of the NiFi server)

Max Size of Message Queue
  • API Name: Max Size of Message Queue
  • Default value: 10000
  • Description: The maximum size of the internal queue used to buffer messages being transferred from the underlying channel to the processor. Setting this value higher allows more messages to be buffered in memory during surges of incoming messages, but increases the total memory used by the processor.
  • My setting: 100000

Max Size of Socket Buffer
  • API Name: Max Size of Socket Buffer
  • Default value: 1 MB
  • Description: The maximum size of the socket buffer that should be used. This is a suggestion to the Operating System to indicate how big the socket buffer should be. If this value is set too low, the buffer may fill up before the data can be read, and incoming data will be dropped.
  • My setting: 1 GB

Character Set
  • API Name: Character Set
  • Default value: UTF-8
  • Description: Specifies the character set of the received data.

Max Batch Size
  • API Name: Max Batch Size
  • Default value: 1
  • Description: The maximum number of messages to add to a single FlowFile. If multiple messages are available, they will be concatenated along with the <Message Delimiter> up to this configured maximum number of messages. The default value is 1, which means a message-per-flow-file. A single message per flow file is useful for downstream parsing and routing, but provides the worst performance scenario. Increasing the batch size will drastically reduce the amount of I/O operations performed, and will likely provide the greatest overall performance improvement.
  • Note: up to this many messages are combined into one FlowFile.

Batching Message Delimiter
  • API Name: Message Delimiter
  • Default value: \n
  • Description: Specifies the delimiter to place between messages when multiple messages are bundled together (see <Max Batch Size> property).

Sending Host
  • API Name: Sending Host
  • Description: IP, or name, of a remote host. Only Datagrams from the specified Sending Host Port and this host will be accepted. Improves Performance. May be a system property or an environment variable. Supports Expression Language: true (will be evaluated using variable registry only)

Sending Host Port
  • API Name: Sending Host Port
  • Description: Port being used by remote host to send Datagrams. Only Datagrams from the specified Sending Host and this port will be accepted. Improves Performance. May be a system property or an environment variable. Supports Expression Language: true (will be evaluated using variable registry only)
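The interaction between Max Batch Size and the Message Delimiter can be pictured with a small sketch. This is a simplified model, not the processor's actual code: it assumes every message is already waiting in the queue, whereas the real processor batches only what is available at the moment it runs. The function name batch_messages is hypothetical.

```python
from typing import Iterable, List


def batch_messages(
    messages: Iterable[bytes],
    max_batch_size: int = 1,
    delimiter: bytes = b"\n",
) -> List[bytes]:
    """Group messages into FlowFile-like payloads of at most max_batch_size."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")

    batches: List[bytes] = []
    current: List[bytes] = []
    for message in messages:
        current.append(message)
        if len(current) == max_batch_size:
            # The delimiter goes between messages, not after the last one.
            batches.append(delimiter.join(current))
            current = []
    if current:
        # Leftover messages still form a (smaller) final batch.
        batches.append(delimiter.join(current))
    return batches
```

With the default of 1, three datagrams become three payloads; with a batch size of 2 they become two payloads, the first holding two messages joined by the delimiter.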
 

 

 

  • GenerateFlowFile

This processor creates FlowFiles with random data or custom content.

GenerateFlowFile is useful for load testing, configuration, and simulation.

Also see DuplicateFlowFile for additional load testing.

 

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

Each entry lists the Display Name, followed by its API Name, Default Value, Allowable Values, and Description, where present.

File Size
  • API Name: File Size
  • Default Value: 0B
  • Description: The size of the file that will be used

Batch Size
  • API Name: Batch Size
  • Default Value: 1
  • Description: The number of FlowFiles to be transferred in each invocation

Data Format
  • API Name: Data Format
  • Default Value: Text
  • Allowable Values: Binary, Text
  • Description: Specifies whether the data should be Text or Binary

Unique FlowFiles
  • API Name: Unique FlowFiles
  • Default Value: false
  • Allowable Values: true, false
  • Description: If true, each FlowFile that is generated will be unique. If false, a random value will be generated and all FlowFiles will get the same content but this offers much higher throughput

Custom Text
  • API Name: generate-ff-custom-text
  • Description: If Data Format is text and if Unique FlowFiles is false, then this custom text will be used as content of the generated FlowFiles and the File Size will be ignored. Finally, if Expression Language is used, evaluation will be performed only once per batch of generated FlowFiles. Supports Expression Language: true (will be evaluated using variable registry only)

Character Set
  • API Name: character-set
  • Default Value: UTF-8
  • Description: Specifies the character set to use when writing the bytes of Custom Text to a flow file.

Mime Type
  • API Name: mime-type
  • Description: Specifies the value to set for the "mime.type" attribute.

Dynamic Properties:

Supports Sensitive Dynamic Properties: No

Dynamic Properties allow the user to specify both the name and value of a property.

  • Name: Generated FlowFile attribute name
  • Value: Generated FlowFile attribute value
  • Description: Specifies an attribute on generated FlowFiles defined by the Dynamic Property's key and value. If Expression Language is used, evaluation will be performed only once per batch of generated FlowFiles. Supports Expression Language: true (will be evaluated using variable registry only)

Relationships:

  • success

 

 

To generate data in a specific format, set Custom Text as follows:

<test> ${now():format('MMM dd yyyy HH:mm:ss')} my-test-data-nifi : %FTD-4-419002: Duplicate TCP SYN from out-beeline:${random():toString()} to out-beeline:${random():toString()} with different initial sequence number
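To see roughly what this Custom Text produces, here is a Python approximation of the same template. It is only a sketch: the function name render_custom_text is made up for illustration, strftime's %b stands in for the MMM month pattern, and random.randint over 0 to 2^63 - 1 stands in for NiFi's random() function.

```python
import random
from datetime import datetime
from typing import Optional


def render_custom_text(now: Optional[datetime] = None) -> str:
    """Approximate the line produced by the Custom Text template above."""
    now = now or datetime.now()
    # 'MMM dd yyyy HH:mm:ss' roughly corresponds to this strftime pattern.
    timestamp = now.strftime("%b %d %Y %H:%M:%S")
    # Each ${random()} call is evaluated separately, so draw two values.
    src = random.randint(0, 2**63 - 1)
    dst = random.randint(0, 2**63 - 1)
    return (
        f"<test> {timestamp} my-test-data-nifi : %FTD-4-419002: "
        f"Duplicate TCP SYN from out-beeline:{src} to out-beeline:{dst} "
        "with different initial sequence number"
    )
```

Because Expression Language in Custom Text is evaluated only once per batch, every FlowFile generated in the same batch carries the same timestamp and the same random numbers.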

 

 

 

 

 

 

  • ExtractGrok


Description:

Evaluates one or more Grok Expressions against the content of a FlowFile, adding the results as attributes or replacing the content of the FlowFile with a JSON notation of the matched content

In short: the processor evaluates one or more Grok patterns against the FlowFile content, then either adds the matches as attributes or replaces the content with JSON.

Tags:

grok, log, text, parse, delimit, extract

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.

Each entry lists the Display Name, followed by its API Name, Default Value, Allowable Values, and Description, where present.

Grok Expression
  • API Name: Grok Expression
  • Description: Grok expression. If other Grok expressions are referenced in this expression, they must be provided in the Grok Pattern File if set or exist in the default Grok patterns
  • Note: with the following expression, the entire content of the FlowFile is captured into a field named data: (?s)%{GREEDYDATA:data}

Grok Patterns
  • API Name: Grok Pattern file
  • Description: Custom Grok pattern definitions. These definitions will be loaded after the default Grok patterns. The Grok Parser will use the default Grok patterns when this property is not configured. This property requires exactly one resource to be provided. That resource may be any of the following types: URL, file, text.

Destination
  • API Name: Destination
  • Default Value: flowfile-attribute
  • Allowable Values: flowfile-attribute, flowfile-content
  • Description: Controls whether the Grok output is written as new FlowFile attributes (in which case each Grok identifier matched in the FlowFile is added as an attribute prefixed with "grok.") or written to the FlowFile content. Writing to FlowFile content will overwrite any existing FlowFile content.

Character Set
  • API Name: Character Set
  • Default Value: UTF-8
  • Description: The Character Set in which the file is encoded

Maximum Buffer Size
  • API Name: Maximum Buffer Size
  • Default Value: 1 MB
  • Description: Specifies the maximum amount of data to buffer (per file) in order to apply the Grok expressions. Files larger than the specified maximum will not be fully evaluated.

Named captures only
  • API Name: Named captures only
  • Default Value: false
  • Allowable Values: true, false
  • Description: Only store named captures from grok

Keep Empty Captures
  • API Name: Keep Empty Captures
  • Default Value: true
  • Allowable Values: true, false
  • Description: If true, then empty capture values will be included in the returned capture map.
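The (?s) prefix in the expression (?s)%{GREEDYDATA:data} matters when the content spans several lines. The sketch below imitates it with Python's re module, assuming GREEDYDATA expands to .* as in the standard Grok pattern set. The helper grok_to_regex is a toy translator written for this example and handles only the %{PATTERN:field} form; it is not how NiFi compiles Grok.

```python
import re

# Assumed expansion of the one default Grok pattern used here.
GROK_PATTERNS = {"GREEDYDATA": r".*"}


def grok_to_regex(expression: str) -> str:
    """Translate %{PATTERN:field} references into named regex groups."""

    def replace(match: "re.Match[str]") -> str:
        pattern_name, field = match.group(1), match.group(2)
        return f"(?P<{field}>{GROK_PATTERNS[pattern_name]})"

    return re.sub(r"%\{(\w+):(\w+)\}", replace, expression)


content = "line one\nline two"

# With (?s), '.' also matches newlines, so the whole content is captured.
with_dotall = re.match(grok_to_regex("(?s)%{GREEDYDATA:data}"), content)

# Without (?s), the capture stops at the first line break.
without_dotall = re.match(grok_to_regex("%{GREEDYDATA:data}"), content)
```

Here with_dotall.group("data") holds both lines, while without_dotall.group("data") holds only the first.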

Relationships:

  • unmatched: FlowFiles are routed to this relationship when no provided Grok Expression matches the content of the FlowFile
  • matched: FlowFiles are routed to this relationship when the Grok Expression is successfully evaluated and the FlowFile is modified as a result

Reads Attributes:

None specified.

Writes Attributes:

  • grok.XXX: When operating in flowfile-attribute mode, each Grok identifier that is matched in the FlowFile will be added as an attribute, prefixed with "grok." For example, if the Grok identifier "timestamp" is matched, then the value will be added to an attribute named "grok.timestamp"
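The two Destination modes differ only in where the captured values end up. The following sketch models that choice with plain dictionaries; apply_destination and its parameters are hypothetical names, and the JSON layout is an assumption that may not match the processor's exact output.

```python
import json
from typing import Dict, Tuple


def apply_destination(
    captures: Dict[str, str],
    attributes: Dict[str, str],
    content: str,
    destination: str = "flowfile-attribute",
) -> Tuple[Dict[str, str], str]:
    """Return the (attributes, content) pair after applying Grok captures."""
    if destination == "flowfile-attribute":
        updated = dict(attributes)
        for name, value in captures.items():
            # Each matched identifier becomes an attribute prefixed with "grok."
            updated[f"grok.{name}"] = value
        return updated, content
    if destination == "flowfile-content":
        # Existing content is overwritten by a JSON rendering of the captures.
        return dict(attributes), json.dumps(captures)
    raise ValueError(f"unknown destination: {destination}")
```

In attribute mode the original content is left untouched, which is convenient when a later processor still needs the raw message.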

State management:

This component does not store state.

Restricted:

  • Required Permission: reference remote resources
  • Explanation: Patterns can reference resources over HTTP

Input requirement:

This component requires an incoming relationship.

System Resource Considerations:

None specified.

 

 

 

 

 

 

 

  • UpdateRecord

Description:

Updates the contents of a FlowFile that contains Record-oriented data (i.e., data that can be read via a RecordReader and written by a RecordWriter). This Processor requires that at least one user-defined Property be added. The name of the Property should indicate a RecordPath that determines the field that should be updated. The value of the Property is either a replacement value (optionally making use of the Expression Language) or is itself a RecordPath that extracts a value from the Record. Whether the Property value is determined to be a RecordPath or a literal value depends on the configuration of the <Replacement Value Strategy> Property.

 

 

 

Tags:

update, record, generic, schema, json, csv, avro, log, logs, freeform, text

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.

Each entry lists the Display Name, followed by its API Name, Default Value, Allowable Values, and Description, where present.

Record Reader
  • API Name: record-reader
  • Allowable Values: Controller Service API: RecordReaderFactory. Implementations: CEFReader, SyslogReader, ReaderLookup, ProtobufReader, Syslog5424Reader, CSVReader, GrokReader, WindowsEventLogReader, ScriptedReader, AvroReader, ParquetReader, JsonPathReader, ExcelReader, JsonTreeReader, YamlTreeReader, XMLReader
  • Description: Specifies the Controller Service to use for reading incoming data

Record Writer
  • API Name: record-writer
  • Allowable Values: Controller Service API: RecordSetWriterFactory. Implementations: JsonRecordSetWriter, RecordSetWriterLookup, AvroRecordSetWriter, XMLRecordSetWriter, FreeFormTextRecordSetWriter, CSVRecordSetWriter, ParquetRecordSetWriter, ScriptedRecordSetWriter
  • Description: Specifies the Controller Service to use for writing out the records

Replacement Value Strategy
  • API Name: replacement-value-strategy
  • Default Value: Literal Value
  • Allowable Values: Literal Value, Record Path Value
  • Description: Specifies how to interpret the configured replacement values

Dynamic Properties:

Supports Sensitive Dynamic Properties: No

Dynamic Properties allow the user to specify both the name and value of a property.

  • Name: A RecordPath.
  • Value: The value to use to replace fields in the record that match the RecordPath
  • Description: Allows users to specify values to use to replace fields in the record that match the RecordPath. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)
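The difference between the two Replacement Value Strategy options is easier to see on a concrete record. The sketch below is a heavily simplified model: it treats a record as a nested dict and supports only plain /parent/child paths, whereas real RecordPath also offers functions, predicates, and wildcards, and Expression Language evaluation is left out entirely. All function names are hypothetical.

```python
import copy
from typing import Any, Dict


def get_path(record: Dict[str, Any], path: str) -> Any:
    """Read the value at a simple '/a/b' style path."""
    node: Any = record
    for key in path.strip("/").split("/"):
        node = node[key]
    return node


def set_path(record: Dict[str, Any], path: str, value: Any) -> None:
    """Write a value at a simple '/a/b' style path (parents must exist)."""
    *parents, leaf = path.strip("/").split("/")
    node: Any = record
    for key in parents:
        node = node[key]
    node[leaf] = value


def update_record(
    record: Dict[str, Any],
    updates: Dict[str, str],
    strategy: str = "Literal Value",
) -> Dict[str, Any]:
    """Apply dynamic-property style updates: property name = target path."""
    result = copy.deepcopy(record)
    for target_path, value in updates.items():
        if strategy == "Record Path Value":
            # The property value is itself a path, read from the input record.
            value = get_path(record, value)
        set_path(result, target_path, value)
    return result
```

For example, with the record {"name": "a", "meta": {"src": "x"}}, the update {"/name": "/meta/src"} sets name to the string "/meta/src" under Literal Value, but to "x" under Record Path Value.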

Relationships:

  • success: FlowFiles that are successfully transformed will be routed to this relationship
  • failure: If a FlowFile cannot be transformed from the configured input format to the configured output format, the unchanged FlowFile will be routed to this relationship

Reads Attributes:

None specified.

Writes Attributes:

  • record.index: This attribute provides the current row index and is only available inside the literal value expression.
  • record.error.message: This attribute provides on failure the error message encountered by the Reader or Writer.

State management:

This component does not store state.

Restricted:

This component is not restricted.

Input requirement:

This component requires an incoming relationship.

System Resource Considerations:

None specified.

See Also:

ConvertRecord

 

 

 

 

 

 

 

 

 

 

 

 

 
