【Samza系列】实时计算Samza中文教程(四)—API概述
2020-12-13 14:21
标签:samza samzaapi 实时计算 【Samza系列】实时计算Samza中文教程(四)—API概述 标签:samza samzaapi 实时计算 原文地址:http://blog.csdn.net/yangchao228/article/details/40618359
当你运行你的job时,Samza将为你的class创建一些实例(可能在多台机器上)。这些任务实例会处理输入流里的消息。
public class MyTaskClass implements StreamTask {
public void process(IncomingMessageEnvelope envelope,
MessageCollector collector,
TaskCoordinator coordinator) {
// process message
}
}
# This is the class above, which Samza will instantiate when the job is run
task.class=com.example.samza.MyTaskClass
# Define a system called "kafka" (you can give it any name, and you can define
# multiple systems if you want to process messages from different sources)
systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
# The job consumes a topic called "PageViewEvent" from the "kafka" system
task.inputs=kafka.PageViewEvent
# Define a serializer/deserializer called "json" which parses JSON messages
serializers.registry.json.class=org.apache.samza.serializers.JsonSerdeFactory
# Use the "json" serializer for messages in the "PageViewEvent" topic
systems.kafka.streams.PageViewEvent.samza.msg.serde=json
/** Every message that is delivered to a StreamTask is wrapped
* in an IncomingMessageEnvelope, which contains metadata about
* the origin of the message. */
public class IncomingMessageEnvelope {
/** A deserialized message. */
Object getMessage() { ... }
/** A deserialized key. */
Object getKey() { ... }
/** The stream and partition that this message came from. */
SystemStreamPartition getSystemStreamPartition() { ... }
}
/** A triple of system name, stream name and partition. */
public class SystemStreamPartition extends SystemStream {
/** The name of the system which provides this stream. It is
defined in the Samza job‘s configuration. */
public String getSystem() { ... }
/** The name of the stream/topic/queue within the system. */
public String getStream() { ... }
/** The partition within the stream. */
public Partition getPartition() { ... }
}
/** When a task wishes to send a message, it uses this interface. */
public interface MessageCollector {
void send(OutgoingMessageEnvelope envelope);
}
public class SplitStringIntoWords implements StreamTask {
// Send outgoing messages to a stream called "words"
// in the "kafka" system.
private final SystemStream OUTPUT_STREAM =
new SystemStream("kafka", "words");
public void process(IncomingMessageEnvelope envelope,
MessageCollector collector,
TaskCoordinator coordinator) {
String message = (String) envelope.getMessage();
for (String word : message.split(" ")) {
// Use the word as the key, and 1 as the value.
// A second task can add the 1‘s to get the word count.
collector.send(new OutgoingMessageEnvelope(OUTPUT_STREAM, word, 1));
}
}
}
上一篇:Python3 函数基础2
下一篇:快速排序
文章标题:【Samza系列】实时计算Samza中文教程(四)—API概述
文章链接:http://soscw.com/essay/33999.html