Kafka is a high-throughput distributed messaging system. In this exercise, we’ll implement a simple Kafka pipeline that reads a feed from Twitter and writes it to Elasticsearch, using Python.
Prerequisites:
- Kafka installed locally.
- A Twitter developer account.
- Bonsai Elasticsearch (or a local Elasticsearch installation).
1. Download and extract the Kafka binaries:
$ tar -xzf kafka_2.13-2.7.0.tgz
$ cd kafka_2.13-2.7.0
2. From the extracted folder, launch ZooKeeper and the Kafka server (each in its own terminal):
$ bin/zookeeper-server-start.sh config/zookeeper.properties
$ bin/kafka-server-start.sh config/server.properties
3. Create a topic, which is simply a named stream of data. In this example, the topic is “bitcoin”:
$ bin/kafka-topics.sh --create --topic bitcoin --bootstrap-server localhost:9092
Create the Producer.
The producer is the component that writes to a Kafka topic. Producers can publish website events, pricing data, financial transactions, user interactions, etc., and can run on a wide range of devices and systems. In this exercise, our producer will obtain its data from Twitter. For this part, we’ll need the Python Kafka client and the Tweepy Twitter client:
$ pip install kafka-python
$ pip install tweepy
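The producer can be sketched roughly as follows. This is a minimal sketch, not a full implementation: it assumes Tweepy v4’s `StreamingClient` (which requires a Twitter API v2 bearer token), and the names `tweet_to_message`, `run_producer`, and the `"YOUR_BEARER_TOKEN"` placeholder are illustrative, not from the original.

```python
import json


def tweet_to_message(tweet_id, text):
    """Shape one tweet into the JSON document we'll publish to Kafka."""
    return {"id": tweet_id, "text": text}


def run_producer(bearer_token, topic="bitcoin", servers="localhost:9092"):
    # Imports are deferred so the message-shaping helper above can be
    # read and tested without a running broker or the Twitter API.
    import tweepy
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=servers,
        # Serialize each dict to UTF-8 JSON bytes before sending.
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    class TweetStreamer(tweepy.StreamingClient):
        def on_tweet(self, tweet):
            # Publish each incoming tweet to the Kafka topic.
            producer.send(topic, tweet_to_message(tweet.id, tweet.text))

    streamer = TweetStreamer(bearer_token)
    streamer.add_rules(tweepy.StreamRule("bitcoin"))  # match tweets about bitcoin
    streamer.filter()  # blocks, streaming matching tweets into the topic


if __name__ == "__main__":
    run_producer("YOUR_BEARER_TOKEN")  # placeholder credential
```

The `value_serializer` keeps serialization in one place, so the stream callback only has to hand over plain dicts.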
Create the Consumer.
The consumer “subscribes” to the topic and reads the messages that the producer sends. As new messages arrive, they are forwarded to our Elasticsearch cluster.
For this part, we’ll need the Python Elasticsearch client:
$ pip install elasticsearch
$ pip install elasticsearch-dsl
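The consumer side can be sketched like this. Again a hedged sketch: it assumes the elasticsearch-py v8 `document=` keyword for `es.index`, and the names `decode_message`, `run_consumer`, the `"tweets"` index, and the placeholder cluster URL are illustrative assumptions.

```python
import json


def decode_message(raw_bytes):
    """Deserialize one Kafka message payload back into a dict."""
    return json.loads(raw_bytes.decode("utf-8"))


def run_consumer(es_url, topic="bitcoin", servers="localhost:9092"):
    # Imports are deferred so decode_message can be read and tested
    # without a running broker or an Elasticsearch cluster.
    from elasticsearch import Elasticsearch
    from kafka import KafkaConsumer

    es = Elasticsearch(es_url)  # e.g. your Bonsai URL, or http://localhost:9200
    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=servers,
        value_deserializer=decode_message,
        auto_offset_reset="earliest",  # start from the oldest unread message
    )
    for message in consumer:  # blocks, waiting for new tweets on the topic
        doc = message.value
        # Using the tweet id as the document id makes re-indexing idempotent.
        es.index(index="tweets", id=doc["id"], document=doc)


if __name__ == "__main__":
    run_consumer("http://localhost:9200")  # placeholder cluster URL
```

With the producer from the previous step running in another terminal, each tweet published to the “bitcoin” topic should appear as a document in the `tweets` index.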
Kafka uses the concepts of Producers and Consumers to write and read data streams. In this exercise, we used Kafka to read a Twitter feed and post it to Elasticsearch.