Dagger Quickstart
There are 2 ways to set up and get dagger running in your machine in no time -
- Docker Compose Setup - recommended for beginners
- Local Installation Setup - for more advanced usecases
Docker Compose Setup#
Prerequisites#
- You must have docker installed
Following are the steps for setting up dagger in docker compose -
Clone Dagger repository into your local
git clone https://github.com/goto/dagger.gitcd into the docker-compose directory:
cd dagger/quickstart/docker-composefire this command to spin up the docker compose:
docker compose upThis will spin up docker containers for the kafka, zookeeper, stencil, kafka-producer and the dagger.
fire this command to gracefully stop all the docker containers. This will save the container state and help to speed up the setup next time. All the kafka records and topics will also be saved :
docker compose stopTo start the containers from their saved state run this command
docker compose startfire this command to gracefully remove all the containers. This will delete all the kafka topics/ saved data as well:
docker compose down
Workflow#
Following are the containers that are created, in chronological order, when you run docker compose up -
- Zookeeper - Container for the Zookeeper service is created and listening on port 2187. Zookeeper is a service required by the Kafka server.
- Kafka - Container for Kafka server is created and is exposed on port 29094. This will serve as the input data source for the Dagger.
- init-kafka - This container creates the kafka topic
dagger-test-topic-v1from which the dagger will pull the Kafka messages. - Stencil - It compiles the proto file and creates a proto descriptor. Also it sets up an http server serving the proto descriptors required by dagger to parse the Kafka messages.
- kafka-producer - It runs a script to generate the random kafka messages and sends one message to the kafka topic every second.
- Dagger - Clones the Dagger Github repository and builds the jar. Then it creates an in-memory flink cluster and uploads the dagger job jar and starts the job.
The dagger environment variables are present in the local.properties file inside the quickstart/docker-compose/resources directory. The dagger runs a simple aggregation query which will count the number of bookings , i.e. kafka messages, in every 30 seconds interval. The output will be visible in the logs in the terminal itself. You can edit this query (FLINK_SQL_QUERY variable) in the local.properties file inside the quickstart/docker-compose/resources directory.
Local Installation Setup#
Prerequisites#
- Your Java version is Java 8: Dagger as of now works only with Java 8. Some features might not work with older or later versions.
- Your Kafka version is 3.0.0 or a minor version of it
- You have kcat installed: We will use kcat to push messages to Kafka from the CLI. You can follow the installation steps here. Ensure the version you install is 1.7.0 or a minor version of it.
- You have protobuf installed: We will use protobuf to push messages encoded in protobuf format to Kafka topic. You can follow the installation steps for MacOS here. For other OS, please download the corresponding release from here. Please note, this quickstart has been written to work with 3.17.3 of protobuf. Compatibility with other versions is unknown.
- You have Python 2.7+ and simple-http-server installed: We will use Python along with simple-http-server to spin up a mock Stencil server which can serve the proto descriptors to Dagger. To install simple-http-server, please follow these installation steps.
Quickstart#
- Clone Dagger repository into your local
git clone https://github.com/goto/dagger.git- Next, we will generate our proto descriptor set. Ensure you are at the top level directory(
dagger) and then fire this command
./gradlew clean dagger-common:generateTestProtoThis command will generate a descriptor set containing the proto descriptors of all the proto files present under dagger-common/src/test/proto. After running this, you should see a binary file called dagger-descriptors.bin under dagger-common/src/generated-sources/descriptors/.
- Next, we will setup a mock Stencil server to serve this proto descriptor set to Dagger. Open up a new tab in your terminal and
cdinto this directory:dagger-common/src/generated-sources/descriptors. Then fire this command:
python -m SimpleHTTPServer 8000This will spin up a mock HTTP server and serve the descriptor set we just generated in the previous step at port 8000.
The Stencil client being used in Dagger will fetch it by calling this URL. This has been already configured in local.properties, as we have set SCHEMA_REGISTRY_STENCIL_ENABLE to true and pointed SCHEMA_REGISTRY_STENCIL_URLS to http://127.0.0.1:8000/dagger-descriptors.bin.
Next, we will generate and send some messages to a sample kafka topic as per some proto schema. Note that, in
local.propertieswe have setINPUT_SCHEMA_PROTO_CLASSunderSTREAMSto usecom.gotocompany.dagger.consumer.TestPrimitiveMessageproto. Hence, we will push messages which conform to this schema into the topic. For doing this, please follow these steps:cdinto the directorydagger-common/src/test/proto. You should see a text filesample_message.txtwhich contains just one message. We will encode it into a binary in protobuf format.- Fire this command:
protoc --proto_path=./ --encode=com.gotocompany.dagger.consumer.TestPrimitiveMessage ./TestLogMessage.proto < ./sample_message.txt > out.binThis will generate a binary file called
out.bin. It contains the binary encoded message ofsample_message.txt.- Next, we will push this encoded message to the source Kafka topic as mentioned under
SOURCE_KAFKA_TOPIC_NAMESinsideSTREAMSinsidelocal.properties. Ensure Kafka is running atlocalhost:9092and then, fire this command:
kcat -P -b localhost:9092 -D "\n" -T -t dagger-test-topic-v1 out.binYou can also fire this command multiple times, if you want multiple messages to be sent into the topic. Just make sure you increment the
event_timestampvalue every time insidesample_message.txtand then repeat the above steps.cdinto the repository root again (dagger) and start Dagger by running the following command:
./gradlew dagger-core:runFlinkAfter some initialization logs, you should see the output of the SQL query getting printed.
Troubleshooting#
I am pushing messages to the kafka topic but not seeing any output in the logs.
This can happen for the following reasons:
a. Pushed messages are not reaching the right topic: Check for any exceptions or errors when pushing messages to the Kafka topic. Ensure that the topic to which you are pushing messages is the same one for which you have configured Dagger to read from under
STREAMS->SOURCE_KAFKA_TOPIC_NAMESinlocal.propertiesb. The consumer group is not updated: Dagger might have already processed those messages. If you have made any changes to the setup, make sure you update the
STREAMS->SOURCE_KAFKA_CONSUMER_CONFIG_GROUP_IDvariable inlocal.propertiesto some new value.I see an exception
java.lang.RuntimeException: Unable to retrieve any partitions with KafkaTopicsDescriptor: Topic Regex PatternThis can happen if the topic configured under
STREAMS->SOURCE_KAFKA_TOPIC_NAMESinlocal.propertiesis new and you have not pushed any messages to it yet. Ensure that you have pushed atleast one message to the topic before you start dagger.