Dagger Quickstart
There are 2 ways to set up and get dagger running in your machine in no time -
- Docker Compose Setup - recommended for beginners
- Local Installation Setup - for more advanced usecases
#
Docker Compose Setup#
Prerequisites- You must have docker installed
Following are the steps for setting up dagger in docker compose -
Clone Dagger repository into your local
git clone https://github.com/goto/dagger.git
cd into the docker-compose directory:
cd dagger/quickstart/docker-compose
fire this command to spin up the docker compose:
docker compose up
This will spin up docker containers for the kafka, zookeeper, stencil, kafka-producer and the dagger.
fire this command to gracefully stop all the docker containers. This will save the container state and help to speed up the setup next time. All the kafka records and topics will also be saved :
docker compose stop
To start the containers from their saved state run this command
docker compose start
fire this command to gracefully remove all the containers. This will delete all the kafka topics/ saved data as well:
docker compose down
#
WorkflowFollowing are the containers that are created, in chronological order, when you run docker compose up
-
- Zookeeper - Container for the Zookeeper service is created and listening on port 2187. Zookeeper is a service required by the Kafka server.
- Kafka - Container for Kafka server is created and is exposed on port 29094. This will serve as the input data source for the Dagger.
- init-kafka - This container creates the kafka topic
dagger-test-topic-v1
from which the dagger will pull the Kafka messages. - Stencil - It compiles the proto file and creates a proto descriptor. Also it sets up an http server serving the proto descriptors required by dagger to parse the Kafka messages.
- kafka-producer - It runs a script to generate the random kafka messages and sends one message to the kafka topic every second.
- Dagger - Clones the Dagger Github repository and builds the jar. Then it creates an in-memory flink cluster and uploads the dagger job jar and starts the job.
The dagger environment variables are present in the local.properties
file inside the quickstart/docker-compose/resources
directory. The dagger runs a simple aggregation query which will count the number of bookings , i.e. kafka messages, in every 30 seconds interval. The output will be visible in the logs in the terminal itself. You can edit this query (FLINK_SQL_QUERY
variable) in the local.properties
file inside the quickstart/docker-compose/resources
directory.
#
Local Installation Setup#
Prerequisites- Your Java version is Java 8: Dagger as of now works only with Java 8. Some features might not work with older or later versions.
- Your Kafka version is 3.0.0 or a minor version of it
- You have kcat installed: We will use kcat to push messages to Kafka from the CLI. You can follow the installation steps here. Ensure the version you install is 1.7.0 or a minor version of it.
- You have protobuf installed: We will use protobuf to push messages encoded in protobuf format to Kafka topic. You can follow the installation steps for MacOS here. For other OS, please download the corresponding release from here. Please note, this quickstart has been written to work with 3.17.3 of protobuf. Compatibility with other versions is unknown.
- You have Python 2.7+ and simple-http-server installed: We will use Python along with simple-http-server to spin up a mock Stencil server which can serve the proto descriptors to Dagger. To install simple-http-server, please follow these installation steps.
#
Quickstart- Clone Dagger repository into your local
git clone https://github.com/goto/dagger.git
- Next, we will generate our proto descriptor set. Ensure you are at the top level directory(
dagger
) and then fire this command
./gradlew clean dagger-common:generateTestProto
This command will generate a descriptor set containing the proto descriptors of all the proto files present under dagger-common/src/test/proto
. After running this, you should see a binary file called dagger-descriptors.bin
under dagger-common/src/generated-sources/descriptors/
.
- Next, we will setup a mock Stencil server to serve this proto descriptor set to Dagger. Open up a new tab in your terminal and
cd
into this directory:dagger-common/src/generated-sources/descriptors
. Then fire this command:
python -m SimpleHTTPServer 8000
This will spin up a mock HTTP server and serve the descriptor set we just generated in the previous step at port 8000.
The Stencil client being used in Dagger will fetch it by calling this URL. This has been already configured in local.properties
, as we have set SCHEMA_REGISTRY_STENCIL_ENABLE
to true and pointed SCHEMA_REGISTRY_STENCIL_URLS
to http://127.0.0.1:8000/dagger-descriptors.bin
.
Next, we will generate and send some messages to a sample kafka topic as per some proto schema. Note that, in
local.properties
we have setINPUT_SCHEMA_PROTO_CLASS
underSTREAMS
to usecom.gotocompany.dagger.consumer.TestPrimitiveMessage
proto. Hence, we will push messages which conform to this schema into the topic. For doing this, please follow these steps:cd
into the directorydagger-common/src/test/proto
. You should see a text filesample_message.txt
which contains just one message. We will encode it into a binary in protobuf format.- Fire this command:
protoc --proto_path=./ --encode=com.gotocompany.dagger.consumer.TestPrimitiveMessage ./TestLogMessage.proto < ./sample_message.txt > out.bin
This will generate a binary file called
out.bin
. It contains the binary encoded message ofsample_message.txt
.- Next, we will push this encoded message to the source Kafka topic as mentioned under
SOURCE_KAFKA_TOPIC_NAMES
insideSTREAMS
insidelocal.properties
. Ensure Kafka is running atlocalhost:9092
and then, fire this command:
kcat -P -b localhost:9092 -D "\n" -T -t dagger-test-topic-v1 out.bin
You can also fire this command multiple times, if you want multiple messages to be sent into the topic. Just make sure you increment the
event_timestamp
value every time insidesample_message.txt
and then repeat the above steps.cd
into the repository root again (dagger
) and start Dagger by running the following command:
./gradlew dagger-core:runFlink
After some initialization logs, you should see the output of the SQL query getting printed.
#
TroubleshootingI am pushing messages to the kafka topic but not seeing any output in the logs.
This can happen for the following reasons:
a. Pushed messages are not reaching the right topic: Check for any exceptions or errors when pushing messages to the Kafka topic. Ensure that the topic to which you are pushing messages is the same one for which you have configured Dagger to read from under
STREAMS
->SOURCE_KAFKA_TOPIC_NAMES
inlocal.properties
b. The consumer group is not updated: Dagger might have already processed those messages. If you have made any changes to the setup, make sure you update the
STREAMS
->SOURCE_KAFKA_CONSUMER_CONFIG_GROUP_ID
variable inlocal.properties
to some new value.I see an exception
java.lang.RuntimeException: Unable to retrieve any partitions with KafkaTopicsDescriptor: Topic Regex Pattern
This can happen if the topic configured under
STREAMS
->SOURCE_KAFKA_TOPIC_NAMES
inlocal.properties
is new and you have not pushed any messages to it yet. Ensure that you have pushed atleast one message to the topic before you start dagger.