Removing duplicate records using Transformers
About this example
In this example, we will use the DeDuplication Transformer in Dagger to remove booking orders (as Kafka records) that have a duplicate order_number. By the end of this example, we will understand how to use Dagger to remove duplicate data from a Kafka source.
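To make the goal concrete, here is a minimal sketch of what "removing records with a duplicate order_number" means, applied to a small batch of text records. It is not how Dagger implements the transformer (Dagger works on a stream of Kafka records); the CSV layout and the `dedup_by_first_field` helper are made up for illustration.

```shell
# Illustration only: keep the first record seen for each order_number.
# The CSV layout (order_number,customer,amount) is a hypothetical sample,
# not the schema used by the quickstart.
dedup_by_first_field() {
  # seen[$1]++ evaluates to 0 the first time a key appears, so
  # !seen[$1]++ is true only for the first record with that key.
  awk -F',' '!seen[$1]++'
}

records='ORD-1,alice,10
ORD-2,bob,25
ORD-1,alice,10
ORD-3,carol,7
ORD-2,bob,25'

deduped=$(printf '%s\n' "$records" | dedup_by_first_field)
printf '%s\n' "$deduped"
```

Each order_number appears once in the result, and the order of first appearance is preserved.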
Before Trying This Example
We must have Docker installed. We can follow the Docker installation guide to install and set up Docker on our local machine.
Clone the Dagger repository to your local machine:
git clone https://github.com/goto/dagger.git
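Before moving on, it can help to confirm that the tools this example relies on are available. The following is a small sketch, assuming a POSIX shell; the `have_cmd` helper is a name introduced here and is not part of Dagger.

```shell
# Return success if the given command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

missing=0
for tool in git docker; do
  if have_cmd "$tool"; then
    echo "found: $tool"
  else
    echo "missing: $tool"
    missing=1
  fi
done

# "docker compose version" only succeeds when the Compose plugin is installed.
if have_cmd docker && docker compose version >/dev/null 2>&1; then
  echo "found: docker compose"
else
  echo "missing: docker compose"
  missing=1
fi
```

If `missing` ends up as 1, install the reported tools before continuing with the steps.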
Steps
Following are the steps for setting up Dagger in Docker Compose:
- cd into the aggregation directory:
cd dagger/quickstart/examples/aggregation/tumble_window
- fire this command to spin up the docker compose:
docker compose up
Hang on for a while as it installs all the required dependencies and starts all the required services. After a while we should see the output of the Dagger SQL query in the terminal, which will be the booking logs without any duplicate order_number.
- fire this command to gracefully close the docker compose:
docker compose down
This will stop and remove all the containers.
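One way to check the result of the steps above is to confirm that no order_number appears twice in the query output. The exact output format is not described here, so this sketch assumes the order_number values have already been extracted, one per line; `find_duplicate_keys` and the sample values are hypothetical.

```shell
# Print every key that appears more than once on stdin (one key per line).
# Prints nothing when all keys are unique.
find_duplicate_keys() {
  sort | uniq -d
}

# Hypothetical sample: order numbers copied by hand from the query output.
dupes=$(printf '%s\n' ORD-1 ORD-2 ORD-3 | find_duplicate_keys)
if [ -z "$dupes" ]; then
  echo "no duplicate order_number values"
else
  echo "duplicates found:"
  printf '%s\n' "$dupes"
fi
```

An empty result is consistent with deduplication having worked for the sampled output; any printed key points to a record that got through more than once.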
Congratulations, we are now able to use Dagger to remove duplicate data from a Kafka source!