Skip to main content

Development Guide

The following guide will help you quickly run Firehose in your local machine. The main components of Firehose are:

  • Consumer: Handles data consumption from Kafka.
  • Sink: Package which handles sinking data.
  • Metrics: Handles the metrics via StatsD client

Requirements

Development environment

Java SE Development Kit 8 is required to build, test and run Firehose service. Oracle JDK 8 can be downloaded from here. Extract the tarball to your preferred installation directory and configure your PATH environment variable to point to the bin sub-directory in the JDK 8 installation directory. For example -

export PATH=~/Downloads/jdk1.8.0_291/bin:$PATH

Environment Variables

Firehose environment variables can be configured in either of the following ways -

  • append a new line at the end of env/local.properties file. Variables declared in local.properties file are automatically added to the environment during runtime.
  • run export SAMPLE_VARIABLE=287 on a UNIX shell, to directly assign the required environment variable.

Kafka Server

Apache Kafka server service must be set up, from which Firehose's Kafka consumer will pull messages. Kafka Server version greater than 2.4 is currently supported by Firehose. Kafka Server URL and port address, as well as other Kafka-specific parameters must be configured in the corresponding environment variables as defined in the Generic configuration section.

Read the official guide on how to install and configure Apache Kafka Server.

Destination Sink Server

The sink to which Firehose will stream Kafka's data to, must have its corresponding server set up and configured. The URL and port address of the database server / HTTP/GRPC endpoint , along with other sink - specific parameters must be configured the environment variables corresponding to that particular sink.

Configuration parameter variables of each sink can be found in the Configurations section.

Schema Registry

When INPUT_SCHEMA_DATA_TYPE is set to protobuf, firehose uses Stencil Server as its Schema Registry for hosting Protobuf descriptors. The environment variable SCHEMA_REGISTRY_STENCIL_ENABLE must be set to true . Stencil server URL must be specified in the variable SCHEMA_REGISTRY_STENCIL_URLS . The Proto Descriptor Set file of the Kafka messages must be uploaded to the Stencil server.

Refer this guide on how to set up and configure the Stencil server, and how to generate and upload Proto descriptor set file to the server.

Monitoring

Firehose sends critical metrics via StatsD client. Refer the Monitoring section for details on how to setup Firehose with Grafana. Alternatively, you can set up any other visualization platform for monitoring Firehose. Following are the typical requirements -

  • StatsD host (e.g. Telegraf) for aggregation of metrics from Firehose StatsD client
  • A time-series database (e.g. InfluxDB) to store the metrics
  • GUI visualization dashboard (e.g. Grafana) for detailed visualisation of metrics

Running locally

  • The following guides provide a simple way to run firehose with a log sink locally.
  • It uses the TestMessage (src/test/proto/TestMessage.proto) proto schema, which has already been provided for testing purposes.
# Clone the repo
$ git clone https://github.com/goto/firehose.git

# Build the jar
$ ./gradlew clean build

# Configure env variables
$ cat env/local.properties

Configure env/local.properties

Set the generic variables in the local.properties file.

KAFKA_RECORD_PARSER_MODE = message
SINK_TYPE = log
INPUT_SCHEMA_DATA_TYPE=protobuf
INPUT_SCHEMA_PROTO_CLASS = com.gotocompany.firehose.consumer.TestMessage

Set the variables which specify the kafka server, topic name, and group-id of the kafka consumer - the standard values are used here.

SOURCE_KAFKA_BROKERS = localhost:9092
SOURCE_KAFKA_TOPIC = test-topic
SOURCE_KAFKA_CONSUMER_GROUP_ID = sample-group-id

Stencil Workaround

Firehose uses Stencil as the schema-registry which enables dynamic proto schemas. For the sake of this quick-setup guide, we can work our way around Stencil setup by setting up a simple local HTTP server which can provide the static descriptor for TestMessage schema.

  • Install a server service - like this one.

  • Generate the descriptor for TestMessage by running the command on terminal -

./gradlew generateTestProto
  • The above should generate a file (src/test/resources/__files/descriptors.bin), move this to a new folder at a separate location, and start the HTTP-server there so that this file can be fetched at the runtime.
  • If you are using this, use this command after moving the file to start server at the default port number 8080.
http-server
  • Because we are not using the schema-registry in the default mode, the following lines should also be added in env/local.properties to specify the new location to fetch descriptor from.
SCHEMA_REGISTRY_STENCIL_ENABLE = true
SCHEMA_REGISTRY_STENCIL_URLS = http://localhost:8080/descriptors.bin
SCHEMA_REGISTRY_STENCIL_CACHE_AUTO_REFRESH = false
SCHEMA_REGISTRY_STENCIL_REFRESH_STRATEGY = LONG_POLLING

Run Firehose Log Sink

  • Make sure that your kafka server and local HTTP server containing the descriptor is up and running.
  • Run the firehose consumer through the gradlew task:
./gradlew runConsumer

Note: Sample configuration for other sinks along with some advanced configurations can be found here

Running tests

# Running unit tests
$ ./gradlew test

# Run code quality checks
$ ./gradlew checkstyleMain checkstyleTest

#Cleaning the build
$ ./gradlew clean

Style Guide

Java

We conform to the Google Java Style Guide. Maven can helpfully take care of that for you before you commit:

Making a pull request

Incorporating upstream changes from master

Our preference is the use of git rebase instead of git merge. Signing commits

# Include -s flag to signoff
$ git commit -s -m "feat: my first commit"

Good practices to keep in mind

  • Follow the conventional commit format for all commit messages.
  • Fill in the description based on the default template configured when you first open the PR
  • Include kind label when opening the PR
  • Add WIP: to PR name if more work needs to be done prior to review
  • Avoid force-pushing as it makes reviewing difficult