
ADD UDF

Need a SQL function that neither Flink nor the existing Dagger UDFs provide? You can write your own user-defined function (UDF) and contribute it to Dagger. Read more on how to use UDFs here.

Note: Please go through the Contribution guide to learn the conventions and practices we follow and the process for contributing to Dagger.

To add a custom UDF, follow these steps:

  • Ensure that none of the built-in functions or existing UDFs suits your requirement.

  • Figure out which type of UDF you require. Flink supports three types of user-defined functions; choose the one that fits your requirement.

  • Choose the programming language for your UDF: Java, Scala, or Python.

  • For adding UDF with Java/Scala:

    • Follow this for more insights on writing your UDF.
    • Add the UDF to the matching function-type folder inside this on the dagger-functions subproject.
    • Extend one of ScalarUdf, TableUdf, or AggregateUdf from dagger-common. These are boilerplate contracts that extend Flink's UDF classes and do some additional preprocessing (such as exposing metrics) in the open method behind the scenes.
    • Register the UDF in this class. This is required to let Flink know about your function.
    • If you have business-specific use cases and don't want to add UDFs to the open-source repo, you can keep them in a separate local codebase. Those UDFs need to be registered in a class similar to UDFFactory. Keep both the UDF classes and the factory class on Dagger's classpath. Set the fully qualified factory class name in the FUNCTION_FACTORY_CLASSES parameter, and you will be able to use the UDF in your query.
  • For adding UDF with Python:

    • Follow this for more insights on writing your UDF.
    • Add the UDF inside this on the dagger-py-functions directory.
    • Ensure that the filename and the function name of the Python UDF are the same. Dagger registers this name as the function name, which can then be used in the query.
    • Add any dependency the Python function needs to the requirements.txt file.
    • Add a Python unit test and make sure it passes.
    • If you have business-specific use cases and don't want to add UDFs to the open-source repo, you can keep them in a separate local codebase and specify that file in the Python configuration.
  • Bump the version and raise a PR. Please also add the registered function to the list of UDFs doc.
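
To illustrate the Python steps above, here is a minimal sketch of a UDF body and its unit test. It assumes a hypothetical file named `sample_upper.py`, so the function is also called `sample_upper`; neither name comes from the Dagger repository. A function that Flink actually runs would normally be wrapped with PyFlink's `udf` decorator (from `pyflink.table.udf`) and a declared result type. The decorator is left out here so the sketch runs without PyFlink installed.

```python
import unittest


# Hypothetical UDF body. The function name matches the assumed
# file name (sample_upper.py), as the naming rule above requires.
def sample_upper(text):
    """Upper-case a string, passing None through unchanged."""
    if text is None:
        return None
    return text.upper()


class SampleUpperTest(unittest.TestCase):
    """Unit tests covering the normal case and the null case."""

    def test_upper_cases_text(self):
        self.assertEqual(sample_upper("dagger"), "DAGGER")

    def test_none_passes_through(self):
        self.assertIsNone(sample_upper(None))
```

The tests can be run with `python -m unittest` from the directory that contains the file.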

In the subsequent release of Dagger, your functions should be usable in queries.
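
Before raising the PR, the rule that a Python UDF's filename and function name must match can be checked mechanically. The following is a rough sketch using only the standard library; the helper names `defines_function` and `find_mismatched_files` are hypothetical and not part of Dagger, and the directory to scan is whatever folder holds your Python UDF files.

```python
import ast
from pathlib import Path


def defines_function(source, name):
    """Return True if `source` has a top-level function called `name`."""
    tree = ast.parse(source)
    return any(
        isinstance(node, ast.FunctionDef) and node.name == name
        for node in tree.body
    )


def find_mismatched_files(directory):
    """List .py files whose stem is not also a top-level function name."""
    mismatched = []
    for path in sorted(Path(directory).glob("*.py")):
        if path.name == "__init__.py":
            continue  # package markers are not UDF files
        source = path.read_text(encoding="utf-8")
        if not defines_function(source, path.stem):
            mismatched.append(path.name)
    return mismatched
```

Calling `find_mismatched_files` on the UDF folder returns the names of files that break the convention, or an empty list when every file complies. Decorated functions are still detected, since a decorator does not change the function's name in the syntax tree.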