UDF
Sometimes the available Spark transformation operations are not enough to meet a requirement, and you need to run more complex logic on the data to get the desired value. User-defined functions (UDFs) cover this case: they let you write your own custom logic, which the Pipeline then executes on the input data. The LINC CONNECT platform lets you write UDFs in several programming languages, such as Python, Node.js, and JavaScript, with more to come.
Let's walk through an example: converting a temperature from Celsius to Fahrenheit with a Python UDF. To create a UDF, go to BigData > UDF and click Add New.

First, enter the Basic Information: Name (this should be the same as the Python function name), Description, and Interpreter (the programming language), which is Python in our case.
Now create a function in the code editor with the same name as in Basic Information, and write your transformation logic in the function body.
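A minimal sketch of what the function body for this example could look like. The name `celsius_to_fahrenheit` is hypothetical (use whatever you entered as Name), and the sketch assumes the platform passes the column value as a Python number, with nulls arriving as `None`.

```python
# Hypothetical UDF name: it must match the Name entered in Basic Information.
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    # Assumption: null column values arrive as None; pass them through unchanged.
    if celsius is None:
        return None
    # F = C * 9/5 + 32
    return float(celsius) * 9 / 5 + 32
```

Returning `None` for missing input avoids a `TypeError` on rows without a temperature; drop that check if your data never contains nulls.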

To check the validity of the function, click the Debug button; it reports whether the function has any syntax errors.
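How the platform's Debug button works internally is not documented here. As a rough local analogue, you can catch syntax errors before pasting code into the editor with Python's standard `ast.parse`. The helper `has_syntax_errors` is hypothetical, and like a pure syntax check it does not catch runtime errors such as a `TypeError` on bad input.

```python
import ast


def has_syntax_errors(source: str) -> bool:
    """Return True if the given Python source code fails to parse."""
    try:
        ast.parse(source)
    except SyntaxError as err:
        # Report where parsing failed, similar to an editor's error marker.
        print(f"Syntax error on line {err.lineno}: {err.msg}")
        return True
    return False


udf_source = """
def celsius_to_fahrenheit(celsius):
    return float(celsius) * 9 / 5 + 32
"""

print(has_syntax_errors(udf_source))
```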

You can also check whether the function produces the correct result: add a print statement at the end that calls the function with a sample input value, and remove the print statement before saving the UDF.
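A sketch of such a temporary check, assuming the hypothetical function name `celsius_to_fahrenheit`. Testing against well-known reference points (water freezes at 0 °C = 32 °F and boils at 100 °C = 212 °F, and -40 is the same on both scales) makes a wrong formula easy to spot.

```python
# Hypothetical UDF: F = C * 9/5 + 32.
def celsius_to_fahrenheit(celsius):
    return float(celsius) * 9 / 5 + 32


# Temporary checks: delete these lines before saving the UDF.
for c in (0, 100, -40):
    print(c, "->", celsius_to_fahrenheit(c))
```

This prints `0 -> 32.0`, `100 -> 212.0`, and `-40 -> -40.0`.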

Once you are satisfied with the results, save the UDF so that it can be used in a Pipeline.
Let's use this UDF in the pipeline created in the previous section (Pipeline Section).
To use the UDF in that pipeline, first stop the pipeline and then edit it, using the actions available next to it on the Pipeline page.
Add the UDF call to the pipeline's Select Definition, as shown in the screenshot below.

Now debug the pipeline and check the UDF's output: the last column, named Fahrenheit, contains the temperature converted to Fahrenheit.

The syntax for using UDFs in a pipeline:
Python:
execute_python('<python_function_name>', <function_param>) as <output_column_name>
JavaScript:
execute_js('<js_function_name>', <function_param>) as <output_column_name>
Node.js:
execute_node('<nodejs_function_name>', <function_param>) as <output_column_name>
Note: a UDF can accept only one parameter. To pass more than one value, pass a single nested object (a struct or a JSON object) instead.
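A minimal sketch of a Python UDF that takes its inputs from one nested object. The function name `convert_temperature` and the fields `celsius` and `unit` are hypothetical, and the exact Python type the platform hands to the function is an assumption: the sketch accepts either a mapping (struct) or a JSON string.

```python
import json


def convert_temperature(params):
    """Hypothetical UDF taking a single nested-object parameter.

    Assumed fields: "celsius" (number) and "unit" ("F" or "K").
    """
    # Assumption: a JSON object may arrive as a string; parse it first.
    if isinstance(params, str):
        params = json.loads(params)
    celsius = float(params["celsius"])
    unit = str(params["unit"]).upper()
    if unit == "F":
        return celsius * 9 / 5 + 32
    if unit == "K":
        return celsius + 273.15
    raise ValueError(f"unsupported unit: {unit!r}")
```

On the pipeline side, assuming the Select Definition accepts standard Spark SQL expressions and the input has a column named `temperature`, the struct could be built with Spark's `named_struct`, e.g. `execute_python('convert_temperature', named_struct('celsius', temperature, 'unit', 'F')) as converted`.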