How to schedule a job in Databricks
Hello everyone! In this blog, we will see how to schedule a job that runs our notebook at specific intervals.
Step 1: Launch your Databricks workspace and go to Jobs.
Step 2: Click Create Job to open the job creation window.
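Everything in this walkthrough uses the UI. If you later want to script the same setup, one option is the Databricks Jobs REST API. The sketch below assumes version 2.1 of the API and a personal access token. It only builds the `POST /api/2.1/jobs/create` request and does not send it. The workspace URL, token and job name are placeholders.

```python
import json
import urllib.request


def build_create_job_request(host: str, token: str, settings: dict) -> urllib.request.Request:
    """Build (but do not send) a Jobs API 2.1 create request."""
    url = host.rstrip("/") + "/api/2.1/jobs/create"
    body = json.dumps(settings).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            # Assumption: authentication with a personal access token.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder workspace URL, token and job name; replace with your own.
req = build_create_job_request(
    "https://example.cloud.databricks.com/",
    "dapi-XXXX",
    {"name": "my-scheduled-notebook"},
)
# urllib.request.urlopen(req) would send it; this sketch deliberately does not.
```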
The task can be of any type you choose. I have written my script in a notebook, so I will select Notebook as the type. Then navigate to the notebook you want to run on a schedule and click Confirm.
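In the Jobs API 2.1, a notebook task is one entry in the job's `tasks` list. Here is a minimal sketch of that entry. The task key and notebook path are hypothetical, and `base_parameters` is optional.

```python
from typing import Optional


def notebook_task(task_key: str, notebook_path: str,
                  base_parameters: Optional[dict] = None) -> dict:
    """Return one entry for the Jobs API 2.1 `tasks` list (a sketch)."""
    task = {
        "task_key": task_key,
        "notebook_task": {"notebook_path": notebook_path},
    }
    if base_parameters:
        # The notebook reads these values as widgets (dbutils.widgets.get).
        task["notebook_task"]["base_parameters"] = base_parameters
    return task


# Hypothetical task key and notebook path.
task = notebook_task("run_my_script", "/Users/me@example.com/my_script")
```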
Step 3:
Now it's time to select the cluster. You have two options: use an existing cluster or create a new job cluster. Keep in mind that adding a new cluster increases your compute costs.
If you choose a new job cluster, make sure you configure it to your requirements. Avoid a high-configuration cluster unless you need one, because larger configurations cost more. To edit the configuration, click the pencil icon.
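In the API, the two choices above correspond to two mutually exclusive fields on a task. `existing_cluster_id` reuses an all-purpose cluster. `new_cluster` describes a job cluster that is created for the run. The helper below is a hypothetical sketch that enforces picking exactly one of them. The cluster ID in the usage line is made up.

```python
from typing import Optional


def attach_compute(task: dict, existing_cluster_id: Optional[str] = None,
                   new_cluster: Optional[dict] = None) -> dict:
    """Return a copy of `task` that runs on exactly one kind of compute."""
    if (existing_cluster_id is None) == (new_cluster is None):
        raise ValueError("set exactly one of existing_cluster_id or new_cluster")
    out = dict(task)
    if existing_cluster_id is not None:
        # Reuse a cluster that already exists in the workspace.
        out["existing_cluster_id"] = existing_cluster_id
    else:
        # Job cluster: created for the run and terminated when it finishes.
        out["new_cluster"] = new_cluster
    return out


# Hypothetical cluster ID.
on_existing = attach_compute({"task_key": "run_my_script"}, existing_cluster_id="0101-123456-abcd")
```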
Step 4:
For this integration, my workload is small, so I am selecting a single-node cluster with 14 GB of RAM.
You can also collect logs from your cluster by enabling logging under Advanced options. Click Confirm once you are done.
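A single-node job cluster with log delivery might look like the `new_cluster` spec below. It follows the documented single-node settings: zero workers, the `singleNode` profile, a local Spark master and the `SingleNode` resource tag. Several values are assumptions. `Standard_DS3_v2` is an Azure node type with 14 GB of RAM, the runtime version string is only an example, and the DBFS log path is hypothetical. Check all three against what your workspace offers.

```python
def single_node_cluster(node_type_id: str, spark_version: str,
                        log_destination: str = "") -> dict:
    """Sketch of a single-node `new_cluster` spec, optionally with log delivery."""
    spec = {
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": 0,  # driver only, no worker nodes
        "spark_conf": {
            "spark.databricks.cluster.profile": "singleNode",
            "spark.master": "local[*]",
        },
        "custom_tags": {"ResourceClass": "SingleNode"},
    }
    if log_destination:
        # Driver and executor logs are delivered to this DBFS location.
        spec["cluster_log_conf"] = {"dbfs": {"destination": log_destination}}
    return spec


# Assumed node type, runtime version and log path.
cluster = single_node_cluster("Standard_DS3_v2", "13.3.x-scala2.12", "dbfs:/cluster-logs")
```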
Step 5:
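After selecting the cluster, you can also explore the job's advanced options, where you will find settings for retries, timeout, and dependent libraries.
Retry: how many times a failed run is retried, and how long to wait between attempts.
Timeout: the maximum time a run may take before it is stopped.
Dependent libraries: libraries to install on the cluster before the task runs.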
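In a Jobs API 2.1 task, these three settings map to fields on the task itself. Retries use `max_retries`, `min_retry_interval_millis` and `retry_on_timeout`. The timeout uses `timeout_seconds`, and libraries go in `libraries`. The helper below is a hypothetical sketch that takes minutes and converts them to the units the API expects. The PyPI package name in the usage line is only an example.

```python
def with_advanced_options(task: dict, max_retries: int, retry_interval_min: int,
                          timeout_min: int, pypi_packages: list) -> dict:
    """Return a copy of `task` with retry, timeout and library settings."""
    out = dict(task)
    out["max_retries"] = max_retries  # -1 retries indefinitely, 0 never retries
    out["min_retry_interval_millis"] = retry_interval_min * 60_000
    out["retry_on_timeout"] = False  # do not retry runs that hit the timeout
    out["timeout_seconds"] = timeout_min * 60  # 0 means no timeout
    # Each package is installed from PyPI on the cluster before the task runs.
    out["libraries"] = [{"pypi": {"package": p}} for p in pypi_packages]
    return out


# Example values; "requests" is just an illustrative dependency.
opts = with_advanced_options({"task_key": "run_my_script"}, 2, 1, 30, ["requests"])
```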
Step 6:
Now that our job is ready, it's time to schedule it. Click the Schedule button on the right side.
Note that you can still edit and upgrade the cluster configuration later, so don't worry if you start with a low configuration.
Step 7:
You can schedule the run however you like. Here, I have scheduled it to run every 3 minutes.
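In the API, a schedule is a Quartz cron expression plus a time zone. An "every 3 minutes" schedule like the one above can be written as `0 0/3 * * * ?`. The helper below is a sketch, and its name and the UTC default are assumptions. Because of how the `0/n` minute field works, it fires at minutes 0, n, 2n and so on within each hour.

```python
def every_n_minutes(n: int, timezone_id: str = "UTC") -> dict:
    """Sketch of a Jobs API `schedule` block that fires every n minutes."""
    if not 1 <= n <= 59:
        raise ValueError("n must be between 1 and 59")
    # Quartz fields: second minute hour day-of-month month day-of-week
    return {
        "quartz_cron_expression": f"0 0/{n} * * * ?",
        "timezone_id": timezone_id,
        "pause_status": "UNPAUSED",  # "PAUSED" keeps the schedule without running it
    }


schedule = every_n_minutes(3)
```

With a 3-minute interval the job runs 20 times an hour, so remember to pause or delete the schedule when you are done testing.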
Step 8:
You can view your job's runs here.
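The same run history is available from the Jobs API 2.1 `runs/list` endpoint. This sketch builds the request URL and condenses a response into `(run_id, life_cycle_state, result_state)` tuples. The job ID and host are placeholders. `result_state` is read with `.get` because it is absent while a run is still in progress.

```python
from urllib.parse import urlencode


def runs_list_url(host: str, job_id: int, limit: int = 20) -> str:
    """URL for listing recent runs of one job (Jobs API 2.1)."""
    query = urlencode({"job_id": job_id, "limit": limit})
    return host.rstrip("/") + "/api/2.1/jobs/runs/list?" + query


def summarize_runs(response: dict) -> list:
    """Condense a runs/list response into (run_id, life_cycle_state, result_state)."""
    rows = []
    for run in response.get("runs", []):  # key is omitted when there are no runs
        state = run.get("state", {})
        rows.append((run.get("run_id"), state.get("life_cycle_state"), state.get("result_state")))
    return rows


# Placeholder host and job ID.
url = runs_list_url("https://example.cloud.databricks.com/", 123)
```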
Step 9:
In the View details section, you can see the output of the code that was executed.
In this way, you can schedule a job in Databricks. Hope this helps!
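You can also fetch a notebook's output through the Jobs API 2.1 `runs/get-output` endpoint. There, `notebook_output.result` holds the string the notebook passed to `dbutils.notebook.exit(...)`. For a job made of tasks, you should pass the run ID of the individual task run rather than the parent job run. The sketch below builds the URL and extracts that value. The host and run ID are placeholders.

```python
from typing import Optional
from urllib.parse import urlencode


def get_output_url(host: str, run_id: int) -> str:
    """URL for fetching the output of one (task) run."""
    return host.rstrip("/") + "/api/2.1/jobs/runs/get-output?" + urlencode({"run_id": run_id})


def notebook_result(output_response: dict) -> Optional[str]:
    """Return the value passed to dbutils.notebook.exit, or None if there is none."""
    return output_response.get("notebook_output", {}).get("result")


# Placeholder host and task run ID.
out_url = get_output_url("https://example.cloud.databricks.com", 456)
```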