Problem
When I submit a job to a Databricks cluster from the Sparkflows UI, the job runs but I don’t see any response or status update back in the UI. How can I troubleshoot this issue?
Solution
Sparkflows relies on postback events from Databricks to update job status in the UI. If Sparkflows does not receive these callbacks, the UI will not show any response. This usually points to connectivity or configuration issues between Databricks and Sparkflows.
Possible Causes
- The Postback URL configured in Sparkflows is incorrect.
- The Postback URL is not accessible from the Databricks cluster due to network or DNS restrictions.
Resolution Steps
Jobs submitted from Sparkflows to Databricks send execution events back to Sparkflows. Therefore, the Sparkflows web server endpoint must be reachable from the Databricks cluster.
Step 1: Test Sparkflows Health Check from Databricks Notebook
Run the following command in a Databricks notebook to verify if the Sparkflows REST endpoint is accessible:
%sh curl --location --request GET 'http://sparkflows_host:8080/healthcheck'
Note: Replace sparkflows_host with the domain name or IP address where Sparkflows is running.
If this request succeeds, Databricks can reach Sparkflows, and postback events should work as long as the Postback URL configured in Sparkflows uses the same host and port.
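If the request does not succeed, curl's exit code shows which layer failed. The sketch below wraps the health check in two helper functions. The function names and the 10-second timeout are illustrative assumptions and not part of Sparkflows; the exit codes are curl's standard ones.

```shell
# Hypothetical helpers; paste into a %sh cell on the Databricks cluster.

# Translate a curl exit code into a likely cause.
explain_curl_exit() {
  case "$1" in
    0)  echo "request completed" ;;
    6)  echo "could not resolve host (DNS)" ;;
    7)  echo "could not connect (firewall, routing, or wrong port)" ;;
    28) echo "timed out (dropped traffic or an unresponsive server)" ;;
    35) echo "TLS handshake failed" ;;
    60) echo "TLS certificate could not be verified" ;;
    *)  echo "curl exit code $1" ;;
  esac
}

# Call the health check and report the HTTP status or the failure cause.
check_sparkflows_health() {
  url="$1"
  status=$(curl --silent --show-error --location --max-time 10 \
    --output /dev/null --write-out '%{http_code}' "$url")
  rc=$?   # exit code of the curl call above
  if [ "$rc" -eq 0 ]; then
    echo "HTTP status: $status"
  else
    echo "Failed: $(explain_curl_exit "$rc")"
  fi
  return "$rc"
}
```

To use it, append a call such as check_sparkflows_health 'http://sparkflows_host:8080/healthcheck' at the end of the same cell, replacing sparkflows_host as described above.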
Problem
When running the curl command from a Databricks notebook using HTTPS, the following error appears:
Could not resolve host: sparkflows.com
Solution
Possible Causes
- The domain name cannot be resolved to an IP address (DNS issue).
- The domain name is misspelled or the URL is incorrect.
- A network connectivity issue exists between Databricks and the Sparkflows server.
- There is an SSL/TLS certificate issue on the Sparkflows server.
Resolution Steps
Step 1: Verify URL and DNS
- Ensure the domain name is spelled correctly.
- Confirm that the correct protocol (http or https) is being used.
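One way to check DNS separately from curl is to ask the cluster's own resolver. This is a minimal sketch, assuming the cluster image provides getent (standard on glibc-based Linux); url_host and check_dns are hypothetical helper names, and the URL parsing does not handle IPv6 literals.

```shell
# Hypothetical helper: extract the host name from a URL.
url_host() {
  rest="${1#*://}"      # drop the scheme, if any
  rest="${rest%%/*}"    # drop the path
  rest="${rest##*@}"    # drop any user:password@ prefix
  echo "${rest%%:*}"    # drop the port
}

# Resolve the host with the resolver the cluster actually uses.
check_dns() {
  host=$(url_host "$1")
  if getent hosts "$host"; then
    echo "$host resolves"
  else
    echo "$host does not resolve from this cluster"
    return 1
  fi
}
```

For example, check_dns 'https://sparkflows.com/healthcheck' prints the resolved address when the lookup works. A failure here points to DNS configuration rather than to firewall rules or certificates.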
Step 2: Check Network Connectivity
- Verify that the Sparkflows server is reachable from the Databricks VNet.
- Try running the command again after confirming firewall and routing rules.
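A plain TCP connection test can separate network problems from HTTP or TLS problems. The sketch below assumes bash and the coreutils timeout command are available on the cluster; url_port and check_port are hypothetical helper names, and the 5-second limit is arbitrary.

```shell
# Hypothetical helper: pick the port from a URL, defaulting by scheme.
url_port() {
  hostport="${1#*://}"
  hostport="${hostport%%/*}"
  case "$hostport" in
    *:*) echo "${hostport##*:}" ;;   # explicit port in the URL
    *)   case "$1" in
           https://*) echo 443 ;;
           *)         echo 80 ;;
         esac ;;
  esac
}

# Try to open a TCP connection within 5 seconds using bash's /dev/tcp.
check_port() {
  host="$1"; port="$2"
  if timeout 5 bash -c ': < "/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo "TCP $host:$port is reachable"
  else
    echo "TCP $host:$port is blocked or not listening"
    return 1
  fi
}
```

For example, check_port sparkflows.com "$(url_port 'https://sparkflows.com/healthcheck')" tests port 443. If the host resolves but this check fails, review the firewall and routing rules between the Databricks network and the Sparkflows server.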
Step 3: Manually Resolve the Host (Optional)
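If DNS resolution is failing, you can map the host name to a known IP address for a single curl request. The port in the --resolve argument must match the port in the URL. This only confirms whether DNS is the sole obstacle; it does not fix resolution for postback events, so the DNS configuration still needs to be corrected: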
curl --resolve sparkflows.com:443:<IP_ADDRESS> https://sparkflows.com/healthcheck
Step 4: Disable SSL Validation (For Troubleshooting Only)
curl --insecure https://sparkflows.com/healthcheck
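Note: The --insecure option skips certificate validation. It is not recommended for production, but if the request succeeds only with this option, the failure is certificate-related. It has no effect on a "Could not resolve host" error, which occurs before the TLS handshake starts.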
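The comparison between a validated and an unvalidated request can be scripted. This is a sketch for troubleshooting only, with hypothetical function names and an arbitrary 10-second timeout; it relies only on curl's exit codes.

```shell
# Hypothetical helper: interpret the exit codes of a strict run and an
# --insecure run of the same request.
classify_tls() {
  strict_rc="$1"; insecure_rc="$2"
  if [ "$strict_rc" -eq 0 ]; then
    echo "certificate validated; TLS is not the problem"
  elif [ "$insecure_rc" -eq 0 ]; then
    echo "request succeeds only without validation; check the certificate"
  else
    echo "request fails either way; check DNS and connectivity first"
  fi
}

# Run the same request with and without certificate validation.
compare_tls() {
  url="$1"
  curl --silent --max-time 10 --output /dev/null "$url"
  strict_rc=$?
  curl --silent --max-time 10 --output /dev/null --insecure "$url"
  insecure_rc=$?
  classify_tls "$strict_rc" "$insecure_rc"
}
```

For example, compare_tls 'https://sparkflows.com/healthcheck' reports a certificate problem only when the unvalidated request succeeds and the validated one does not.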