Step-by-Step Guide to Notebook Process

The Notebook Process node in Syntasa App Studio lets you take a Jupyter Notebook developed in a Workspace and run it as a scheduled, production component of a data pipeline. This article walks through configuring, parameterizing, executing, and monitoring a Notebook Process from start to finish.

Prerequisites

Before you begin, ensure that the following are in place:

A Notebook Workspace has been created.
A Jupyter Notebook file (.ipynb) exists in that workspace.
A Syntasa App has been created in App Studio.

Add the Notebook Process Node

Open your App in App Studio.
From the left-hand palette, locate the Notebook icon under the Processes section.
Drag and drop the Notebook icon onto the canvas.
Click the node to open the Configuration Sidebar.

Select and Synchronize the Notebook

This step links the process node to your notebook source code and workspace context.

Workspace Selection

Select the Workspace that contains your notebook.
This determines both the storage context and the runtime file system used during execution.

Notebook Name

Browse the workspace directory and select the target Notebook.

Launch Button

Opens the selected notebook in a new JupyterLab tab.
Useful for quick edits or verification without leaving App Studio.

Refresh Button

Re-scans the notebook file.
Updates metadata and parameter definitions in the UI after any notebook changes.

Configure Parameters (Parameters Preview)

Syntasa uses Papermill-style parameterization to inject runtime values into notebooks.

In JupyterLab

Add a cell tagged exactly as parameters (lowercase).
Define default values, for example:

batch_id = "default"
process_date = "1970-01-01"

In App Studio

Navigate to the Parameters section of the configuration sidebar.
The system displays a Preview of all variables detected in the tagged cell.
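
To make the detection step concrete, the following is a minimal sketch of how variables in a tagged cell could be discovered from a notebook file. It is not Syntasa's implementation. It assumes the standard .ipynb JSON layout (cell tags stored under metadata.tags) and only recognizes simple literal assignments. The sample notebook and the detect_parameters helper are hypothetical.

```python
import ast
import json

# Hypothetical minimal notebook, laid out as in an .ipynb file (nbformat 4).
NOTEBOOK_JSON = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "metadata": {"tags": ["parameters"]},
            "source": ['batch_id = "default"\n', 'process_date = "1970-01-01"\n'],
            "outputs": [],
            "execution_count": None,
        },
        {
            "cell_type": "code",
            "metadata": {},
            "source": ["print(batch_id, process_date)\n"],
            "outputs": [],
            "execution_count": None,
        },
    ],
})


def detect_parameters(notebook_text):
    """Return {name: default} for literal assignments in cells tagged 'parameters'."""
    notebook = json.loads(notebook_text)
    detected = {}
    for cell in notebook.get("cells", []):
        tags = cell.get("metadata", {}).get("tags", [])
        # The tag match is exact and case-sensitive: "Parameters" is ignored.
        if cell.get("cell_type") != "code" or "parameters" not in tags:
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # .ipynb files store source as a list of lines
            source = "".join(source)
        for node in ast.parse(source).body:
            is_simple_assign = (
                isinstance(node, ast.Assign)
                and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)
            )
            if not is_simple_assign:
                continue
            try:
                detected[node.targets[0].id] = ast.literal_eval(node.value)
            except ValueError:
                # Non-literal default (for example a function call): record the name only.
                detected[node.targets[0].id] = None
    return detected


preview = detect_parameters(NOTEBOOK_JSON)
print(preview)
```

Because the match on the tag is exact, a cell tagged with different capitalization or with no tag at all yields an empty result, which mirrors the symptom described under Troubleshooting.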

Override Values

Map notebook parameters to App-level or system variables.
Example:

process_date = {{process_date}}
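
As an illustration of the override concept, the sketch below resolves {{name}} placeholders against a dictionary of variables and layers the result over the notebook defaults. How Syntasa resolves placeholders internally is not described in this article, so the resolve_overrides helper, the app_variables dictionary, and the sample date are assumptions made for the example.

```python
import re

# Matches placeholders such as {{process_date}} or {{ process_date }}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def resolve_overrides(overrides, variables):
    """Replace each {{name}} placeholder in the override values with its variable value."""
    def substitute(match):
        key = match.group(1)
        if key not in variables:
            raise KeyError(f"No value supplied for placeholder '{key}'")
        return str(variables[key])

    return {name: PLACEHOLDER.sub(substitute, template)
            for name, template in overrides.items()}


# Defaults as defined in the notebook's parameters cell.
defaults = {"batch_id": "default", "process_date": "1970-01-01"}
# Override as entered in the configuration sidebar.
overrides = {"process_date": "{{process_date}}"}
# Hypothetical App-level variables available at run time.
app_variables = {"process_date": "2024-06-01"}

# Overrides win over defaults; parameters without an override keep their default.
effective = {**defaults, **resolve_overrides(overrides, app_variables)}
print(effective)
```

Raising an error for an unknown placeholder, rather than leaving the raw {{...}} text in place, is a design choice in this sketch: it surfaces a mapping mistake before the notebook runs with a meaningless value.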

Troubleshooting

If parameters do not appear:
- Verify the cell tag is parameters (lowercase).
- Click Refresh in the configuration panel.

Runtime and Infrastructure Settings

Define the execution environment for the notebook.

Runtime

Choose the appropriate Python or Spark runtime.
Ensure it matches the libraries and framework versions required by your code.

Compute Profile

Configure CPU and memory allocation.
For Spark workloads, ensure adequate Driver and Executor memory based on data volume.

Environment Variables

Add key-value pairs such as:
- LOG_LEVEL
- API_ENDPOINT
- ENV

These variables are injected into the runtime environment during execution.
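
Inside the notebook, injected variables can be read with the standard library. The sketch below assumes the three example keys listed above. The fallback values ("INFO", an empty endpoint, "dev") and the load_settings helper are hypothetical, chosen so the notebook still runs interactively when the variables are not set.

```python
import logging
import os


def load_settings(environ=os.environ):
    """Read the example variables, falling back to hypothetical defaults."""
    return {
        "log_level": environ.get("LOG_LEVEL", "INFO").upper(),
        "api_endpoint": environ.get("API_ENDPOINT", ""),
        "env": environ.get("ENV", "dev"),
    }


settings = load_settings()

# Unknown level names fall back to INFO instead of raising.
logging.basicConfig(level=getattr(logging, settings["log_level"], logging.INFO))
logging.getLogger("notebook").info("Running in environment: %s", settings["env"])
```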

Define Outputs

A Notebook Process produces two primary output types.

Data Outputs

If your notebook writes data to a table or file path, register that location as an Output Dataset.
This enables downstream nodes to detect data readiness and establish dependencies.
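
For example, a notebook might write its result to a predictable location derived from its parameters, and that location is then the one to register as the Output Dataset. The sketch below is only an illustration: the directory layout, the file name, the column names, and the write_output helper are hypothetical, and a temporary directory stands in for the real output path.

```python
import csv
import tempfile
from pathlib import Path


def write_output(rows, output_dir, process_date):
    """Write rows to <output_dir>/process_date=<date>/part-0000.csv and return the path."""
    target_dir = Path(output_dir) / f"process_date={process_date}"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "part-0000.csv"
    with target.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["batch_id", "row_count"])
        writer.writeheader()
        writer.writerows(rows)
    return target


# A temporary directory stands in for the registered output location.
output_dir = tempfile.mkdtemp()
written = write_output([{"batch_id": "default", "row_count": 3}], output_dir, "1970-01-01")
print(written)
```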

Executed Notebook Artifact

By default, Syntasa saves a copy of the notebook after execution, including:
- Cell outputs
- Logs
- Charts and visualizations

Configure the storage path for this artifact in the Output tab.

Scheduling and Triggering Execution

Notebook Processes participate fully in App orchestration.

Connect Dependencies

Draw a connector from upstream nodes (for example, Crawlers or Event Stores) to the Notebook Process.
The notebook will execute only after upstream dependencies complete successfully.

Configure Schedule

Open App Settings.
Define triggers such as:
- Hourly
- Daily
- Weekly
- Event-based

Deploy

Click Build and Deploy to activate the pipeline.

Monitoring and Troubleshooting

App Monitor

View real-time execution status in the Monitor tab.

Logs

Click the Notebook Process node in Monitor view to access:
- Python tracebacks
- Spark driver and executor logs

Executed Notebook Review

Download or open the executed notebook to inspect:
- Failed cells
- Intermediate outputs
- Runtime parameters

This is often the fastest way to identify logic or data issues.
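
When the executed notebook is large, failed cells can also be located programmatically. The sketch below assumes the executed artifact is a standard .ipynb file, in which a failed code cell carries an output with output_type set to "error" along with ename and evalue fields. The sample notebook and the find_failed_cells helper are hypothetical.

```python
import json


def find_failed_cells(notebook_text):
    """Return (cell_index, error_name, error_value) for each error output in code cells."""
    notebook = json.loads(notebook_text)
    failures = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        for output in cell.get("outputs", []):
            if output.get("output_type") == "error":
                failures.append((index, output.get("ename", ""), output.get("evalue", "")))
    return failures


# Hypothetical executed notebook: the first cell succeeded, the second raised.
executed = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "metadata": {},
            "source": ["print('loaded')\n"],
            "execution_count": 1,
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["loaded\n"]}],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "source": ["1 / 0\n"],
            "execution_count": 2,
            "outputs": [{
                "output_type": "error",
                "ename": "ZeroDivisionError",
                "evalue": "division by zero",
                "traceback": [],
            }],
        },
    ],
})

failures = find_failed_cells(executed)
for index, name, value in failures:
    print(f"cell {index}: {name}: {value}")
```

Cell indices are zero-based positions in the notebook file, so they may differ from the execution counts shown in JupyterLab.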

Summary of Key UI Actions

Action	Purpose
Launch	Opens JupyterLab to edit the source notebook.
Refresh	Synchronizes App Studio with the latest notebook changes and parameters.
Preview	Displays variables detected in the `parameters` cell.
Output Path	Defines where the executed notebook snapshot is stored.

Conclusion

The Notebook Process node bridges interactive data science development and production-grade orchestration. By carefully configuring workspace linkage, parameters, runtime resources, outputs, and schedules, teams can reliably operationalize notebooks as scalable and auditable components of enterprise data pipelines.
