Step-by-Step Guide to Notebook Process

The Notebook Process node in Syntasa App Studio lets you take a Jupyter Notebook developed in a Workspace and run it as a scheduled, production component of a data pipeline. This article walks through configuring, parameterizing, executing, and monitoring a Notebook Process from start to finish.


Prerequisites

Before you begin, ensure that the following are in place:

  • A Notebook Workspace has been created.
  • A Jupyter Notebook file (.ipynb) exists in that workspace.
  • A Syntasa App has been created in App Studio.

Add the Notebook Process Node

  1. Open your App in App Studio.
  2. From the left-hand palette, locate the Notebook icon under the Processes section.
  3. Drag and drop the Notebook icon onto the canvas.
  4. Click the node to open the Configuration Sidebar.

Select and Synchronize the Notebook

This step links the process node to your notebook source code and workspace context.

Workspace Selection

  • Select the Workspace that contains your notebook.
  • This determines both the storage context and the runtime file system used during execution.

Notebook Name

  • Browse the workspace directory and select the target Notebook.

Launch Button

  • Opens the selected notebook in a new JupyterLab tab.
  • Useful for quick edits or verification without leaving App Studio.

Refresh Button

  • Re-scans the notebook file.
  • Updates metadata and parameter definitions in the UI after any notebook changes.

Configure Parameters (Parameters Preview)

Syntasa uses Papermill-style parameterization to inject runtime values into notebooks.

In JupyterLab

  • Add a code cell and tag it parameters (the tag must match exactly, all lowercase).
  • Define default values, for example:
batch_id = "default"
process_date = "1970-01-01"
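
In Papermill-style parameterization, the tagged cell keeps its defaults, and run-time overrides are applied in a separate cell placed right after it, so later cells see the overridden values. The sketch below imitates that behavior with plain Python. The cell sources and the override value are illustrative, and Syntasa's actual injection mechanism may differ in detail.

```python
# Illustrative only: imitates Papermill-style injection with plain Python.
# The cell contents and the override value are hypothetical examples.

# Source of the cell tagged "parameters" (defaults used in interactive runs).
parameters_cell = '''
batch_id = "default"
process_date = "1970-01-01"
'''

# Source of a cell holding run-time overrides, executed right after the defaults.
injected_cell = '''
process_date = "2024-06-30"
'''

namespace = {}
exec(parameters_cell, namespace)   # defaults are defined first
exec(injected_cell, namespace)     # overrides replace only the names they set

print(namespace["batch_id"])       # not overridden, keeps its default
print(namespace["process_date"])   # overridden value wins
```

Because only overridden names are reassigned, every parameter should still have a sensible default so the notebook runs interactively in JupyterLab.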

In App Studio

  • Navigate to the Parameters section of the configuration sidebar.
  • The system displays a Preview of all variables detected in the tagged cell.

Override Values

  • Map notebook parameters to App-level or system variables.
  • Example:
process_date = {{process_date}}
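
Overridden values are resolved at run time, so a malformed value only surfaces when the notebook executes. A minimal sketch of an early validation cell follows, assuming process_date is passed as a YYYY-MM-DD string and batch_id as a non-empty string. The helper name validate_parameters is hypothetical, not part of Syntasa.

```python
from datetime import date

def validate_parameters(batch_id: str, process_date: str) -> date:
    """Fail fast if injected parameters are missing or malformed."""
    if not isinstance(batch_id, str) or not batch_id.strip():
        raise ValueError("batch_id must be a non-empty string")
    try:
        # date.fromisoformat accepts strings in YYYY-MM-DD form.
        return date.fromisoformat(process_date)
    except (TypeError, ValueError) as exc:
        raise ValueError(
            f"process_date must be YYYY-MM-DD, got {process_date!r}"
        ) from exc

# Example call using the defaults from the parameters cell.
run_date = validate_parameters("default", "1970-01-01")
print(run_date.year)
```

Placing a check like this in the cell directly after the parameters cell makes a bad mapping fail in the first seconds of a run rather than midway through processing.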

Troubleshooting

  • If parameters do not appear:
    • Verify the cell tag is parameters (lowercase).
    • Click Refresh in the configuration panel.
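
A notebook file is JSON, and cell tags are stored in each cell's metadata under a tags list, so the tag can also be checked outside the UI. A minimal sketch using only the standard library follows. The function name, the sample notebook, and the file path in the comment are illustrative.

```python
import json

def find_parameter_cells(notebook: dict) -> list:
    """Return indexes of code cells tagged exactly 'parameters'."""
    return [
        index
        for index, cell in enumerate(notebook.get("cells", []))
        if cell.get("cell_type") == "code"
        and "parameters" in cell.get("metadata", {}).get("tags", [])
    ]

# Sample notebook structure. For a real file, parse its contents with
# json.load() instead (for example, a hypothetical my_notebook.ipynb).
sample = json.loads("""
{
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
    {"cell_type": "code", "metadata": {"tags": ["Parameters"]}, "source": []},
    {"cell_type": "code", "metadata": {"tags": ["parameters"]}, "source": []}
  ],
  "metadata": {}, "nbformat": 4, "nbformat_minor": 5
}
""")

# Only the cell with the lowercase tag is reported.
print(find_parameter_cells(sample))
```

An empty result for your own notebook usually means the tag is missing, misspelled, or capitalized.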

Runtime and Infrastructure Settings

Define the execution environment for the notebook.

Runtime

  • Choose the appropriate Python or Spark runtime.
  • Ensure it matches the libraries and framework versions required by your code.

Compute Profile

  • Configure CPU and memory allocation.
  • For Spark workloads, ensure adequate Driver and Executor memory based on data volume.

Environment Variables

  • Add key-value pairs such as:
    • LOG_LEVEL
    • API_ENDPOINT
    • ENV

These variables are injected into the runtime environment during execution.
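
Inside the notebook, such variables can be read through os.environ. A minimal sketch follows, assuming the three example keys above. The fallback values are illustrative defaults for interactive runs, not Syntasa defaults, and load_settings is a hypothetical helper name.

```python
import os

def load_settings(environ=os.environ) -> dict:
    """Read runtime settings, falling back to local defaults when unset."""
    return {
        "log_level": environ.get("LOG_LEVEL", "INFO"),
        "api_endpoint": environ.get("API_ENDPOINT", "http://localhost:8080"),
        "env": environ.get("ENV", "dev"),
    }

# Passing a plain dict makes the behavior easy to check
# without touching the real process environment.
settings = load_settings({"LOG_LEVEL": "DEBUG", "ENV": "prod"})
print(settings["log_level"], settings["env"])
```

In the notebook itself, calling load_settings() with no argument reads the variables configured on the node.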


Define Outputs

A Notebook Process produces two primary output types.

Data Outputs

  • If your notebook writes data to a table or file path, register that location as an Output Dataset.
  • This enables downstream nodes to detect data readiness and establish dependencies.
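
A notebook that writes to a predictable location is easier to register as an Output Dataset. The sketch below writes a CSV under a date-partitioned path. The directory layout, file name, column names, and rows are hypothetical, and a temporary directory stands in for the real output location.

```python
import csv
import tempfile
from pathlib import Path

def write_output(base_dir: Path, process_date: str, rows: list) -> Path:
    """Write rows to <base_dir>/process_date=<date>/part-0000.csv and return the path."""
    target_dir = base_dir / f"process_date={process_date}"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "part-0000.csv"
    with target.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["batch_id", "value"])
        writer.writeheader()
        writer.writerows(rows)
    return target

# A temporary directory stands in for the registered output location.
base = Path(tempfile.mkdtemp())
path = write_output(base, "1970-01-01", [{"batch_id": "default", "value": 1}])
print(path.relative_to(base))
```

Deriving the path from process_date keeps each run's output separate, which makes reruns for a given date straightforward to reason about.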

Executed Notebook Artifact

  • By default, Syntasa saves a copy of the notebook after execution, including:
    • Cell outputs
    • Logs
    • Charts and visualizations
  • Configure the storage path for this artifact in the Output tab.

Scheduling and Triggering Execution

A Notebook Process is orchestrated like any other node in the App: it can wait on upstream nodes and runs on the App's schedule.

Connect Dependencies

  • Draw a connector from upstream nodes (for example, Crawlers or Event Stores) to the Notebook Process.
  • The notebook will execute only after upstream dependencies complete successfully.

Configure Schedule

  • Open App Settings.
  • Define triggers such as:
    • Hourly
    • Daily
    • Weekly
    • Event-based

Deploy

  • Click Build and Deploy to activate the pipeline.

Monitoring and Troubleshooting

App Monitor

  • View real-time execution status in the Monitor tab.

Logs

  • Click the Notebook Process node in Monitor view to access:
    • Python tracebacks
    • Spark driver and executor logs

Executed Notebook Review

  • Download or open the executed notebook to inspect:
    • Failed cells
    • Intermediate outputs
    • Runtime parameters

This is often the fastest way to identify logic or data issues.
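
In the notebook file format, a failed cell carries an output whose output_type is "error", so failures can also be located programmatically in a downloaded copy. A minimal sketch using plain Python follows. The sample notebook content and the function name are illustrative.

```python
def find_failed_cells(notebook: dict) -> list:
    """Return (cell_index, error_name, error_value) for each error output."""
    failures = []
    for index, cell in enumerate(notebook.get("cells", [])):
        for output in cell.get("outputs", []):
            if output.get("output_type") == "error":
                failures.append((index, output.get("ename"), output.get("evalue")))
    return failures

# Sample executed notebook. A real file can be parsed with json.load().
executed = {
    "cells": [
        {"cell_type": "code", "outputs": [
            {"output_type": "stream", "name": "stdout", "text": ["ok\n"]},
        ]},
        {"cell_type": "code", "outputs": [{
            "output_type": "error",
            "ename": "KeyError",
            "evalue": "'process_date'",
            "traceback": ["KeyError: 'process_date'"],
        }]},
    ],
}

for index, name, value in find_failed_cells(executed):
    print(f"cell {index}: {name}: {value}")
```

Cell indexes are zero-based and count every cell in the file, including markdown cells.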


Summary of Key UI Actions

Action | Purpose
Launch | Opens JupyterLab to edit the source notebook.
Refresh | Synchronizes App Studio with the latest notebook changes and parameters.
Preview | Displays variables detected in the parameters cell.
Output Path | Defines where the executed notebook snapshot is stored.

Conclusion

The Notebook Process node bridges interactive data science development and production-grade orchestration. By carefully configuring workspace linkage, parameters, runtime resources, outputs, and schedules, teams can reliably operationalize notebooks as scalable and auditable components of enterprise data pipelines.