Specialized App Base Classes¶

Beyond the bare-minimum ClamsApp introduced in Getting Started, the SDK provides specialized base classes that capture common structural patterns for CLAMS apps. Each specialized base class extends ClamsApp with a standardized runtime parameter surface and helper methods appropriate to its category of app. App developers inherit from the specialized base class that best matches what their app does, instead of inheriting from ClamsApp directly.

This page first recaps what every CLAMS app inherits from ClamsApp (the baseline), then documents each specialized base class and what it adds on top.

What every CLAMS app inherits¶

Every CLAMS app subclasses ClamsApp (directly or via a specialized base class such as ClamsPromptableApp) and inherits its baseline behaviors: parameter casting and refinement, view signing, JSON envelope unwrapping, CUDA memory profiling and cleanup, error views, and a set of universal runtime parameters that the SDK auto-injects into every app’s metadata.

Universal parameters¶

Added automatically by __init__() at runtime and by the standard metadata.py template’s __main__ block at python metadata.py time. App developers do not declare them.

Name	Type	Default	Multi-valued	Notes
`pretty`	boolean	`false`	no	When `true`, the response MMIF JSON is re-formatted with 2-space indentation.
`runningTime`	boolean	`true`	no	When `true`, the running time of the request is recorded in the view metadata.
`hwFetch`	boolean	`false`	no	When `true`, host hardware info (architecture, GPU and vRAM) is recorded in the view metadata.
`tfSamplingMode`	string	`'representatives'`	no	For apps that process `TimeFrame` annotations: how to sample frames within each TimeFrame. Choices: `'representatives'`, `'single'`, `'all'`. No effect on apps that do not process TimeFrames.

SDK-managed parameter names are reserved¶

Parameter names added by the SDK (the universal parameters listed above, plus any parameters added by a specialized base class) are reserved. An app’s appmetadata() MUST NOT declare any of these names via AppMetadata.add_parameter() directly; doing so trips the existing duplicate-name ValueError when the SDK tries to add its own spec.

This reservation guarantees a uniform, predictable parameter interface across all CLAMS apps. App developers can still customize a reserved parameter’s default value (but not its type, multivalued, or choices) by mutating the default field on the already-injected parameter object; see Customizing default values for a worked example.

Promptable CLAMS Apps¶

A promptable app is a CLAMS app that wraps a promptable model: a large language model (LLM), vision-language model (VLM), audio-language model (ALM), large multimodal model (LMM), or remote generative API. The SDK provides ClamsPromptableApp as a specialized base class for these apps. It standardizes the runtime parameter surface (prompts, generation hyperparameters, batch size) and provides helpers for building chat conversations and persisting model responses into MMIF.

This section is the developer guide for writing or migrating a CLAMS app that inherits from ClamsPromptableApp. For the general CLAMS app development pattern, see the Getting Started, Tutorial: Wrapping an NLP Application, and Runtime Configuration pages.

When to use `ClamsPromptableApp`¶

Choose ClamsPromptableApp over ClamsApp when your app’s core operation is “given a prompt and some input (image/audio/text/structured data), return generated text.” Concretely:

Image captioning, VLM-based OCR, scene description
Audio captioning, transcription via ALMs
Summarization, classification, structured-data extraction via LLMs
Tasks driven by an LMM that takes mixed-modality inputs
Any app that wraps a remote LLM, VLM, ALM, or LMM API and forwards a prompt

If your app does not call a generative model (e.g. a classical OCR engine, a speech-to-text engine that doesn’t take prompts, a classifier wrapping a discriminative model), keep using ClamsApp directly.

Note

ClamsPromptableApp assumes an instruction- or chat-tuned model with a system/user/assistant role structure. Bare completion / next-token-prediction base models do not fit this base class cleanly; use ClamsApp directly for those.

Standardized runtime parameters¶

Every ClamsPromptableApp exposes the following SDK-managed runtime parameters in addition to the universal parameters from ClamsApp. These names are reserved; see SDK-managed parameter names are reserved.

Name	Type	Default	Multi-valued	Notes
`prompt`	string	(required, no default)	yes	User prompt(s) sent to the model. A single value runs as a one-shot generation. A multi-value list is interpreted as a multi-turn static prompt; see Multi-turn handling (promptMode).
`systemPrompt`	string	`''`	no	Optional system-role text prepended to the conversation.
`promptMode`	string	`'turn-taking'`	no	How to interpret a multi-value `prompt` list. Choices: `'turn-taking'` or `'user-only'`. See Multi-turn handling (promptMode).
`maxNewTokens`	integer	`512`	no	Maximum number of new tokens generated per inference call. Larger values grow the KV cache linearly and add to GPU memory usage; reduce if VRAM is constrained.
`useReasoning`	boolean	`false`	no	Request the model’s reasoning (“thinking”) mode. Off by default; honored only by apps whose model has a distinct reasoning mode (others ignore it). Much slower and more token-hungry – the trace is generated before the answer and drawn from the `maxNewTokens` budget, and small models can loop without terminating. When honored, the trace is stored in the `modelReasoningTrace` property of the output `TextDocument`. See Reasoning traces (useReasoning).
`temperature`	number	`0.0`	no	Sampling temperature. `0.0` selects deterministic / greedy decoding for maximum reproducibility; override for sampled generation.
`topP`	number	`1.0`	no	Nucleus-sampling cumulative probability cutoff. Only meaningful when `temperature` > 0.
`topK`	integer	`50`	no	Top-K sampling cutoff. Only meaningful when `temperature` > 0.
`parallelPrompts`	integer	`1`	no	Number of independent prompts the app stacks into a single forward pass. Per-prompt content size is the app’s responsibility; prompt count and per-prompt size combine multiplicatively for GPU memory. Keep at `1` on memory-tight setups; see the parameter’s own description in `promptable_parameters` for an OOM-risk example.

Customizing default values¶

The SDK ships sensible defaults for most promptable parameters but deliberately leaves prompt without a default; prompts are inherently app-specific and no single value is right for all apps. Beyond prompt, other defaults may also be inappropriate for a given app: a model that needs longer outputs wants a higher maxNewTokens, a small-VRAM deployment wants parallelPrompts pinned at 1, etc.

Because the reservation rule prevents calling metadata.add_parameter('prompt', ...) (or any other promptable name) directly, the recommended pattern for customizing defaults is to mutate the default field on the already-injected parameter object right after calling inject_promptable_parameters(). You’ll see a worked example of this in the metadata.py generated by the clams develop scaffold.

This works for any promptable parameter. The parameter spec itself (type, multivalued, choices) stays locked by the SDK; only the default field is meant to be mutated this way, which preserves the cross-app uniformity that the reservation rule is designed to guarantee.

If an app wants to require callers to pass a value explicitly (for prompt or any other parameter), it can simply leave the default unchanged. prompt already has no default, so the SDK will raise a “required parameter” error if the caller omits it; for other params, deleting the SDK default and leaving it None would have the same effect, though that’s rarely useful.

Declaring a promptable app¶

A promptable app requires two paired edits relative to the scaffold generated by clams develop:

In app.py, change the app class’s base from ClamsApp to ClamsPromptableApp and implement generate(). The scaffold file already contains a guiding comment at the class declaration line.
In metadata.py, call ClamsPromptableApp.inject_promptable_parameters at the end of appmetadata(). The scaffold file already contains a commented-out helper-call block; uncomment it.

The __main__ block in metadata.py is unchanged from non-promptable apps. The helper call inside appmetadata() makes the promptable parameters visible to both python metadata.py (build-time discovery) and to _load_appmetadata() (runtime). The base class change ensures the app inherits the parameter-presence validation, the generate() contract, and the helper methods at runtime.

The `generate()` contract¶

Subclasses of ClamsPromptableApp that wrap a backend without a default SDK implementation (e.g., remote-API or custom local backends) MUST implement generate(). Subclasses of ClamsHFPromptableApp inherit a concrete generate() and do not need to override it. See the method’s docstring for the full signature, batch semantics, and return value.

Keep inference logic inside generate() distinct from MMIF I/O; the latter belongs in _annotate() (which calls self.generate()). This separation lets HF-backed apps inherit the default generate() without restating backend mechanics, and lets non-HF apps swap in a new generate() without rewriting their MMIF I/O.

Multi-turn handling (`promptMode`)¶

prompt is always a List[str] after parameter casting. When the list has a single element, promptMode is irrelevant (single-shot generation). When the list has multiple elements, promptMode selects between two multi-element prompting strategies:

Turn-taking (default). The list is interpreted as an alternating user/assistant conversation: even indices (0, 2, 4, …) are user turns, odd indices are assistant turns. The full conversation is sent to the model in a single generate call. This mode supports any pattern that fits an alternating role structure, including few-shot in-context learning (where the (user, assistant) pairs are task exemplars and the final user turn is the new query), multi-turn dialogue continuation, and role-play scaffolding. Example (few-shot ICL): ["Classify sentiment: 'I love this.'", "positive", "Classify sentiment: 'I hate this.'", "negative", "Classify sentiment: 'It's okay.'"]: two exemplar pairs followed by a final query; one inference returns the final reply.

User-only. Every element is a user turn; the model generates an assistant reply between each, in N successive generate calls. Only the final assistant response is returned per input item. This mode implements iterative / scripted multi-step prompting, a manual, externally-driven scaffold for stepwise reasoning. (It is distinct from in-model zero-shot chain-of-thought, where stepwise reasoning is elicited inside a single inference call by a prompt like “let’s think step by step”; here, the user-side scaffolding makes the steps explicit and feeds each intermediate model output back as context for the next user turn.) Example (scripted multi-step reasoning): ["Step 1: identify objects.", "Step 2: describe relationships.", "Step 3: conclude."]: three sequential user prompts, three inferences, final reply returned.

turn-taking is the default because it costs a single inference call and is the more common multi-element pattern.

Reasoning traces (`useReasoning`)¶

Some models expose a distinct reasoning (“thinking”) mode that emits an intermediate reasoning trace before the answer. useReasoning is the standard toggle for that mode. It is inert unless the app honors it, and honoring is a per-app choice with three wiring points:

map useReasoning onto the backend’s reasoning switch (for HF chat templates, typically enable_thinking) by overriding build_template_kwargs();
split the trace from the answer with split_tagged_reasoning_trace() (inline <think>...</think>-style traces only; channel-delimited formats need app-specific parsing). Many chat templates prefill the opening <think> into the prompt, so only the closing tag appears in the output; pass assume_open=True (wired to the reasoning toggle) so the trace is still recovered and a missing close tag is read as unterminated reasoning (empty answer) rather than a plain answer;
pass the trace to response_to_grounded_textdocument() through its reasoning_trace argument, which stores it in the modelReasoningTrace property of the produced TextDocument, separate from the document text.

An app whose model has no reasoning mode leaves useReasoning unwired; callers may still set it and it has no effect, like topK under greedy decoding.

Cost and failure modes¶

Reasoning is much slower and far more token-hungry than a direct answer: the entire trace is generated first, from the same budget capped by maxNewTokens. Enabling useReasoning without raising maxNewTokens substantially (thousands of tokens, not hundreds) risks the trace consuming the whole budget, leaving the answer truncated or empty. Small reasoning models (as a rule of thumb, roughly 4B parameters and under) are especially prone to non-terminating “thinking loops” that never reach an answer; budget generously and validate termination per model. When the trace overruns the budget before closing, assume_open=True yields an empty answer plus the unterminated trace – an app should surface that (e.g. a warning and an empty TextDocument) rather than emitting the raw reasoning as the answer.

Helpers¶

inject_promptable_parameters(): A static method called from your app’s appmetadata() (in metadata.py) to add the SDK-managed promptable parameters.
build_conversation(): Instance method that constructs a chat-template-compatible message list (or a List[List[dict]] of progressively-extending prefixes for user-only mode). Handles string and list prompt forms, the two promptMode semantics, the optional systemPrompt, and inlines images / audios into the (final) user turn. Accepts a pre-built List[dict] and returns it unchanged. Subclasses may override to access model-specific state (e.g. self.processor) when formatting messages.
response_to_grounded_textdocument(): Writes a TextDocument plus an Alignment (source -> TD) into a view. source is the coarse cross-modal anchor; the optional origins (paired with origination) is the finer derivation list, written to the TD’s origins / origination properties. See https://clams.ai/clams-vocabulary/Document for vocabulary semantics.

HuggingFace Promptable Apps¶

For the very common case of “promptable CLAMS app + local HuggingFace transformers model,” the SDK provides ClamsHFPromptableApp, a specialized subclass of ClamsPromptableApp that absorbs all HF-specific inference boilerplate. Concrete apps inheriting from it declare the model via a few class attributes and typically only need to implement _annotate() for their MMIF I/O.

When to use¶

Choose ClamsHFPromptableApp over plain ClamsPromptableApp when your app:

wraps a local HuggingFace transformers model loadable via from_pretrained(), AND
runs the standard chat-template -> model.generate -> batch_decode inference pipeline (every modern instruct-tuned VLM/LLM in HF), AND
doesn’t need bespoke pixel-value preprocessing or vision-token stitching at inference time.

If your app uses a remote API instead (OpenAI, Anthropic, etc.), or a non-HF local backend, inherit from ClamsPromptableApp directly and implement generate() yourself.

Declaring an HF promptable app¶

On top of the baseline declaration shared by every promptable app (see Declaring a promptable app), a ClamsHFPromptableApp subclass:

Uses ClamsHFPromptableApp (not ClamsPromptableApp) as the base class in app.py.
Declares the required class attribute MODEL_CLS and any optional dtype / padding / kwargs hints (see Class-attribute hooks for the full list).
Sets analyzer_versions={<hf-id>: <commit-hash>, ...} on the AppMetadata constructor call in metadata.py (replaces the singular analyzer_version for HF apps).
Calls ClamsHFPromptableApp.inject_promptable_parameters (the HF override of the plain helper) at the end of appmetadata(). The scaffold metadata.py contains a commented-out HF block; uncomment it.
Inherits the base class’s generate() implementation; no override needed.

For a minimal worked example, see the class docstring on ClamsHFPromptableApp.

Class-attribute hooks¶

Concrete subclasses declare the model class plus optional dtype / padding hints via class attributes, and declare the family of supported model variants (with pinned commits) via analyzer_versions in metadata.py:

Attribute	Meaning	Required
`MODEL_CLS`	`transformers` model class (e.g. `AutoModelForImageTextToText`, `AutoModelForCausalLM`).	yes
`PROCESSOR_CLS`	Processor / tokenizer / feature-extractor class. Defaults to `AutoProcessor`.	no
`DTYPE`	Torch dtype for the model and for `pixel_values` casting in `generate()`. E.g. `torch.bfloat16` for low-precision LLM inference.	no
`PADDING_SIDE`	Tokenizer padding side. `'left'` for decoder-only batched generation; leave unset otherwise.	no
`MODEL_KWARGS` / `PROCESSOR_KWARGS`	Extra kwargs forwarded to the respective `from_pretrained()` calls (e.g. `trust_remote_code=True`).	no

The HF model identifiers themselves are NOT a class attribute. They live in metadata.py as analyzer_versions, a Dict[str, str] mapping each supported model id to its pinned commit hash. The SDK auto-derives a model runtime parameter from this dict, with choices set to the dict keys.

Family / singleton handling¶

When analyzer_versions contains a single entry (the typical single-model app), the SDK eagerly pre-loads that one model in __init__ and sets model.default to the only key so callers can omit the parameter. Single-model apps thus preserve warm-start semantics: the model is loaded at app startup, not on first request.

When analyzer_versions contains multiple entries (a family app), loading is deferred until the first load_model() call inside _annotate, and model has no default by default; callers must pick a family member explicitly (or the dev mutates model.default post-injection to provide a recommended pick). Loaded models are cached per (model_id, revision) for the lifetime of the app instance; switching models loads on first miss, cache-hits on repeat.

What the base class provides¶

A subclass typically only writes _annotate(). The base class supplies:

model loading and caching via load_model(), which wraps clams.backends.hf.load_hf_model() (non-promptable HF apps can call that loader directly without going through this base class);
the parameter injector ClamsHFPromptableApp.inject_promptable_parameters;
a concrete batched HF generate();
a default build_gen_kwargs() that maps the SDK promptable parameters to HF model.generate() kwargs;
a build_template_kwargs() hook for chat-template controls, the override point for honoring useReasoning (see Reasoning traces (useReasoning)).

See each method’s docstring for full details.

Apps using the HF backend (with or without the promptable wrapper) must install the [hf] extra: pip install clams-python[hf].

Specialized App Base Classes¶

What every CLAMS app inherits¶

Universal parameters¶

SDK-managed parameter names are reserved¶

Promptable CLAMS Apps¶

When to use ClamsPromptableApp¶

Standardized runtime parameters¶

Customizing default values¶

Declaring a promptable app¶

The generate() contract¶

Multi-turn handling (promptMode)¶

Reasoning traces (useReasoning)¶

Cost and failure modes¶

Helpers¶

HuggingFace Promptable Apps¶

When to use¶

Declaring an HF promptable app¶

Class-attribute hooks¶

Family / singleton handling¶

Reproducibility: model refinement and view metadata¶

What the base class provides¶

When to use `ClamsPromptableApp`¶

The `generate()` contract¶

Multi-turn handling (`promptMode`)¶

Reasoning traces (`useReasoning`)¶

Reproducibility: `model` refinement and view metadata¶