Vision Language Model Lookup

Perform a lookup with a Vision Language Model on objects in an ROI, or on an ROI in the frame. Supports models such as GPT-4o, Claude, Gemini, and Llama via Lumeo Cloud, Google, OpenAI, Anthropic, Nvidia, or AWS Bedrock.

Overview

The Vision Language Model Lookup node performs lookups using Vision Language Models (via Lumeo Cloud, Google, OpenAI, Anthropic, Nvidia, or AWS Bedrock) on objects within a Region of Interest (ROI), or on an ROI within the frame. This is useful for applications that require advanced recognition and understanding of objects or areas within a video feed.

Inputs & Outputs

  • Inputs: 1, Media Format: Raw Video
  • Outputs: 1, Media Format: Raw Video
  • Output Metadata: nodes.node_id, recognized_objs, recognized_obj_ids, recognized_obj_count, recognized_obj_delta, value_changed_delta, unrecognized_obj_count, unrecognized_obj_delta

Properties

| Property | Description | Type | Default | Required |
| --- | --- | --- | --- | --- |
| roi_labels | Regions of interest labels. | hidden | null | Yes |
| rois | Regions of interest. Conditional on roi_labels. | polygon | null | Yes |
| processing_mode | Processing mode. Options: ROIs, at Interval (rois_interval); ROIs, upon Trigger (rois_trigger); Objects in an ROI (objects). | enum | rois_interval | Yes |
| interval | Collect objects or ROIs for lookup at least this many seconds apart. | float | 1 | No |
| trigger | Queue the ROI for lookup when this condition evaluates to true. Conditional on processing_mode being rois_trigger. | trigger-condition | null | No |
| batch_mode | ROI batch mode. Options: Single (single): look up the current ROI image only; Batch (batch): look up the last n images of the ROI; Compare with reference (reference): look up the current ROI image and a reference image. | enum | single | No |
| batch_size | Number of images to process in each request. Set to more than 1 to use prompts that reference multiple images. Conditional on batch_mode being batch. | number | 1 | No |
| enable_prebuffer | If true, samples the ROI at the lookup interval to fill the batch and performs the lookup when the trigger is met; otherwise performs the lookup when the batch is full. Conditional on processing_mode being rois_trigger. | bool | false | No |
| objects_to_process | Object types to process (e.g. car, person, car.red). Conditional on processing_mode being objects. | model-label | null | No |
| obj_lookup_mode | Object lookup mode. Options: Until result (until_result): look up on interval or size change until a result is obtained or max attempts are exhausted; Continuously (continuous): look up periodically at an interval. Conditional on processing_mode being objects. | enum | until_result | No |
| min_obj_size_pixels | Minimum width and height of an object. Conditional on processing_mode being objects. | number | 64 | No |
| obj_lookup_size_change_threshold | If the size of an object changes by more than this threshold, perform a lookup. Min: 0.01, Max: 2.0, Step: 0.2. Conditional on processing_mode being objects. | slider | 0.1 | No |
| max_lookups_per_obj | Maximum number of lookup attempts for an object in the Until result lookup mode. Conditional on processing_mode being objects. | number | 5 | No |
| model_provider | Model provider. Options: Google Cloud (google), OpenAI Cloud (openai), Anthropic Cloud (anthropic), Nvidia NIM Cloud (nvidia), Lumeo Cloud (lumeo), AWS Bedrock (aws), Self-hosted (self). | enum | lumeo | No |
| model_url | Self-hosted model URL. Conditional on model_provider being self. | string | null | Yes |
| api_key | API key for the model provider. For AWS Bedrock, provide it in the format <ACCESS_KEY_ID>:<SECRET_ACCESS_KEY>. Required only when the model provider is not lumeo. | string | null | No |
| custom_model | Required if the model provider is Self-hosted; optional for other providers. Overrides the provider-specific default model if specified. See below for formats and supported models for the selected provider. | string | null | No |
| prompt | A prompt, additional instructions, or context for the model. Required if no attributes are provided. | text | null | No |
| attributes | If provided, the model looks for these attributes and adds the resulting values as object/ROI attributes. Leave empty to output results as text. If present, description is a special attribute that is added to the scene description for search and summarization. See examples below. | json | null | No |
| detail_level | Image resolution. Options: Low (low), High (high). | enum | low | No |
| max_tokens | Maximum number of tokens to return for each request. | number | 500 | No |
| display_roi | Display the ROI on video? | bool | true | No |
| display_objinfo | Display results on video? Options: Disabled (disabled), Bottom left (bottom_left), Bottom right (bottom_right), Top left (top_left), Top right (top_right). | enum | bottom_left | No |
| debug | Log debugging information? | bool | false | No |
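To make the conditional properties concrete, the fragment below sketches one plausible configuration for the Objects in an ROI processing mode, expressed as a Python dict. The property names and values come from the table above; the surrounding deployment schema, and the exact value formats for objects_to_process and api_key, are assumptions for illustration.

# One plausible configuration for this node in "objects" processing mode.
# Property names/values follow the Properties table above; the exact
# deployment schema this dict would live in is an assumption.
lvm_node_properties = {
    "processing_mode": "objects",       # look up detected objects inside the ROIs
    "objects_to_process": "car",        # object types to send to the model (format assumed)
    "obj_lookup_mode": "until_result",  # retry until a result or max attempts exhausted
    "max_lookups_per_obj": 5,
    "min_obj_size_pixels": 64,          # skip objects smaller than 64x64 px
    "interval": 1.0,                    # at least 1 second between lookups
    "model_provider": "openai",
    "api_key": "sk-...",                # required because the provider is not lumeo
    "custom_model": "openai/gpt-4o",    # optional override of the provider default
    "attributes": {                     # request structured output instead of free text
        "description": "Describe the image briefly.",
        "vehicle_type": "Comma separated list of vehicle types: car|bus|van",
    },
    "detail_level": "low",
    "max_tokens": 500,
}

In this sketch, each tracked car of at least 64x64 pixels is looked up until the model returns a result (up to 5 attempts), and the results come back as structured attributes rather than free-form text.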

Prompt Examples

Generate a scene description.

Analyze the scene and provide a concise description of any unique, interesting, or noteworthy elements that would be suitable for a push notification alert. Focus on key details that capture the essence of what's happening or what's important in the image.

Attribute Examples

Providing explicit attributes lets the model return structured output that will be added as ROI or object attributes.

Each attribute is a key-value pair: the key is the attribute name, and the value is the instruction telling the model how to extract that attribute.
The model returns each extracted attribute value as a string, which is added as an attribute to the object/ROI.

Describe the image

{"description": "Describe the image briefly."}

Describe the image and add attributes for vehicle type and numbers

{"description": "Describe the image briefly. Return null if no vehicle is present.", "vehicle_type": "Comma separated list of vehicle types: car\|bus\|van", "vehicle_numbers": "Comma separated list of vehicle numbers"}

Custom Model Names

| Model Provider | Format | Description |
| --- | --- | --- |
| aws | aws/<bedrock inference profile id> | Bedrock inference profile IDs are listed in the AWS Bedrock documentation. Ex. aws/us.meta.llama3-2-11b-instruct-v1:0 |
| google | google/<model_name> | Supported model names are listed in the Google documentation. Ex. google/gemini-1.5-flash-latest |
| openai | openai/<model_name> | Model names are listed in the OpenAI documentation. Ex. openai/gpt-4o |
| anthropic | anthropic/<model_name> | Model names are listed in the Anthropic documentation. Ex. anthropic/claude-3-5-sonnet-latest |
| nvidia | <model_path> | Nvidia NIM Vision Language Models are listed in the Nvidia docs. Specify the model path as the portion of the model invoke URL after the base URL (https://ai.api.nvidia.com/v1/). Ex. for a model invoke URL of https://ai.api.nvidia.com/v1/vlm/nvidia/vila, the model_path is vlm/nvidia/vila. |
| self | <model_name> | Model name as expected by the OpenAI chat-completions-compatible endpoint. |
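For the self provider, the node calls an OpenAI chat-completions-compatible endpoint at model_url. The sketch below shows the kind of request such a server needs to accept, assuming the standard /v1/chat/completions route with image_url content parts; the model_url and model name values are hypothetical, and the exact payload the node sends is not documented here.

import base64
import requests

MODEL_URL = "http://localhost:8000/v1"   # hypothetical value for model_url
MODEL_NAME = "llava-v1.6-7b"             # hypothetical value for custom_model

# Encode a sample ROI crop as an inline base64 image.
with open("roi.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# Standard OpenAI-style chat completion with text + image content parts.
response = requests.post(
    f"{MODEL_URL}/chat/completions",
    json={
        "model": MODEL_NAME,
        "max_tokens": 500,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the image briefly."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    },
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])

If your self-hosted server handles this request shape (as vLLM, llama.cpp, and similar OpenAI-compatible servers do), it should work as a self provider target.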

Metadata

| Metadata Property | Description |
| --- | --- |
| nodes.[node_id].rois.[roi_id].label_changed_delta | Indicates whether the label of the ROI has changed. |
| nodes.[node_id].rois.[roi_id].label_available | Indicates whether a label is available for the ROI. |
| nodes.[node_id].rois.[roi_id].label | Label contents for the ROI. When the attributes property is not specified, this contains the results from the model. If the attributes property is specified, this contains the results from a description attribute. |
| nodes.[node_id].rois.[roi_id].attributes.[attribute_name] | Attribute values for the ROI, if the attributes property is specified. Excludes the description attribute. |
| nodes.[node_id].recognized_obj_count | The count of recognized objects. |
| nodes.[node_id].recognized_obj_delta | The change in the count of recognized objects. |
| nodes.[node_id].label_changed_obj_delta | The change in the count of objects with changed labels. |
| nodes.[node_id].unrecognized_obj_count | The count of unrecognized objects. |
| nodes.[node_id].unrecognized_obj_delta | The change in the count of unrecognized objects. |

Example JSON

{
    "nodes": {
        "lvm1": {
            "type": "lvm",
            "rois": {
                "roi2": {
                    "label_changed_delta": true,
                    "label_available": true,
                    "label": "The scene contains 2 vehicles that seem to be in an accident.",
                    "attributes": {
                        "vehicle_type": "car, bus",
                        "vehicle_numbers": "1234, 5678"
                    }
                }
            },
            "recognized_obj_ids": ["2775161862"],
            "recognized_obj_count": 1,
            "recognized_obj_delta": 1,
            "label_changed_obj_delta": 1,
            "unrecognized_obj_count": 0,
            "unrecognized_obj_delta": 0,
            "objects_of_interest_keys": ["recognized_obj_ids"]
        }
    },
    "objects": [{
        "id": 2775161862,
        "source_node_id": null,
        "model_id": null,
        "label": "roi2",
        "class_id": 10600,
        "rect": {
            "left": 128,
            "top": 72,
            "width": 512,
            "height": 575
        },
        "probability": 1.0,
        "attributes": [{
            "label": "unblocked",
            "class_id": 10602,
            "probability": 1.0
        }, {
            "label": "lvm_results",
            "class_id": 10601,
            "probability": 1.0
        }, {
            "label": "lvm_roi",
            "class_id": 10600,
            "probability": 1.0
        }],
        "corr_id": "75f5141e-020a-4f27-af26-cf17b32c2544"
    }]
}
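
As an illustration of consuming this metadata downstream (for example, in a webhook handler), the following Python sketch walks the example payload above; the summarize_lvm_node helper and the delivery mechanism are hypothetical, while the field names follow the metadata table and example JSON.

# Sketch of reading the metadata payload shown above. `payload` is assumed
# to be the parsed JSON, e.g. received by a webhook and decoded with json.loads().
def summarize_lvm_node(payload: dict, node_id: str = "lvm1") -> None:
    node = payload["nodes"][node_id]
    # A positive delta means objects were newly recognized in this frame.
    if node["recognized_obj_delta"] > 0:
        for roi_id, roi in node.get("rois", {}).items():
            if roi.get("label_available"):
                print(f"{roi_id}: {roi['label']}")
                for name, value in roi.get("attributes", {}).items():
                    print(f"  {name} = {value}")

With the example payload, this prints the roi2 label followed by its vehicle_type and vehicle_numbers attributes.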