Vision Language Model Lookup
Perform a lookup with a Vision Language Model on objects in an ROI, or on an ROI in the frame. Supports models such as GPT-4o, Claude, Gemini, and Llama via Lumeo Cloud, Google, OpenAI, Anthropic, Nvidia, or AWS Bedrock.
Overview
The Vision Language Model Lookup node is designed to perform lookups using Vision Language Models (via Lumeo Cloud, Google, OpenAI, Anthropic, Nvidia, or AWS Bedrock) on objects within a Region of Interest (ROI), or on an ROI within the frame. This functionality is useful for applications requiring advanced recognition and understanding of objects or areas within a video feed.
Inputs & Outputs
- Inputs: 1, Media Format: Raw Video
- Outputs: 1, Media Format: Raw Video
- Output Metadata: nodes.node_id, recognized_objs, recognized_obj_ids, recognized_obj_count, recognized_obj_delta, value_changed_delta, unrecognized_obj_count, unrecognized_obj_delta
Properties
Property | Description | Type | Default | Required |
---|---|---|---|---|
roi_labels | Regions of interest labels | hidden | null | Yes |
rois | Regions of interest. Conditional on roi_labels. | polygon | null | Yes |
processing_mode | Processing mode. Options: ROIs, at Interval (rois_interval), ROIs, upon Trigger (rois_trigger), Objects in an ROI (objects). | enum | rois_interval | Yes |
interval | Collect objects or ROIs for lookup at least this many seconds apart. | float | 1 | No |
trigger | Queue ROI for lookup when this condition evaluates to true. Conditional on processing_mode being rois_trigger. | trigger-condition | null | No |
batch_mode | ROI batch mode. Options: Single (single) - Lookup current ROI image only, Batch (batch) - Lookup last n images of the ROI, Compare with reference (reference) - Lookup current ROI image and reference image. | enum | single | No |
batch_size | Number of images to process in each request. Set to more than 1 to use prompts that reference multiple images. Conditional on batch_mode being batch. | number | 1 | No |
enable_prebuffer | If true, samples ROI at Lookup interval to fill batch, and performs lookup when trigger is met. Else performs lookup when batch is full. Conditional on processing_mode being rois_trigger. | bool | false | No |
objects_to_process | Object types to process (e.g. car, person, car.red). Conditional on processing_mode being objects. | model-label | null | No |
obj_lookup_mode | Object lookup mode. Options: Until result (until_result) - Lookup on interval or size change until a result is obtained or max attempts are exhausted, Continuously (continuous) - Periodically at an interval. Conditional on processing_mode being objects. | enum | until_result | No |
min_obj_size_pixels | Min. width and height of an object. Conditional on processing_mode being objects. | number | 64 | No |
obj_lookup_size_change_threshold | If the size of an object changes by more than this threshold, perform a lookup. Min: 0.01, Max: 2.0, Step: 0.2. Conditional on processing_mode being objects. | slider | 0.1 | No |
max_lookups_per_obj | Maximum number of attempts to perform a lookup for an object in the Until result lookup mode. Conditional on processing_mode being objects. | number | 5 | No |
model_provider | Model provider. Options: Google Cloud (google), OpenAI Cloud (openai), Anthropic Cloud (anthropic), Nvidia NIM Cloud (nvidia), Lumeo Cloud (lumeo), AWS Bedrock (aws), Self-hosted (self). | enum | lumeo | No |
model_url | Self-hosted model URL. Conditional on model_provider being self. | string | null | Yes |
api_key | API key for the model provider. For AWS Bedrock, provide it in the format <ACCESS_KEY_ID>:<SECRET_ACCESS_KEY>. Required only when Model provider is not lumeo. | string | null | No |
custom_model | Required if Model provider is Self-hosted; optional for other providers. Overrides the provider-specific default model if specified. See below for formats and supported models for the selected provider. | string | null | No |
prompt | Provide a prompt, additional instructions or context for the model. Required if no attributes are provided. | text | null | No |
attributes | If provided, model will look for these attributes and add resulting values as object/ROI attributes. Leave empty to output results as text. If present, description is a special attribute which will be added to scene description for search and summarization. See examples below. | json | null | No |
detail_level | Image resolution. Options: Low (low), High (high). | enum | low | No |
max_tokens | Maximum number of tokens to return for each request. | number | 500 | No |
display_roi | Display ROI on video? | bool | true | No |
display_objinfo | Display results on video? Options: Disabled (disabled), Bottom left (bottom_left), Bottom right (bottom_right), Top left (top_left), Top right (top_right). | enum | bottom_left | No |
debug | Log debugging information? | bool | false | No |
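To illustrate how these properties fit together, the sketch below assembles a hypothetical configuration fragment for a trigger-based ROI lookup. The dictionary shape is an assumption for illustration, not the exact deployment API schema; the property names, options, and defaults come from the table above.

```python
import json

# Hypothetical property set for this node (names/defaults per the Properties table).
lvm_properties = {
    "processing_mode": "rois_trigger",   # look up ROIs when a trigger fires
    "interval": 1.0,                     # seconds between collected samples
    "batch_mode": "single",              # look up the current ROI image only
    "model_provider": "openai",          # non-Lumeo providers require api_key
    "api_key": "sk-...",                 # placeholder; never commit real keys
    "custom_model": "openai/gpt-4o",
    "prompt": "Generate scene description.",
    "detail_level": "low",
    "max_tokens": 500,
    "display_roi": True,
}

# Per the table, api_key is required whenever model_provider is not "lumeo".
if lvm_properties["model_provider"] != "lumeo":
    assert lvm_properties.get("api_key"), "api_key required for non-Lumeo providers"

print(json.dumps(lvm_properties, indent=2))
```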
Prompt Examples
Generate scene description.
Analyze the scene and provide a concise description of any unique, interesting, or noteworthy elements that would be suitable for a push notification alert. Focus on key details that capture the essence of what's happening or what's important in the image.
Attribute Examples
Providing explicit attributes lets the model return structured output that is added as ROI or object attributes.
Each attribute is a key-value pair: the key is the attribute name, and the value is the instruction the model follows to extract it.
The model returns each extracted attribute value as a string, which is added as an attribute on the object/ROI.
Describe the image
{"description": "Describe the image briefly."}
Describe the image and add attributes for vehicle type and numbers
{"description": "Describe the image briefly. Return null if no vehicle is present.", "vehicle_type": "Comma separated list of vehicle types: car|bus|van", "vehicle_numbers": "Comma separated list of vehicle numbers"}
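A minimal sketch of how an attribute spec maps to structured output. The model response shown is illustrative, not a real API reply; the split between the special description attribute and the remaining attributes follows the behavior described above.

```python
import json

# Attribute spec: key = attribute name, value = extraction instruction.
attributes = {
    "description": "Describe the image briefly. Return null if no vehicle is present.",
    "vehicle_type": "Comma separated list of vehicle types: car|bus|van",
    "vehicle_numbers": "Comma separated list of vehicle numbers",
}

# Illustrative model response: one string value per requested attribute.
model_response = json.loads(
    '{"description": "Two vehicles in an accident.",'
    ' "vehicle_type": "car, bus", "vehicle_numbers": "1234, 5678"}'
)

# "description" feeds the ROI label / scene description; every other key
# becomes an object/ROI attribute.
roi_label = model_response.pop("description")
roi_attributes = model_response
```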
Custom Model Names
Model Provider | Format | Description |
---|---|---|
aws | aws/<bedrock inference profile id> | Bedrock inference profiles can be found here. ex. aws/us.meta.llama3-2-11b-instruct-v1:0 |
google | google/<model_name> | Supported Model names can be found here. ex. google/gemini-1.5-flash-latest |
openai | openai/<model_name> | Model name can be found here. ex. openai/gpt-4o |
anthropic | anthropic/<model_name> | Model name can be found here. ex. anthropic/claude-3-5-sonnet-latest |
nvidia | <model_path> | Nvidia NIM Vision Language Models can be found here. Specify the model path as the portion of the model invoke URL after the base URL (https://ai.api.nvidia.com/v1/) in the Nvidia docs. E.g., for a model invoke URL of https://ai.api.nvidia.com/v1/vlm/nvidia/vila, the model_path would be vlm/nvidia/vila |
self | <model_name> | Model name as required to be provided in OpenAI chat completions compatible endpoint. |
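The formats above can be captured in a small helper. This is a sketch, not part of the node: the prefixing rule follows the table, and the model identifiers used in the checks are the table's own examples, not an exhaustive list.

```python
# Build a custom_model string per the provider formats in the table.
# Nvidia and self-hosted take the path/name as-is; the rest are prefixed.
PREFIXED = {"aws", "google", "openai", "anthropic"}

def custom_model_name(provider: str, model: str) -> str:
    if provider in PREFIXED:
        return f"{provider}/{model}"
    if provider in {"nvidia", "self"}:
        return model
    raise ValueError(f"unknown provider: {provider}")

assert custom_model_name("openai", "gpt-4o") == "openai/gpt-4o"
assert custom_model_name("nvidia", "vlm/nvidia/vila") == "vlm/nvidia/vila"
```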
Metadata
Metadata Property | Description |
---|---|
nodes.[node_id].rois.[roi_id].label_changed_delta | Indicates if there has been a change in the label of the ROI. |
nodes.[node_id].rois.[roi_id].label_available | Indicates if a label is available for the ROI. |
nodes.[node_id].rois.[roi_id].label | Label contents for the ROI. When the attributes property is not specified, this contains the model's results. If attributes is specified, this contains the value of the description attribute. |
nodes.[node_id].rois.[roi_id].attributes.[attribute_name] | Attribute values for the ROI, if the attributes property is specified. Excludes the description attribute. |
nodes.[node_id].recognized_obj_count | The count of recognized objects. |
nodes.[node_id].recognized_obj_delta | The change in the count of recognized objects. |
nodes.[node_id].label_changed_obj_delta | The change in the count of objects with changed labels. |
nodes.[node_id].unrecognized_obj_count | The count of unrecognized objects. |
nodes.[node_id].unrecognized_obj_delta | The change in the count of unrecognized objects. |
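Downstream logic typically addresses these values by dotted path. The helper below is a sketch of resolving such a path against a metadata dictionary; the `[node_id]`/`[roi_id]` placeholders in the table are replaced by concrete IDs, as in the Example JSON below.

```python
from functools import reduce

def get_metadata(metadata: dict, path: str, default=None):
    """Resolve a dotted path such as 'nodes.lvm1.rois.roi2.label'."""
    def step(node, key):
        return node.get(key, default) if isinstance(node, dict) else default
    return reduce(step, path.split("."), metadata)

# Minimal metadata mirroring the Example JSON below.
meta = {"nodes": {"lvm1": {"rois": {"roi2": {"label_available": True}}}}}
assert get_metadata(meta, "nodes.lvm1.rois.roi2.label_available") is True
assert get_metadata(meta, "nodes.lvm1.recognized_obj_count") is None
```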
Example JSON
{
"nodes": {
"lvm1": {
"type": "lvm",
"rois": {
"roi2": {
"label_changed_delta": true,
"label_available": true,
"label": "The scene contains 2 vehicles that seem to be in an accident.",
"attributes": {
"vehicle_type": "car, bus",
"vehicle_numbers": "1234, 5678"
}
}
},
"recognized_obj_ids": ["2775161862"],
"recognized_obj_count": 1,
"recognized_obj_delta": 1,
"label_changed_obj_delta": 1,
"unrecognized_obj_count": 0,
"unrecognized_obj_delta": 0,
"objects_of_interest_keys": ["recognized_obj_ids"]
}
},
"objects": [{
"id": 2775161862,
"source_node_id": null,
"model_id": null,
"label": "roi2",
"class_id": 10600,
"rect": {
"left": 128,
"top": 72,
"width": 512,
"height": 575
},
"probability": 1.0,
"attributes": [{
"label": "unblocked",
"class_id": 10602,
"probability": 1.0
}, {
"label": "lvm_results",
"class_id": 10601,
"probability": 1.0
}, {
"label": "lvm_roi",
"class_id": 10600,
"probability": 1.0
}],
"corr_id": "75f5141e-020a-4f27-af26-cf17b32c2544"
}]
}
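A downstream consumer reading the example above might look like this. The parsing is a sketch; the payload is a trimmed copy of the metadata fragment shown above.

```python
import json

# Trimmed copy of the Example JSON above.
payload = json.loads("""
{
  "nodes": {
    "lvm1": {
      "type": "lvm",
      "rois": {
        "roi2": {
          "label_available": true,
          "label": "The scene contains 2 vehicles that seem to be in an accident.",
          "attributes": {"vehicle_type": "car, bus", "vehicle_numbers": "1234, 5678"}
        }
      },
      "recognized_obj_count": 1
    }
  }
}
""")

lvm = payload["nodes"]["lvm1"]
roi = lvm["rois"]["roi2"]
if roi["label_available"]:
    print(roi["label"])                       # scene description from the model
    print(roi["attributes"]["vehicle_type"])  # structured attribute: "car, bus"
```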