Vision Language Model Lookup

Perform a lookup with a Vision Language Model on objects in an ROI, or on an ROI in the frame. Supports models such as GPT-4o, Claude, Gemini, and Llama via Lumeo Cloud, Google, OpenAI, Anthropic, Nvidia, or AWS Bedrock.

Overview

The Vision Language Model Lookup node performs lookups using Vision Language Models (via Lumeo Cloud, Google, OpenAI, Anthropic, Nvidia, or AWS Bedrock) on objects within a Region of Interest (ROI), or on an ROI within the frame. This is useful for applications that require advanced recognition and understanding of objects or areas within a video feed.

Inputs & Outputs

  • Inputs: 1, Media Format: Raw Video
  • Outputs: 1, Media Format: Raw Video
  • Output Metadata: nodes.node_id, recognized_objs, recognized_obj_ids, recognized_obj_count, recognized_obj_delta, value_changed_delta, unrecognized_obj_count, unrecognized_obj_delta

Properties

| Property | Description | Type | Default | Required |
| --- | --- | --- | --- | --- |
| roi_labels | Regions of interest labels. | hidden | null | Yes |
| rois | Regions of interest. Conditional on roi_labels. | polygon | null | Yes |
| processing_mode | Processing mode. Options: ROIs, at Interval (rois_interval); ROIs, upon Trigger (rois_trigger); Objects in an ROI (objects). | enum | rois_interval | Yes |
| interval | Collect objects or ROIs for lookup at least this many seconds apart. | float | 1 | No |
| trigger | Queue the ROI for lookup when this condition evaluates to true. Conditional on processing_mode being rois_trigger. | trigger-condition | null | No |
| batch_mode | ROI batch mode. Options: Single (single): look up the current ROI image only; Batch (batch): look up the last n images of the ROI; Compare with reference (reference): look up the current ROI image and a reference image. | enum | single | No |
| batch_size | Number of images to process in each request. Set to more than 1 to use prompts that reference multiple images. Conditional on batch_mode being batch. | number | 1 | No |
| enable_prebuffer | If true, samples the ROI at the lookup interval to fill the batch and performs the lookup when the trigger is met; otherwise performs the lookup when the batch is full. Conditional on processing_mode being rois_trigger. | bool | false | No |
| objects_to_process | Object types to process (e.g. car, person, car.red). Conditional on processing_mode being objects. | model-label | null | No |
| obj_lookup_mode | Object lookup mode. Options: Until result (until_result): look up on interval or size change until a result is obtained or max attempts are exhausted; Continuously (continuous): look up periodically at an interval. Conditional on processing_mode being objects. | enum | until_result | No |
| min_obj_size_pixels | Minimum width and height of an object. Conditional on processing_mode being objects. | number | 64 | No |
| obj_lookup_size_change_threshold | If the size of an object changes by more than this threshold, perform a lookup. Min: 0.01, Max: 2.0, Step: 0.2. Conditional on processing_mode being objects. | slider | 0.1 | No |
| max_lookups_per_obj | Maximum number of lookup attempts for an object in the Until result lookup mode. Conditional on processing_mode being objects. | number | 5 | No |
| model_provider | Model provider. Options: Google Cloud (google), OpenAI Cloud (openai), Anthropic Cloud (anthropic), Nvidia NIM Cloud (nvidia), Lumeo Cloud (lumeo), AWS Bedrock (aws), Self-hosted (self). | enum | lumeo | No |
| model_url | Self-hosted model URL. Conditional on model_provider being self. | string | null | Yes |
| api_key | API key for the model provider. For AWS Bedrock, provide it in the format <ACCESS_KEY_ID>:<SECRET_ACCESS_KEY>. Required only when the model provider is not lumeo. | string | null | No |
| custom_model | Required if the model provider is Self-hosted; optional for other providers. Overrides the provider-specific default model if specified. See below for formats and supported models for the selected provider. | string | null | No |
| prompt | A prompt, additional instructions, or context for the model. Required if no attributes are provided. | text | null | No |
| attributes | If provided, the model looks for these attributes and adds the resulting values as object/ROI attributes. Leave empty to output results as text. If present, description is a special attribute that is added to the scene description for search and summarization. See examples below. | json | null | No |
| detail_level | Image resolution. Options: Low (low), High (high). | enum | low | No |
| max_tokens | Maximum number of tokens to return for each request. | number | 500 | No |
| display_roi | Display the ROI on video? | bool | true | No |
| display_objinfo | Display results on video? Options: Disabled (disabled), Bottom left (bottom_left), Bottom right (bottom_right), Top left (top_left), Top right (top_right). | enum | bottom_left | No |
| debug | Log debugging information? | bool | false | No |
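To make the conditional properties concrete, the fragment below sketches one plausible configuration for the Objects in an ROI processing mode, expressed as a Python dict. The property names and values come from the table above; the surrounding deployment schema, and the exact value formats for objects_to_process and api_key, are assumptions for illustration.

# One plausible configuration for this node in "objects" processing mode.
# Property names/values follow the Properties table above; the exact
# deployment schema this dict would live in is an assumption.
lvm_node_properties = {
    "processing_mode": "objects",       # look up detected objects inside the ROIs
    "objects_to_process": "car",        # object types to send to the model (format assumed)
    "obj_lookup_mode": "until_result",  # retry until a result or max attempts exhausted
    "max_lookups_per_obj": 5,
    "min_obj_size_pixels": 64,          # skip objects smaller than 64x64 px
    "interval": 1.0,                    # at least 1 second between lookups
    "model_provider": "openai",
    "api_key": "sk-...",                # required because the provider is not lumeo
    "custom_model": "openai/gpt-4o",    # optional override of the provider default
    "attributes": {                     # request structured output instead of free text
        "description": "Describe the image briefly.",
        "vehicle_type": "Comma separated list of vehicle types: car|bus|van",
    },
    "detail_level": "low",
    "max_tokens": 500,
}

In this sketch, each tracked car of at least 64x64 pixels is looked up until the model returns a result (up to 5 attempts), and the results come back as structured attributes rather than free-form text.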

Prompt Examples

Generate a scene description.

Analyze the scene and provide a concise description of any unique, interesting, or noteworthy elements that would be suitable for a push notification alert. Focus on key details that capture the essence of what's happening or what's important in the image.

Attribute Examples

Providing explicit attributes lets the model return structured output that will be added as ROI or object attributes.

Each attribute is a key-value pair: the key is the attribute name, and the value is the instruction telling the model how to extract that attribute.
The model returns each extracted attribute value as a string, which is added as an attribute to the object/ROI.

Describe the image

{"description": "Describe the image briefly."}

Describe the image and add attributes for vehicle type and numbers

{"description": "Describe the image briefly. Return null if no vehicle is present.", "vehicle_type": "Comma separated list of vehicle types: car\|bus\|van", "vehicle_numbers": "Comma separated list of vehicle numbers"}

Custom Model Names

| Model Provider | Format | Description |
| --- | --- | --- |
| aws | aws/<bedrock inference profile id> | Bedrock inference profile IDs are listed in the AWS Bedrock documentation. Ex. aws/us.meta.llama3-2-11b-instruct-v1:0 |
| google | google/<model_name> | Supported model names are listed in the Google documentation. Ex. google/gemini-1.5-flash-latest |
| openai | openai/<model_name> | Model names are listed in the OpenAI documentation. Ex. openai/gpt-4o |
| anthropic | anthropic/<model_name> | Model names are listed in the Anthropic documentation. Ex. anthropic/claude-3-5-sonnet-latest |
| nvidia | <model_path> | Nvidia NIM Vision Language Models are listed in the Nvidia docs. Specify the model path as the portion of the model invoke URL after the base URL (https://ai.api.nvidia.com/v1/). Ex. for a model invoke URL of https://ai.api.nvidia.com/v1/vlm/nvidia/vila, the model_path is vlm/nvidia/vila. |
| self | <model_name> | Model name as expected by the OpenAI chat-completions-compatible endpoint. |
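For the self provider, the node calls an OpenAI chat-completions-compatible endpoint at model_url. The sketch below shows the kind of request such a server needs to accept, assuming the standard /v1/chat/completions route with image_url content parts; the model_url and model name values are hypothetical, and the exact payload the node sends is not documented here.

import base64
import requests

MODEL_URL = "http://localhost:8000/v1"   # hypothetical value for model_url
MODEL_NAME = "llava-v1.6-7b"             # hypothetical value for custom_model

# Encode a sample ROI crop as an inline base64 image.
with open("roi.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# Standard OpenAI-style chat completion with text + image content parts.
response = requests.post(
    f"{MODEL_URL}/chat/completions",
    json={
        "model": MODEL_NAME,
        "max_tokens": 500,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the image briefly."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    },
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])

If your self-hosted server handles this request shape (as vLLM, llama.cpp, and similar OpenAI-compatible servers do), it should work as a self provider target.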

Metadata

| Metadata Property | Description |
| --- | --- |
| nodes.[node_id].rois.[roi_id].label_changed_delta | Indicates whether the label of the ROI has changed. |
| nodes.[node_id].rois.[roi_id].label_available | Indicates whether a label is available for the ROI. |
| nodes.[node_id].rois.[roi_id].label | Label contents for the ROI. When the attributes property is not specified, this contains the results from the model. If the attributes property is specified, this contains the results from a description attribute. |
| nodes.[node_id].rois.[roi_id].attributes.[attribute_name] | Attribute values for the ROI, if the attributes property is specified. Excludes the description attribute. |
| nodes.[node_id].recognized_obj_count | The count of recognized objects. |
| nodes.[node_id].recognized_obj_delta | The change in the count of recognized objects. |
| nodes.[node_id].label_changed_obj_delta | The change in the count of objects with changed labels. |
| nodes.[node_id].unrecognized_obj_count | The count of unrecognized objects. |
| nodes.[node_id].unrecognized_obj_delta | The change in the count of unrecognized objects. |

Example JSON

{
    "nodes": {
        "lvm1": {
            "type": "lvm",
            "rois": {
                "roi2": {
                    "label_changed_delta": true,
                    "label_available": true,
                    "label": "The scene contains 2 vehicles that seem to be in an accident.",
                    "attributes": {
                        "vehicle_type": "car, bus",
                        "vehicle_numbers": "1234, 5678"
                    }
                }
            },
            "recognized_obj_ids": ["2775161862"],
            "recognized_obj_count": 1,
            "recognized_obj_delta": 1,
            "label_changed_obj_delta": 1,
            "unrecognized_obj_count": 0,
            "unrecognized_obj_delta": 0,
            "objects_of_interest_keys": ["recognized_obj_ids"]
        }
    },
    "objects": [{
        "id": 2775161862,
        "source_node_id": null,
        "model_id": null,
        "label": "roi2",
        "class_id": 10600,
        "rect": {
            "left": 128,
            "top": 72,
            "width": 512,
            "height": 575
        },
        "probability": 1.0,
        "attributes": [{
            "label": "unblocked",
            "class_id": 10602,
            "probability": 1.0
        }, {
            "label": "lvm_results",
            "class_id": 10601,
            "probability": 1.0
        }, {
            "label": "lvm_roi",
            "class_id": 10600,
            "probability": 1.0
        }],
        "corr_id": "75f5141e-020a-4f27-af26-cf17b32c2544"
    }]
}
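
As an illustration of consuming this metadata downstream (for example, in a webhook handler), the following Python sketch walks the example payload above; the summarize_lvm_node helper and the delivery mechanism are hypothetical, while the field names follow the metadata table and example JSON.

# Sketch of reading the metadata payload shown above. `payload` is assumed
# to be the parsed JSON, e.g. received by a webhook and decoded with json.loads().
def summarize_lvm_node(payload: dict, node_id: str = "lvm1") -> None:
    node = payload["nodes"][node_id]
    # A positive delta means objects were newly recognized in this frame.
    if node["recognized_obj_delta"] > 0:
        for roi_id, roi in node.get("rois", {}).items():
            if roi.get("label_available"):
                print(f"{roi_id}: {roi['label']}")
                for name, value in roi.get("attributes", {}).items():
                    print(f"  {name} = {value}")

With the example payload, this prints the roi2 label followed by its vehicle_type and vehicle_numbers attributes.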