Generic Detector

Detect objects based on the similarity of text prompts and the images of objects in the frame.

Overview

The Generic Detector node detects objects in a video frame by comparing text prompts against the images of objects in that frame. This is useful for applications that need to identify and categorize objects in real time.

Inputs & Outputs

  • Inputs: 1, Media Format: Raw Video
  • Outputs: 1, Media Format: Raw Video
  • Output Metadata: Objects

Properties

Property | Description | Type | Default | Required
model_id | The type of model to use. Options: Yolo-Small (yolov8s-world), Yolo-Medium (yolov8m-world), Yolo-Large (yolov8l-world), OWL-Medium (owlvit-base-patch32). | enum | yolov8s-world | Yes
class_list | A comma-separated list of objects to detect, optionally with an alternate label. For example, a person=person, a blue car=blue_car. | string | null | Yes
interval | The inference interval. Infer on every nth frame. 1 means infer every frame. The minimum value is 1 frame. | number | 1 | Yes
confidence_threshold | This property allows you to override the default minimum inference threshold for all classes. The minimum value is 0, the maximum value is 1.0, the step is 0.1, and the scale is 0.1. | slider-optional | 0.1 | No
per_class_thresholds | A comma-separated list of per-class thresholds. Leave this field empty to use the default threshold for all classes. For example, 0.1,0.2,0.3. | string | null | No
iou_threshold | This property allows you to increase the threshold to reduce potential duplicate detections of a single object. The minimum value is 0, the maximum value is 1.0, the step is 0.1, and the scale is 0.1. | slider-optional | 0.5 | No
min_object_size | The minimum size of the object to detect. The size should be specified as width x height. | string | null | No
enable_max_optimizations | Enable advanced optimizations to improve performance. This currently increases deployment start time. | bool | false | No
clear_cache | Set to true to clear model cache. This will increase deployment start time. | bool | false | No
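
To make the string formats of class_list, per_class_thresholds, and interval more concrete, here is a minimal Python sketch of how such values could be interpreted. It is an illustration only, not the node's actual implementation. The function names (parse_class_list, resolve_thresholds, should_infer) are hypothetical, and two behaviors are assumptions: that per-class thresholds are matched to classes by position, and that "every nth frame" means frames whose zero-based index is a multiple of the interval.

```python
from typing import Dict, List, Optional


def parse_class_list(class_list: str) -> Dict[str, str]:
    """Map each text prompt to the label reported for it.

    Assumption: entries are comma-separated and an optional alternate
    label follows "="; without "=", the prompt doubles as the label.
    """
    mapping: Dict[str, str] = {}
    for entry in class_list.split(","):
        prompt, sep, label = entry.partition("=")
        prompt, label = prompt.strip(), label.strip()
        if not prompt:
            continue  # skip empty entries such as a trailing comma
        mapping[prompt] = label if sep and label else prompt
    return mapping


def resolve_thresholds(
    labels: List[str],
    per_class_thresholds: Optional[str],
    default: float = 0.1,
) -> Dict[str, float]:
    """Pair each label with a threshold.

    Assumption: thresholds are positional, one per class, in the same
    order as class_list. An empty value falls back to the default.
    """
    if not per_class_thresholds or not per_class_thresholds.strip():
        return {label: default for label in labels}
    values = [float(v) for v in per_class_thresholds.split(",")]
    if len(values) != len(labels):
        raise ValueError("expected one threshold per class")
    for value in values:
        if not 0.0 <= value <= 1.0:
            raise ValueError("thresholds must lie between 0 and 1.0")
    return dict(zip(labels, values))


def should_infer(frame_index: int, interval: int) -> bool:
    """Return True when inference would run on this zero-based frame index."""
    if interval < 1:
        raise ValueError("interval must be at least 1")
    return frame_index % interval == 0


classes = parse_class_list("a person=person, a blue car=blue_car")
thresholds = resolve_thresholds(list(classes.values()), "0.1,0.2")
print(classes)     # {'a person': 'person', 'a blue car': 'blue_car'}
print(thresholds)  # {'person': 0.1, 'blue_car': 0.2}
```

Under these assumptions, an interval of 3 would run inference on frames 0, 3, 6, and so on, and leaving per_class_thresholds empty would apply the default threshold to every class.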

Metadata

The output metadata describes the objects detected in the frame, including object types, confidence scores, and bounding box coordinates.

Example JSON