Significant GPU memory savings enable up to a ~2x increase in the number of streams processable on certain devices
New
- This update introduces a new deployment architecture that significantly reduces GPU memory usage per deployment. On devices that were previously memory constrained, you can now run close to 2x the previous number of deployments for certain models. Capacity is unchanged for device/model combinations that are GPU compute constrained:
| Device Type | Light Models | Medium Models | Heavy Models |
|---|---|---|---|
| Jetson Orin Nano 8GB | Up to 2x more deployments | Similar capacity to v1.38.26 | Similar capacity to v1.38.26 |
| Jetson Orin NX 8GB | Up to 1.5x more deployments | Similar capacity to v1.38.26 | Similar capacity to v1.38.26 |
| Jetson Orin NX 16GB | Up to 1.5x more deployments | Similar capacity to v1.38.26 | Similar capacity to v1.38.26 |
| Jetson Orin AGX 32/64GB | Similar capacity to v1.38.26 | Similar capacity to v1.38.26 | Similar capacity to v1.38.26 |
| NVIDIA dGPU systems | Up to 2x more deployments | Up to 2x more deployments | Up to 2x more deployments |
Updated
- Fixed a typo in the logrotate configuration that caused some logs to grow without bound
- System metrics (CPU usage, etc.) are now reported every 1 minute instead of every 15 minutes
- Fixed inbound webhooks not being passed along to the Receive HTTP Webhook node
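For context on the logrotate fix above: a correct logrotate stanza bounds log growth by rotating on size or schedule and capping the number of kept files. The snippet below is an illustrative sketch only; the log path, sizes, and retention values are hypothetical and are not taken from this release.

```
# Hypothetical logrotate stanza illustrating bounded log growth.
# A typo in a directive (e.g. a misspelled "size" or "rotate") can
# silently disable rotation, letting logs grow without limit.
/var/log/example-app/*.log {
    size 100M        # rotate once a log exceeds 100 MB
    rotate 7         # keep at most 7 rotated files
    compress         # gzip rotated logs to save disk
    missingok        # don't error if the log is absent
    notifempty       # skip rotation for empty logs
}
```

A misspelled directive is typically ignored or rejected by logrotate rather than applied, so validating the file (e.g. with `logrotate -d <conf>`) after edits is a cheap safeguard.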