AlexeyPrikhodko Thanks for the details.
Regarding 1. Being able to obtain predictions from U-Net-like models directly from the Labeling Interface sounds like a good idea. A user might first localize an area (rectangle) and then get predictions in real time. That would be especially useful if the underlying semantic segmentation model produces multi-class output (as opposed to the Smart Tool, which outputs a binary mask). We will add the corresponding idea and hopefully implement it soon.
Regarding 2. Indeed, it's a very similar scenario, but for the classification task. I have a question here. Suppose that in the Labeling Interface we have localized an area and obtained predictions, say the top 5 tags. Should these 5 tags be attached to the entire frame (image) or to the localized area (rectangle)? If we attach the tags to the rectangle, we might want to use a larger crop as the input to the classification model. In other words, I can imagine two scenarios here: (1) we already have rectangles around the objects of interest and apply a classification model on top; in this case, we can attach the top 5 tags to each rectangle; (2) we are trying to solve two tasks simultaneously, namely, to put a box and to obtain the top 5 tags. The box itself should probably be tight, but for the classification model a larger crop might be preferable. The difference between the two boxes could be controlled via a parameter. Anyway, the idea is very interesting and we will think more about an exact formulation. If you have some thoughts on that, please share them.
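Just to make the "larger crop" parameter idea concrete, here is a minimal sketch of what I mean. The function name and the padding parameter are purely illustrative (this is not an existing API), but the logic would be: take the tight annotation box, expand it symmetrically by a relative ratio, and clip to the image bounds before cropping for the classifier.

```python
def expand_box(box, pad_ratio, img_w, img_h):
    """Expand a tight box (left, top, right, bottom) by pad_ratio per side,
    clipped to the image bounds. pad_ratio=0.2 adds 20% of the box size
    on each side. Names here are hypothetical, just to illustrate the idea."""
    left, top, right, bottom = box
    pad_x = (right - left) * pad_ratio
    pad_y = (bottom - top) * pad_ratio
    return (
        max(0, left - pad_x),
        max(0, top - pad_y),
        min(img_w, right + pad_x),
        min(img_h, bottom + pad_y),
    )

# Tight detection box -> larger crop fed to the classification model;
# the top 5 tags are still attached to the original tight rectangle.
tight = (100, 100, 200, 180)
crop_box = expand_box(tight, pad_ratio=0.2, img_w=640, img_h=480)
# -> (80.0, 84.0, 220.0, 196.0)
```

With `pad_ratio=0` this degenerates to scenario (1), where the classifier sees exactly the annotated rectangle, so one parameter could cover both cases.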