Interactive Image Segmentation

2023-11-02

FocalClick: Towards Practical Interactive Image Segmentation

Alibaba

CVPR 2022
Interactive segmentation allows users to extract target masks by making positive/negative clicks. Although explored by many previous works, there is still a gap between academic approaches and industrial needs: first, existing models are not efficient enough to work on low power devices; second, they perform poorly when used to refine preexisting masks as they could not avoid destroying the correct part. FocalClick solves both issues at once by predicting and updating the mask in localized areas. For higher efficiency, we decompose the slow prediction on the entire image into two fast inferences on small crops: a coarse segmentation on the Target Crop, and a local refinement on the Focus Crop. To make the model work with preexisting masks, we formulate a sub-task termed Interactive Mask Correction, and propose Progressive Merge as the solution. Progressive Merge exploits morphological information to decide where to preserve and where to update, enabling users to refine any preexisting mask effectively. FocalClick achieves competitive results against SOTA methods with significantly smaller FLOPs. It also shows significant superiority when making corrections on preexisting masks. Code and data will be released at github.com/XavierCHEN34/ClickSEG
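
A minimal sketch of the Progressive Merge idea, under stated assumptions. The abstract only says that the merge uses morphological information to decide where to preserve and where to update. The sketch assumes this means: find the pixels where the new prediction disagrees with the existing mask, keep only the connected region of disagreement that contains the newest click, and overwrite the mask there. The names `progressive_merge` and `_region_containing` are hypothetical and are not taken from the ClickSEG code.

```python
from collections import deque


def _region_containing(diff, seed):
    """Return the 4-connected set of True cells in `diff` reachable from `seed`."""
    rows, cols = len(diff), len(diff[0])
    r0, c0 = seed
    if not diff[r0][c0]:
        return set()  # the click did not land on a changed pixel
    region, queue = {seed}, deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and diff[nr][nc] and (nr, nc) not in region:
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


def progressive_merge(prev_mask, new_pred, click):
    """Update prev_mask with new_pred only in the changed region touching the click."""
    # Pixels where the new prediction disagrees with the existing mask.
    diff = [[p != n for p, n in zip(prow, nrow)]
            for prow, nrow in zip(prev_mask, new_pred)]
    region = _region_containing(diff, click)
    merged = [row[:] for row in prev_mask]  # everything else is preserved
    for r, c in region:
        merged[r][c] = new_pred[r][c]
    return merged


# Toy example: the new prediction changes two separate areas,
# but the click at (row 0, col 2) only touches the left one.
prev_mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
new_pred = [
    [1, 1, 1, 0, 1],
    [1, 1, 1, 0, 1],
    [0, 0, 0, 0, 0],
]
merged = progressive_merge(prev_mask, new_pred, click=(0, 2))
```

In this toy case the column next to the click is added to the mask, while the unrelated change in the rightmost column is discarded, which is the "preserve the correct part" behavior the abstract describes.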

SimpleClick: Interactive Image Segmentation with Simple Vision Transformers

ICCV 2023
University of North Carolina at Chapel Hill
Click-based interactive image segmentation aims at extracting objects with a limited user clicking. A hierarchical backbone is the de-facto architecture for current methods. Recently, the plain, non-hierarchical Vision Transformer (ViT) has emerged as a competitive backbone for dense prediction tasks. This design allows the original ViT to be a foundation model that can be finetuned for downstream tasks without redesigning a hierarchical backbone for pretraining. Although this design is simple and has been proven effective, it has not yet been explored for interactive image segmentation. To fill this gap, we propose SimpleClick, the first interactive segmentation method that leverages a plain backbone. Based on the plain backbone, we introduce a symmetric patch embedding layer that encodes clicks into the backbone with minor modifications to the backbone itself. With the plain backbone pretrained as a masked autoencoder (MAE), SimpleClick achieves state-of-the-art performance. Remarkably, our method achieves 4.15 NoC@90 on SBD, improving 21.8% over the previous best result. Extensive evaluation on medical images demonstrates the generalizability of our method. We further develop an extremely tiny ViT backbone for SimpleClick and provide a detailed computational analysis, highlighting its suitability as a practical annotation tool.
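
The NoC@90 figure quoted above is the Number of Clicks needed to reach 90% IoU, averaged over a dataset. A minimal sketch of how such a metric can be computed, assuming the common convention that an image which never reaches the threshold is counted at a cap of 20 clicks (the abstract does not state the cap). The function names and the IoU traces are illustrative only.

```python
def noc(iou_per_click, threshold=0.90, max_clicks=20):
    """Number of clicks needed for IoU to first reach `threshold`.

    iou_per_click[i] is the IoU after click i + 1. If the threshold is never
    reached within max_clicks, max_clicks is returned (assumed convention).
    """
    for i, iou in enumerate(iou_per_click[:max_clicks]):
        if iou >= threshold:
            return i + 1
    return max_clicks


def mean_noc(traces, threshold=0.90, max_clicks=20):
    """Average NoC over a list of per-image IoU traces."""
    return sum(noc(t, threshold, max_clicks) for t in traces) / len(traces)


# Made-up IoU traces for three images, purely for illustration.
traces = [
    [0.72, 0.88, 0.93],  # reaches 0.90 at click 3
    [0.95],              # reaches 0.90 at click 1
    [0.40, 0.55, 0.61],  # never reaches 0.90, counted as 20
]
print(mean_noc(traces))  # (3 + 1 + 20) / 3 = 8.0
```

Lower is better: a NoC@90 of 4.15 means that, on average, a little over four clicks per object were enough to reach 90% IoU.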

Interactive Segmentation as Gaussian Process Classification

CVPR 2023
Xi'an Jiaotong University
Click-based interactive segmentation (IS) aims to extract the target objects under user interaction. For this task, most of the current deep learning (DL)-based methods mainly follow the general pipelines of semantic segmentation. Albeit achieving promising performance, they do not fully and explicitly utilize and propagate the click information, inevitably leading to unsatisfactory segmentation results, even at clicked points. Against this issue, in this paper, we propose to formulate the IS task as a Gaussian process (GP)-based pixel-wise binary classification model on each image. To solve this model, we utilize amortized variational inference to approximate the intractable GP posterior in a data-driven manner and then decouple the approximated GP posterior into double space forms for efficient sampling with linear complexity. Then, we correspondingly construct a GP classification framework, named GPCIS, which is integrated with the deep kernel learning mechanism for more flexibility. The main specificities of the proposed GPCIS lie in: 1) Under the explicit guidance of the derived GP posterior, the information contained in clicks can be finely propagated to the entire image and then boost the segmentation; 2) The accuracy of predictions at clicks has good theoretical support. These merits of GPCIS as well as its good generality and high efficiency are substantiated by comprehensive experiments on several benchmarks, as compared with representative methods both quantitatively and qualitatively.

$SAM^{Med}$: A medical image annotation framework based on large vision model

East China Normal University
https://arxiv.org/pdf/2307.05617.pdf
Recently, the large vision model Segment Anything Model (SAM) has revolutionized the computer vision field, especially image segmentation. SAM introduced a new promptable segmentation paradigm that exhibits remarkable zero-shot generalization ability. Extensive research has explored the potential and limits of SAM in various downstream tasks. In this study, we present SAMMed, an enhanced framework for medical image annotation that leverages the capabilities of SAM. The SAMMed framework consists of two submodules, namely SAMassist and SAMauto. SAMassist demonstrates the generalization ability of SAM to the downstream medical segmentation task using the prompt-learning approach. Results show a significant improvement in segmentation accuracy with only approximately 5 input points. The SAMauto model aims to accelerate the annotation process by automatically generating input prompts. The proposed SAP-Net model achieves superior segmentation performance with only five annotated slices, achieving an average Dice coefficient of 0.80 and 0.82 for kidney and liver segmentation, respectively. Overall, SAMMed demonstrates promising results in medical image annotation. These findings highlight the potential of leveraging large-scale vision models in medical image annotation tasks.
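
For readers unfamiliar with the Dice coefficient reported above, here is a minimal sketch of the standard definition, Dice = 2|A ∩ B| / (|A| + |B|), on flattened binary masks. The function name `dice` and the toy masks are illustrative and are not taken from the paper; returning 1.0 for two empty masks is an assumed convention.

```python
def dice(pred, target):
    """Dice coefficient between two flat binary masks (sequences of 0/1)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks agree perfectly; treat that case as Dice = 1.0.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy masks: the prediction covers 4 of the 6 target pixels and nothing else.
pred = [1, 1, 1, 1, 0, 0]
target = [1, 1, 1, 1, 1, 1]
print(dice(pred, target))  # 2 * 4 / (4 + 6) = 0.8
```

A Dice of 0.80 therefore corresponds to substantial but imperfect overlap; a value of 1.0 would mean the predicted and reference masks are identical.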

AdaptiveClick: Clicks-aware Transformer with Adaptive Focal Loss for Interactive Image Segmentation

https://arxiv.org/abs/2305.04276
Hunan University
Interactive Image Segmentation (IIS) has emerged as a promising technique for decreasing annotation time. Substantial progress has been made in pre- and post-processing for IIS, but the critical issue of interaction ambiguity, which notably hinders segmentation quality, has been under-researched. To address this, we introduce AdaptiveClick, a clicks-aware transformer incorporating an adaptive focal loss, which tackles annotation inconsistencies with tools for mask- and pixel-level ambiguity resolution. To the best of our knowledge, AdaptiveClick is the first transformer-based, mask-adaptive segmentation framework for IIS. The key ingredient of our method is the Clicks-aware Mask-adaptive Transformer Decoder (CAMD), which enhances the interaction between clicks and image features. Additionally, AdaptiveClick enables pixel-adaptive differentiation of hard and easy samples in the decision space, independent of their varying distributions. This is primarily achieved by optimizing a generalized Adaptive Focal Loss (AFL) with a theoretical guarantee, where two adaptive coefficients control the ratio of gradient values for hard and easy pixels. Our analysis reveals that the commonly used Focal and BCE losses can be considered special cases of the proposed AFL loss. With a plain ViT backbone, extensive experimental results on nine datasets demonstrate the superiority of AdaptiveClick compared to state-of-the-art methods. Code will be made publicly available.
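
The abstract does not give the formula for AFL, so the sketch below does not attempt to reproduce it. It only illustrates the two baseline losses the abstract names, using the standard per-pixel focal loss, FL = -(1 - p_t)^gamma * log(p_t), and shows that BCE is the gamma = 0 case. The function names are illustrative, and the adaptive coefficients of AFL are not modeled.

```python
import math


def bce(p, y):
    """Binary cross-entropy for one pixel; p is the predicted foreground probability."""
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(p_t)


def focal(p, y, gamma=2.0):
    """Standard focal loss for one pixel; gamma = 0 recovers BCE."""
    p_t = p if y == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


easy = focal(0.9, 1) / bce(0.9, 1)  # well-classified pixel: weight (1 - 0.9)^2
hard = focal(0.3, 1) / bce(0.3, 1)  # poorly classified pixel: weight (1 - 0.3)^2
print(easy < hard)                   # easy pixels are down-weighted more strongly
print(math.isclose(focal(0.3, 1, gamma=0.0), bce(0.3, 1)))
```

The fixed exponent gamma applies the same down-weighting rule to every image; according to the abstract, AFL replaces this with two adaptive coefficients that control the gradient ratio between hard and easy pixels.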


Focused and Collaborative Feedback Integration for Interactive Image Segmentation

CVPR 2023
Tsinghua University
Interactive image segmentation aims at obtaining a segmentation mask for an image using simple user annotations. During each round of interaction, the segmentation result from the previous round serves as feedback to guide the user’s annotation and provides dense prior information for the segmentation model, effectively acting as a bridge between interactions. Existing methods overlook the importance of feedback or simply concatenate it with the original input, leading to underutilization of feedback and an increase in the number of required annotations. To address this, we propose an approach called Focused and Collaborative Feedback Integration (FCFI) to fully exploit the feedback for click-based interactive image segmentation. FCFI first focuses on a local area around the new click and corrects the feedback based on the similarities of high-level features. It then alternately and collaboratively updates the feedback and deep features to integrate the feedback into the features. The efficacy and efficiency of FCFI were validated on four benchmarks, namely GrabCut, Berkeley, SBD, and DAVIS. Experimental results show that FCFI achieved new state-of-the-art performance with less computational overhead than previous methods. The source code is available at https://github.com/veizgyauzgyauz/FCFI.
