Yoad Tewel

I am a Computer Science PhD candidate at Tel-Aviv University, working in the Deep Learning Lab under the supervision of Prof. Lior Wolf, and a research intern at NVIDIA Research, working with Gal Chechik and Yuval Atzmon.

I'm interested in the intersection of computer vision, NLP, and machine learning. In particular, I work on image and video captioning and weakly-supervised phrase grounding. Much of my work focuses on leveraging large pre-trained models.

Email  /  Github  /  Google Scholar  /  LinkedIn  /  Twitter

ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
Yoad Tewel, Yoav Shalev, Idan Schwartz, Lior Wolf
CVPR, 2022
Code / arXiv

In this work, we repurpose large pre-trained text-to-image matching models to generate descriptive text for a given image at inference time, without any further training or tuning.

Zero-Shot Video Captioning with Evolving Pseudo-Tokens
Yoad Tewel, Yoav Shalev, Roy Nadler, Idan Schwartz, Lior Wolf
arXiv, 2022
Code / arXiv

We introduce a zero-shot video captioning method that employs two frozen networks: the GPT-2 language model and the CLIP image-text matching model. The matching score is used to steer the language model toward generating a sentence that has a high average matching score to a subset of the video frames.

What is Where by Looking: Weakly-Supervised Open-World Phrase-Grounding without Text Inputs
Tal Shaharabany, Yoad Tewel, Lior Wolf
NeurIPS, 2022
Code / arXiv

Given an input image, and nothing else, our method returns bounding boxes for the objects in the image along with phrases that describe them. This is achieved within an open-world paradigm, in which the objects in the input image may not have been encountered during the training of the localization mechanism. Moreover, training takes place in a weakly supervised setting, where no bounding boxes are provided.

This page design is based on a template by Jon Barron.