Thursday, June 6, 2024

Non-generative AI: JEPA

Joint Embedding Predictive Architecture (JEPA) is a non-generative architecture designed with the aim of building machines that learn as efficiently as humans. Rather than reconstructing the raw signal, a JEPA predicts the representation of a signal from a corrupted or transformed version of that signal. It works with embeddings, which are numerical vector representations of real-world objects, so the model can capture the abstract structure of a complex knowledge domain without modeling every low-level detail.
There are different types of JEPA, including:
  • V-JEPA: A video-based JEPA that learns by predicting missing or masked parts of a video in an abstract representation space. V-JEPA can be pre-trained on video data to help it learn concepts about the physical world, similar to how a baby learns by observing its parents.
  • I-JEPA: An image-based JEPA that compares abstract representations of images rather than the pixels themselves. Given a single context block (an unmasked part of the image), it predicts the representations of several target blocks (masked regions of the same image) in an abstract representation space.
  • MC-JEPA: A JEPA for self-supervised learning of motion and content features, which it learns jointly within a shared encoder.
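
To make the "predict in representation space" idea concrete, here is a minimal NumPy sketch of an I-JEPA-style forward pass. It is a toy under stated assumptions, not the published implementation: the linear encoders, the mean pooling, the array sizes, and names such as `encode`, `jepa_loss`, and `ema_update` are all hypothetical. It only shows that the loss compares predicted and target embeddings, never pixels, and that the target encoder can be kept as a slowly moving average of the context encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "image": 16 patches, each an 8-dimensional feature vector (assumed sizes).
num_patches, patch_dim, embed_dim = 16, 8, 4
patches = rng.normal(size=(num_patches, patch_dim))

# Hypothetical linear weights standing in for the real deep networks.
W_context = rng.normal(size=(patch_dim, embed_dim)) * 0.1   # context encoder
W_target = W_context.copy()                                 # target encoder
W_pred = rng.normal(size=(embed_dim, embed_dim)) * 0.1      # predictor
pos_embed = rng.normal(size=(num_patches, embed_dim)) * 0.1 # patch positions


def encode(x, W):
    """Map patches to embeddings with one linear layer and a tanh."""
    return np.tanh(x @ W)


def jepa_loss(patches, context_idx, target_idx,
              W_context, W_target, W_pred, pos_embed):
    """Mean squared error between predicted and actual target embeddings."""
    # Encode only the visible context block and pool it into one vector.
    context_repr = encode(patches[context_idx], W_context).mean(axis=0)
    # Embeddings of the masked target patches (treated as fixed targets).
    target_repr = encode(patches[target_idx], W_target)
    # Predict each target embedding from the context plus the target position.
    predicted = (context_repr + pos_embed[target_idx]) @ W_pred
    # The comparison happens in embedding space, not pixel space.
    return float(np.mean((predicted - target_repr) ** 2))


def ema_update(W_target, W_context, momentum=0.99):
    """Move the target encoder slowly toward the context encoder."""
    return momentum * W_target + (1.0 - momentum) * W_context


context_idx = np.arange(0, 10)   # unmasked context block
target_idx = np.arange(10, 16)   # masked target block

loss = jepa_loss(patches, context_idx, target_idx,
                 W_context, W_target, W_pred, pos_embed)
W_target = ema_update(W_target, W_context)
print(f"embedding-space loss: {loss:.4f}")
```

In a real training loop the loss would be backpropagated through the context encoder and the predictor only, with the target encoder updated by the moving average after each step.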
