DALL-E (Zero-Shot Text-to-Image Generation) - Part (2/2)

Picture from the Zero-Shot Text-to-Image Generation paper
  1. Text tokens: a separate padding token is learned for each text position. The alternative would have been to mask those positions, for example by adding -inf to their attention logits.
  2. The paper goes into a lot of detail regarding mixed-precision training with a distributed strategy (that probably deserves a separate blog). Briefly, the challenge is to fit a model with over a billion parameters on 16 GB V100 GPUs. They rely on strategies such as activation checkpointing and a modified gradient-scaling scheme, and they provide guidelines for avoiding numerical underflow.
  3. Instead of averaging the full gradients across GPUs, they use PowerSGD (https://arxiv.org/pdf/1905.13727.pdf) to compress them first. The core idea is to relieve the communication bottleneck: a gradient matrix M (m x n) is approximated by the low-rank product P Q^T, where P is m x r and Q is n x r.
  4. Dataset: 250M image-text pairs collected from the internet. The dataset also includes Conceptual Captions and YFCC100M.
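
To make the low-rank idea in point 3 concrete, here is a minimal single-worker sketch of one PowerSGD-style compression step in plain Python. It is not the paper's implementation: the function names (`powersgd_step`, `reconstruct`, `floats_sent`) are made up for illustration, matrices are plain lists of rows, and the parts that matter in real distributed training (all-reducing P and Q across GPUs, error feedback) are left out.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def orthonormalize(p):
    """Gram-Schmidt on the columns of p (m x r)."""
    basis = []
    for col in transpose(p):
        for b in basis:
            dot = sum(x * y for x, y in zip(col, b))
            col = [x - dot * y for x, y in zip(col, b)]
        norm = sum(x * x for x in col) ** 0.5
        # A (numerically) zero column carries no direction; keep it as zeros.
        col = [x / norm for x in col] if norm > 1e-12 else [0.0] * len(col)
        basis.append(col)
    return transpose(basis)


def powersgd_step(grad, q):
    """One power-iteration step: grad is m x n, q is n x r.

    Returns P (m x r, orthonormal columns) and the updated Q (n x r).
    In a distributed run, P and Q would be averaged across workers
    instead of the full gradient (assumption: omitted in this sketch).
    """
    p = orthonormalize(matmul(grad, q))
    q_new = matmul(transpose(grad), p)
    return p, q_new


def reconstruct(p, q):
    """Low-rank approximation of the gradient: P times Q transposed."""
    return matmul(p, transpose(q))


def floats_sent(m, n, r):
    """Numbers communicated per matrix with rank-r factors (vs. m * n)."""
    return r * (m + n)


if __name__ == "__main__":
    grad = [[4.0, 5.0], [8.0, 10.0], [12.0, 15.0]]  # a rank-1 toy gradient
    p, q = powersgd_step(grad, [[1.0], [1.0]])
    print(reconstruct(p, q))
    print(floats_sent(1000, 1000, 4), "floats instead of", 1000 * 1000)
```

The saving comes from sending r * (m + n) numbers instead of m * n, which is why a small rank r helps so much for large weight matrices. The approximation is only exact when the gradient really has rank at most r, as in the toy example above.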

Rakshith V.
