MaxCLIP Competition

Welcome to the MaxCLIP competition page! Here you can find all the details about our latest competition, including rules and deadlines.

Motivation

Training state-of-the-art CLIP models requires a large amount of resources, especially a large number of GPUs, due to the large batch sizes these models need. Such resources are not accessible to most researchers. To accelerate research in this area, we are hosting this competition in search of resource-efficient algorithms that train good CLIP models.

Task Definition

In this competition, participants will design an algorithm to efficiently train CLIP models in a limited-resource setting. The algorithm must be implemented and will be run on training datasets of different sizes using a small number of GPUs (e.g., 8 GPUs). The algorithms will then be ranked according to the evaluation performance of the trained models. To reduce the cost of participation, participants only need to design and implement their algorithms; we will provide the resources to run them.
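For orientation, the objective such an algorithm typically optimizes is the standard symmetric contrastive (InfoNCE) loss over paired image/text embeddings. The following is a minimal NumPy sketch of that loss, not the starter kit's actual implementation:

```python
import numpy as np

def clip_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    # L2-normalize so the dot product is cosine similarity
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = image_emb @ text_emb.T / temperature  # (B, B) similarity matrix
    labels = np.arange(len(logits))                # matching pairs on the diagonal

    def xent(l):
        # numerically stable cross-entropy with the diagonal as the target
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # average of image->text and text->image directions
    return (xent(logits) + xent(logits.T)) / 2
```

The large-batch requirement mentioned above comes from this loss: every other example in the batch serves as a negative, so larger batches give a harder, more informative contrastive task.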

Training Setting

Each submission will be run in three settings that differ in the size of the training data: small (1 million examples), medium (10 million), and large (100 million). Each smaller-scale dataset is a subset of the larger ones, and all are subsets of the DFN-2B dataset. The other components of training, including the number of samples seen, batch size, etc., are fixed across settings. The table below gives more detail on each setting.

Scale    Dataset Size   Samples Seen   Model      Batch Size/GPU   GPUs
Small    1 million      1 billion      ViT-B/32   4096             8x H100
Medium   10 million     1 billion      ViT-B/32   4096             8x H100
Large    100 million    1 billion      ViT-B/32   4096             8x H100

Evaluation Metric

The performance of the trained models will be evaluated using the DataComp benchmark. We also track ImageNet-1K top-1 zero-shot accuracy. We will evaluate submissions as soon as we can and release leaderboard updates on a weekly basis.
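Zero-shot classification scores an image against text embeddings of class-name prompts and predicts the most similar class. A minimal NumPy sketch of top-1 zero-shot accuracy (illustrative only; the official evaluation follows the DataComp pipeline):

```python
import numpy as np

def zero_shot_top1(image_embs, class_text_embs, labels):
    """Top-1 zero-shot accuracy: each image is assigned the class whose
    prompt embedding is most cosine-similar to the image embedding."""
    img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    txt = class_text_embs / np.linalg.norm(class_text_embs, axis=1, keepdims=True)
    preds = (img @ txt.T).argmax(axis=1)  # nearest class prompt per image
    return (preds == labels).mean()
```

For ImageNet-1K, `class_text_embs` would hold one (or an average of several) prompt embeddings per class, e.g. "a photo of a {class name}".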

Baseline, Code, and Resources

A starter kit with example training code, implementing the FastCLIP-v3 algorithm, will be provided. Baseline results of FastCLIP-v3 in all settings are provided as a reference. We will provide all GPU resources needed to train and evaluate each submission, and we will release all the training data so that participants can tune the hyperparameters of their algorithms.

Leaderboards

All submissions will be run under the different settings and the trained models evaluated. Submissions are then ranked by their average performance across settings, and the results are published on a public leaderboard. We have also set up an additional (unconstrained) leaderboard to track state-of-the-art CLIP model performance in each setting. This leaderboard publishes the performance of CLIP models trained using any approach on any dataset; entries should come from a paper (preprint or published). Participants must either follow the same evaluation process on the DataComp benchmark or submit a CLIP model, which we will evaluate through the same process. Results are categorized by model architecture and ranked.

Submission Rules

  1. Eligibility: The competition is open to individuals and teams from all backgrounds, including university students, researchers, and industry professionals.
  2. Each team is limited to one submission every two weeks.
  3. Submissions must include:
    • The modified code in the provided editable folder.
    • A configuration file with hyperparameters.
  4. Submissions must be a single ZIP file.
  5. Submissions must not include any modifications to frozen components (e.g., dataset, model architecture, etc.) unless explicitly permitted.
  6. In case of a technical failure (e.g., bug), the competitor will be notified and allowed to submit a fix within 48 hours.
  7. For the unconstrained leaderboard, we welcome submissions of papers or GitHub repositories for newly released CLIP models.
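As an illustration of the configuration file required in rule 3, a submission's hyperparameter file might look like the following. The field names here are hypothetical; the actual schema will be defined by the starter kit.

```yaml
# Hypothetical hyperparameter configuration (illustrative only;
# the real schema comes from the starter kit)
learning_rate: 1.0e-3
weight_decay: 0.1
warmup_steps: 2000
temperature: 0.07
lr_schedule: cosine
```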

Permitted Modifications

In the constrained track, the following modifications to the codebase are permitted:

  1. Loss computation.
  2. Model update.
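As an example of the kind of change the "model update" hook allows (this is an illustration, not the starter kit's actual API), a submission could replace plain gradient descent with a momentum update. A minimal NumPy sketch:

```python
import numpy as np

def sgd_momentum_step(params, grads, velocity, lr=0.1, momentum=0.9):
    """One momentum-SGD update: v <- momentum*v + g; p <- p - lr*v.
    params, grads, velocity are parallel lists of arrays."""
    new_velocity = [momentum * v + g for v, g in zip(velocity, grads)]
    new_params = [p - lr * v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity
```

Any such change must stay inside the editable folder; the model architecture and data pipeline remain frozen.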

Prohibited Actions

  • Participants may not use any pre-trained models in their submissions.
  • Participants may not modify the model architecture or any other frozen code (for a constrained submission).
  • Tampering with or bypassing the competition's fixed evaluation or training procedure is strictly prohibited and will result in automatic disqualification.

A folder with code that competitors are allowed to edit will be provided; everything else (except a configuration file containing hyperparameters) is frozen.
