Artelys x CodaLab: data storage optimization for Data Science competitions

6 October 2022

CodaLab Competitions is a public, open-source platform for organizing competitions that require automatic submission and evaluation of participants' code, usually on Data Science and optimization topics.
Since version 1.5, organizers can run code evaluation on their own machines, which opens the door to Big Data competitions. The project now wants to experiment with new competition protocols that respect data privacy and support Big Data competitions for industry and start-ups.
More than 1,000 competitions have been created on the new version since it went live last November, and the competition data archived since CodaLab began amounts to several dozen TB. This data comes from the solutions submitted by more than 25,000 participants to date and from the datasets stored for each competition. The platform's success created the need for a storage system sized to meet demand and avoid saturation. Rather than paying for more storage, the idea was to optimize the use of existing resources.
Figure 1: Map and KPI interfaces in the Artelys Crystal Super Grid platform
Artelys supported CodaLab in maintaining and improving the deployed platforms. Working step by step, starting with an inventory before proposing solutions, the work showed that a Storage Analytics Dashboard could identify the causes of excess storage use. The solutions implemented include searching for obsolete or redundant data, limiting the number of submissions per participant and the size of the initial datasets that organizers can upload, and looking for inappropriate uses of the platform. Configuring a distributed storage system with MinIO also made it possible to interface different storage systems automatically, which makes this data easier to manage. Together, these approaches help keep CodaLab's servers running smoothly.
Hosting Data Science competitions is increasingly in demand among organizations that want to get more value from their data, and Artelys's ongoing contribution to the CodaLab platform provides a clear view of current storage use. This proactive approach preserves the available storage, keeping the platform ready for future competitions.
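To give a concrete idea of what a search for redundant data can look like, here is a minimal Python sketch that groups files with identical content and estimates how much space their duplicates occupy. It is an illustration only, not the tooling actually deployed for CodaLab: the function names (`file_digest`, `find_redundant_files`, `reclaimable_bytes`) are hypothetical, and it assumes the data sits on a local file system, whereas an object store such as MinIO would be queried through its own client.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_redundant_files(root: Path) -> dict[str, list[Path]]:
    """Group files under `root` by content; keep groups of two or more."""
    # Cheap first pass: files with a unique size cannot be duplicates.
    by_size = defaultdict(list)
    for p in root.rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    # Second pass: hash only the files that share a size with another file.
    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for p in paths:
            by_digest[file_digest(p)].append(p)

    return {d: sorted(ps) for d, ps in by_digest.items() if len(ps) > 1}


def reclaimable_bytes(groups: dict[str, list[Path]]) -> int:
    """Bytes freed if only one copy of each duplicated file were kept."""
    return sum(ps[0].stat().st_size * (len(ps) - 1) for ps in groups.values())
```

Calling `find_redundant_files(Path("/path/to/submissions"))` (a placeholder path) returns the duplicate groups, and `reclaimable_bytes` turns them into a single figure that a storage dashboard could display.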

If you want to know more about the CodaLab platform, you can contact us or visit their website.
