Post #763: Hi, AI!


Hi, AI! | media

25 Jun, 18:47


❓ When Will the Data for Training LLMs Run Out?

In as little as two years, humanity may face the strangest shortage in history: running out of human-written text. Large language models (LLMs) would then exhaust their training data, triggering a scaling crisis. This is the conclusion reached by researchers who study AI's impact on the world.

Number of the day

300 trillion tokens: the estimated amount of human-created text currently available for training AI models.

0️⃣ "Data Drought"

Researchers consider 2026–2032 the most likely window in which the stock of text data for training LLMs will be fully used up. It could happen even sooner if, amid the AI race, popular LLMs are heavily overtrained, that is, trained on far more data than is compute-optimal for their size.

Three Main Conclusions from Researchers

1️⃣ Textual data will become the bottleneck in developing more advanced LLMs.

2️⃣ AI-generated synthetic data remains understudied. It is useful in narrow fields such as mathematics and programming, but some believe it can be risky because AI may introduce errors when generating it.

3️⃣ Private data, such as personal messages, is unlikely to be used at scale because of legal issues.

🔠 Solutions to the Crisis

Researchers propose several ways to keep developing LLMs:

➡️ Synthetic data.
➡️ Training on other types of data.
➡️ Increasing data efficiency.

💲 Who Can I Sell My Data To?

Some companies already compensate internet users for data that can be used to train AI models. Here are a few of them:

➡️ TIKI pays for access to users' mobile devices. The company is interested in how people behave in apps that partner with TIKI.

➡️ Caden — for access to personal accounts on Netflix and Amazon. Earnings range from $5 to $50 per month.

➡️ Invisible offers access to paywalled news articles in exchange for demographic and behavioral data, including users' vaccination status and political affiliation. The company plans to let users exchange this data for digital subscriptions worth $4 to $15 per month.

@hiaimediaen
