A compilation of several posts on what to read about ML/NLP/LLM:
Learning materials 🗒
- https://habr.com/ru/articles/774844/
- https://lena-voita.github.io/nlp_course.html
- https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf
- https://www.youtube.com/watch?v=rmVRLeJRkl4&list=PLoROMvodv4rMFqRtEuo6SGjY4XbRIVRd4
- https://huggingface.co/docs/transformers/perf_train_gpu_one

Blogs 🍿
- https://huggingface.co/blog/
- https://blog.eleuther.ai/
- https://lilianweng.github.io/
- https://oobabooga.github.io/blog/
- https://kipp.ly/
- https://mlu-explain.github.io/
- https://yaofu.notion.site/Yao-Fu-s-Blog-b536c3d6912149a395931f1e871370db

Practical courses 👴
- https://github.com/yandexdataschool/nlp_course
- https://github.com/DanAnastasyev/DeepNLP-Course
(I haven't taken any courses in a long time, so if there's something new and good, let me know!)

Channels 🚫
- https://t.me/gonzo_ML
- https://t.me/izolenta_mebiusa
- https://t.me/tech_priestess
- https://t.me/rybolos_channel
- https://t.me/j_links
- https://t.me/lovedeathtransformers
- https://t.me/seeallochnaya
- https://t.me/doomgrad
- https://t.me/nadlskom
- https://t.me/dlinnlp
(Forgot to add you? DM me; the list was compiled from the channels I read myself.)

Chats 😁
- https://t.me/betterdatacommunity
- https://t.me/natural_language_processing
- https://t.me/LLM_RNN_RWKV
- https://t.me/ldt_chat

Key papers 😘
- Word2Vec: Mikolov et al., Efficient Estimation of Word Representations in Vector Space
https://arxiv.org/pdf/1301.3781.pdf
- FastText: Bojanowski et al., Enriching Word Vectors with Subword Information
https://arxiv.org/pdf/1607.04606.pdf
- Attention: Bahdanau et al., Neural Machine Translation by Jointly Learning to Align and Translate
https://arxiv.org/abs/1409.0473
- Transformers: Vaswani et al., Attention Is All You Need
https://arxiv.org/abs/1706.03762
- BERT: Devlin et al., BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
https://arxiv.org/abs/1810.04805
- GPT-2: Radford et al., Language Models are Unsupervised Multitask Learners
https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
- GPT-3: Brown et al., Language Models are Few-Shot Learners
https://arxiv.org/abs/2005.14165
- LaBSE: Feng et al., Language-agnostic BERT Sentence Embedding
https://arxiv.org/abs/2007.01852
- CLIP: Radford et al., Learning Transferable Visual Models From Natural Language Supervision
https://arxiv.org/abs/2103.00020
- RoPE: Su et al., RoFormer: Enhanced Transformer with Rotary Position Embedding
https://arxiv.org/abs/2104.09864
- LoRA: Hu et al., LoRA: Low-Rank Adaptation of Large Language Models
https://arxiv.org/abs/2106.09685
- InstructGPT: Ouyang et al., Training language models to follow instructions with human feedback
https://arxiv.org/abs/2203.02155
- Scaling laws: Hoffmann et al., Training Compute-Optimal Large Language Models
https://arxiv.org/abs/2203.15556
- FlashAttention: Dao et al., FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
https://arxiv.org/abs/2205.14135
- NLLB: NLLB team, No Language Left Behind: Scaling Human-Centered Machine Translation
https://arxiv.org/abs/2207.04672
- Q8: Dettmers et al., LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
https://arxiv.org/abs/2208.07339
- Self-instruct: Wang et al., Self-Instruct: Aligning Language Models with Self-Generated Instructions
https://arxiv.org/abs/2212.10560
- Alpaca: Taori et al., Alpaca: A Strong, Replicable Instruction-Following Model
https://crfm.stanford.edu/2023/03/13/alpaca.html
- LLaMA: Touvron et al., LLaMA: Open and Efficient Foundation Language Models
https://arxiv.org/abs/2302.13971