1. Foundation Models & Scaling Innovations
Mixtral of Experts (8x7B)
A Sparse Mixture-of-Experts (SMoE) model that routes each token to only 2 of 8 experts per layer, matching or exceeding GPT-3.5 with 47B total parameters but only about 13B active per token at inference. It excels in multilingual tasks and coding benchmarks and is released under Apache 2.0.
Read the paper
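To make the routing concrete, here is a minimal PyTorch sketch of top-2 expert routing in the spirit of Mixtral. The layer sizes, the plain-MLP experts, and the softmax renormalization over only the selected experts are illustrative assumptions, not Mixtral's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    """Sparse MoE layer: each token is processed by only top_k of n_experts."""

    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                          # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # renormalize over selected experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):  # only selected experts do work
            for k in range(self.top_k):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return out

moe = Top2MoE()
y = moe(torch.randn(16, 512))                      # 16 tokens in, 16 tokens out
```

Total parameter count grows with the number of experts, but per-token compute stays roughly constant, which is why a 47B-parameter model can run with a ~13B-parameter inference cost.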
Llama 3 (405B Parameters)
A dense transformer with a 128K-token context window; the report also describes compositional experiments extending it to image, video, and speech understanding. It performs comparably to GPT-4 on reasoning and tool-use benchmarks.
Read the technical report
Phi-3 Series
Compact models such as Phi-3-mini (3.8B parameters) rival far larger models like Mixtral 8x7B on benchmarks while remaining deployable on a smartphone. The Phi-3.5-Vision variant adds multimodal reasoning.
Read the technical report
DeepSeek-V3 (671B MoE)
A cost-efficient MoE model with 671B total parameters, of which 37B are active per token. It introduces an auxiliary-loss-free load-balancing strategy and a multi-token prediction objective, outperforming other open-source models.
Read the technical report
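The load-balancing idea can be sketched simply: a per-expert bias steers top-k selection toward underloaded experts without entering the gating weights. The update-rule shape and step size below are assumptions for illustration, not DeepSeek-V3's exact recipe.

```python
import torch

def route_with_bias(scores, bias, top_k=8):
    """scores: (tokens, n_experts) sigmoid affinities in (0, 1); bias: (n_experts,)."""
    _, idx = (scores + bias).topk(top_k, dim=-1)   # bias affects selection only
    weights = torch.gather(scores, -1, idx)        # gating uses the raw affinities
    weights = weights / weights.sum(-1, keepdim=True)
    return idx, weights

def update_bias(bias, idx, n_experts, gamma=1e-3):
    """Nudge bias up for underloaded experts, down for overloaded ones."""
    load = torch.bincount(idx.flatten(), minlength=n_experts).float()
    return bias + gamma * torch.sign(load.mean() - load)

scores = torch.sigmoid(torch.randn(16, 64))        # 16 tokens, 64 routed experts
bias = torch.zeros(64)
idx, w = route_with_bias(scores, bias)
bias = update_bias(bias, idx, n_experts=64)        # applied between training steps
```

Because the bias never touches the output weighting, balance is enforced without the auxiliary loss term that can otherwise degrade model quality.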
2. Multimodal & Generative AI
Genie: Generative Interactive Environments
Trained on 200K hours of gameplay, this 11B-parameter model from Google DeepMind generates action-controllable virtual worlds from text or images.
Read the paper
AlphaFold 3
A diffusion-based model predicting biomolecular structures (proteins, DNA, ligands) with unmatched accuracy, reducing reliance on multiple-sequence alignments.
Read the paper
Sora: Text-to-Video World Simulation
OpenAI’s video generation model simulates dynamic physical environments, though challenges such as object hallucination and inconsistent physics persist. The linked analysis frames it as a step toward AGI.
Read the analysis
CogView3
A relay diffusion framework for text-to-image generation, producing HD images faster than SDXL while maintaining quality.
Read the paper
3. Efficiency & Optimization
DoRA: Weight-Decomposed Low-Rank Adaptation
Enhances LoRA by decomposing pretrained weights into magnitude and direction components, applying the low-rank update to the direction while learning the magnitude separately, which improves fine-tuning capacity and robustness.
Read the paper
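The decomposition itself is compact. A minimal sketch, assuming per-column norms over the input dimension and standard LoRA initialization (random A, zero B) so the merged weight starts equal to the pretrained one; names are illustrative:

```python
import torch

def dora_weight(W0, A, B, m):
    """W0: (out, in) frozen weight; B @ A: low-rank update; m: (1, in) magnitudes."""
    V = W0 + B @ A                                 # update the direction with LoRA
    return m * (V / V.norm(dim=0, keepdim=True))   # rescale each column to magnitude m

out_f, in_f, rank = 64, 32, 4
W0 = torch.randn(out_f, in_f)
A = torch.randn(rank, in_f) * 0.01                 # LoRA-style init: B @ A == 0 at step 0
B = torch.zeros(out_f, rank)
m = W0.norm(dim=0, keepdim=True)                   # init magnitude from W0's column norms
assert torch.allclose(dora_weight(W0, A, B, m), W0, atol=1e-5)  # starts at W0; A, B, m train
```

Only A, B, and m receive gradients, so the trainable parameter count stays in LoRA territory while the magnitude/direction split more closely mimics full fine-tuning dynamics.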
Byte Latent Transformer (BLT)
Replaces fixed tokenization with dynamically sized, entropy-based byte patches, scaling to 8B parameters and 4T training bytes for better long-tail generalization.
Read the paper
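A toy sketch of the entropy-driven patching idea: a new patch opens wherever next-byte surprisal spikes, so predictable runs get merged and hard spots get more compute. BLT uses a small learned byte LM for this; the unigram counts and threshold below are placeholders.

```python
import math
from collections import Counter

def surprisals(data: bytes):
    """Per-byte surprisal from unigram counts (stand-in for BLT's learned byte LM)."""
    counts, n = Counter(data), len(data)
    return [-math.log2(counts[b] / n) for b in data]

def patch(data: bytes, threshold: float = 4.5):
    """Open a new patch wherever the next byte is surprising."""
    ents, patches, start = surprisals(data), [], 0
    for i in range(1, len(data)):
        if ents[i] > threshold:
            patches.append(data[start:i])
            start = i
    patches.append(data[start:])
    return patches

print(patch(b"the cat sat on the mat, quizzically."))
# rare bytes (q, u, z, ...) open new patches; common runs stay merged
```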
QLoRA
Fine-tunes 65B-parameter models on a single 48GB GPU by backpropagating through a frozen, 4-bit-quantized base model into LoRA adapters, making LLM customization broadly accessible.
Read the paper
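A minimal sketch of the recipe using the Hugging Face transformers, peft, and bitsandbytes stack (the model name is a placeholder and the hyperparameters are illustrative, not the paper's exact configuration):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # the NormalFloat4 data type from the paper
    bnb_4bit_use_double_quant=True,       # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",                # placeholder model; any causal LM works
    quantization_config=bnb,
    device_map="auto",
)
lora = LoraConfig(r=64, lora_alpha=16, task_type="CAUSAL_LM",
                  target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora)       # base stays frozen; only adapters train
model.print_trainable_parameters()
```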
4. AI Alignment & Safety
DPO vs. PPO for LLM Alignment
PPO outperforms DPO on out-of-distribution data, but DPO's simplicity has made it widely adopted (e.g., in Llama 3's post-training).
Read the study
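For reference, the DPO objective is a logistic loss on the gap between policy-vs-reference log-ratios for chosen and rejected responses; no reward model or RL loop is needed. A minimal sketch (beta and the toy inputs are illustrative):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Each input: (batch,) summed log-probs of a full response under one model."""
    chosen_ratio = policy_chosen - ref_chosen        # how much the policy favors y_w vs. ref
    rejected_ratio = policy_rejected - ref_rejected  # same for the rejected response y_l
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

loss = dpo_loss(torch.tensor([-10.0]), torch.tensor([-12.0]),
                torch.tensor([-11.0]), torch.tensor([-11.5]))
```

This single supervised loss over preference pairs is the simplicity the study weighs against PPO's better out-of-distribution behavior.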
RLAIF vs. RLHF
Explores scaling alignment by substituting AI-generated feedback for human preference labels, reducing annotation costs while maintaining comparable helpfulness and harmlessness.
Read the survey
5. Applications in Science & Industry
AlphaDev
Uses deep reinforcement learning (an AlphaZero-style agent, not an LLM) to discover faster sorting algorithms at the assembly level; the discovered routines were upstreamed into LLVM's libc++.
Read the paper
ChemCrow
Augments LLMs with chemistry tools for drug discovery and molecular design, bridging AI and experimental science.
Read the paper
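The pattern is a standard tool-use loop: the LLM names a tool and its input, the framework executes the tool and feeds the observation back until the model answers. Everything below (the stub LLM, the MolWeight tool, the ACTION/FINAL protocol) is a hypothetical placeholder, not ChemCrow's actual interface.

```python
from typing import Callable

def molecular_weight(smiles: str) -> str:        # placeholder chemistry tool
    return f"MW lookup for {smiles} (stub)"

TOOLS: dict[str, Callable[[str], str]] = {"MolWeight": molecular_weight}

def ask_llm(history: str) -> str:                # stub standing in for a real LLM call
    return "ACTION MolWeight CCO" if "MW lookup" not in history else "FINAL ethanol ~46 g/mol"

def run(task: str, max_steps: int = 5) -> str:
    history = task
    for _ in range(max_steps):
        reply = ask_llm(history)
        if reply.startswith("FINAL"):            # model is done reasoning
            return reply.removeprefix("FINAL ").strip()
        _, tool, arg = reply.split(maxsplit=2)   # parse "ACTION <tool> <input>"
        history += f"\n{reply}\nOBSERVATION {TOOLS[tool](arg)}"
    return history

print(run("What is the molecular weight of ethanol?"))
```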
Mora: Multi-Agent Video Generation
Replicates Sora’s capabilities using collaborative AI agents for tasks like video editing and simulation.
Read the paper
6. Surveys & Future Directions
Agent AI: An 80-Page Survey on Multimodal Interaction
Explores embodied AI agents that perceive their environments and act autonomously, arguing that environmental grounding can reduce hallucinations.
Read the survey
World Models in AGI
Reviews how models like Sora simulate environments, predicting future states for robotics and autonomous systems.
Read the survey
Large Language Model Evaluation
A 111-page review assessing LLMs on knowledge, alignment, and safety, emphasizing responsible deployment.
Read the survey
Full Repository Access
For the complete list of 100 papers, explore:
- Awesome Generative AI Guide
- Sebastian Raschka’s 2024 LLM Research List Part 1 | Part 2
- AI Surveys on Detection, MoE, and Multimodal Fusion
This article synthesizes advancements that redefine AI’s role in science, ethics, and industry, offering a roadmap for researchers and practitioners alike.