Just when you thought GenAI was hype: FLUX, Gemma 2, Stable Fast 3D, SAM 2, GR00T and Midjourney

2024-08-11

The past few weeks have been full of groundbreaking advancements in robotics and Generative AI, with major players like Meta, NVIDIA, OpenAI, Stability AI, Google, Figure, and Midjourney all pushing the envelope!

Although I usually publish “how-to” guides for solo entrepreneurs who want to build AI and Generative AI powered products that people love, I really can’t pass up this news, as it is absolutely mind-blowing!

But before we start with a tour of mind-blowing updates, let me add:

If you want to build full-stack GenAI SaaS products that people love, don’t miss out on our upcoming cohort-based course. Together, we’ll build, ship, and scale your GenAI product alongside a community of like-minded people!

Back to the topic: here’s a rundown of the most exciting developments and why they matter.

Figure 02: The Future of Humanoid Robotics

Figure has given us a sneak peek into the future with the launch of a teaser trailer for Figure 02.

After a year of relentless development, this humanoid robot is being touted as the most advanced of its kind. While details are still sparse, we can expect more information in the coming days.

The launch signals a major leap forward in robotics, potentially bringing us closer to a world where humanoid robots become part of our daily lives.

OpenAI’s Advanced Voice Mode

OpenAI has begun rolling out its ‘Advanced Voice Mode’ for ChatGPT, bringing natural, real-time conversational AI to a select group of users.

This mode not only allows for seamless conversation but also has the capability to detect and respond to emotional cues.

This feature, slated for a broader release in the fall, could redefine how we interact with AI, making conversations more intuitive and emotionally intelligent, which is what we have all been longing for!

Can we combat loneliness with AI wearables?

Friend is a wearable AI designed to alleviate loneliness, highlighting the growing intersection of AI and mental health.

Unlike other wearables focused on productivity, Friend emphasizes emotional companionship.

This approach could be a game-changer in how technology supports mental well-being, offering a more personalized and empathetic user experience.

Is FLUX killing Midjourney?

Black Forest Labs has launched FLUX.1, a suite of AI image generation models that includes open-weight variants users can run locally.

FLUX.1 models are based on hybrid multimodal transformer blocks:

Parallel diffusion and parallel attention layers
Scaled to 12B parameters
Flow matching
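
To make the "flow matching" bullet a little more concrete, here is a minimal sketch of the general idea in plain Python. It assumes the simplest straight-line path between noise and data and is not FLUX.1's actual training code; the names DATA, oracle_velocity, and the helper functions are hypothetical and exist only for illustration. The model learns a velocity field that carries noise to data, and sampling just integrates that field from t=0 to t=1.

```python
# Toy flow matching on plain Python lists (illustrative only, not FLUX.1 code).

DATA = [1.0, -2.0]  # hypothetical "image" with two pixels


def interpolate(x0, x1, t):
    """Point on the straight-line path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity along that straight path; it is constant in t."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(predict_velocity, x0, x1, t):
    """Mean squared error between predicted and target velocity at (x_t, t)."""
    x_t = interpolate(x0, x1, t)
    v_pred = predict_velocity(x_t, t)
    v_true = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(v_true)


def euler_sample(predict_velocity, x0, steps=10):
    """Generate a sample by integrating dx/dt = v(x, t) from t=0 to t=1."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        v = predict_velocity(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def oracle_velocity(x_t, t):
    """Stand-in for a trained network: the exact field for a single data point."""
    return [(d - x) / (1.0 - t) for d, x in zip(DATA, x_t)]


noise = [0.5, 0.25]
loss = flow_matching_loss(oracle_velocity, noise, DATA, t=0.3)
sample = euler_sample(oracle_velocity, noise)
```

In a real system the oracle would be replaced by a neural network trained to minimize this loss over many noise and data pairs; here the loss is essentially zero and the sample lands on DATA, which is all the toy is meant to show.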

These models democratize access to cutting-edge generative AI, and push the limits of text-to-image synthesis, providing a powerful tool for creators while fostering public trust in the safety and transparency of AI technologies.

Look at the benchmarks, absolutely crazy!

Stable Fast 3D is the future of 3D asset creation

Stability AI is revolutionizing the gaming and design industries with Stable Fast 3D, a model capable of generating 3D assets from a single image in just 0.5 seconds.

This powerhouse model, built on the rock-solid foundation of TripoSR, comes with major architectural upgrades that take its capabilities to the next level.

Whether you’re a game developer, virtual reality enthusiast, or a professional in retail, architecture, or design, this model has something for you. It’s perfect for anyone who needs top-notch 3D graphics quickly and efficiently.

What’s even better? You can find it on Hugging Face, released under the Stability AI Community License. It’s also available via the Stability AI API and the Stable Assistant chatbot, where you can not only generate but also share your 3D creations in a viewer or even play around with them in Augmented Reality.

Gemma 2 is responsible AI with a performance edge

Google recently introduced Gemma 2, available in 9B and 27B parameter versions, bringing even greater performance and efficiency.

What’s remarkable is that the 27B Gemma 2 model rivals models twice its size! The 9B model also outperforms competitors in its class, including Llama 3 8B!

Gemma 2 is optimized for cost-effective deployment, running efficiently on a single NVIDIA H100 Tensor Core GPU or Google Cloud TPU host, making high-performance AI more accessible.
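
A quick back-of-the-envelope check shows why a single accelerator is plausible. The sketch below estimates the memory needed just to hold the weights. It assumes nominal parameter counts (9B and 27B), an 80 GB H100 variant, and a hypothetical 20% allowance for activations and cache; real usage depends on context length, batch size, and the runtime, so treat the numbers as rough.

```python
# Rough weight-memory estimate (assumptions: nominal parameter counts,
# an 80 GB device, and a made-up 20% overhead allowance).

BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(num_params, dtype):
    """Memory needed to store the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def fits_on_device(num_params, dtype, device_memory_gb, overhead_fraction=0.2):
    """True if weights plus a rough overhead allowance fit in device memory."""
    needed = weight_memory_gb(num_params, dtype) * (1 + overhead_fraction)
    return needed <= device_memory_gb


H100_MEMORY_GB = 80  # assumption: the 80 GB H100 variant

gemma_27b_bf16 = weight_memory_gb(27e9, "bfloat16")  # 54 GB of weights
gemma_9b_bf16 = weight_memory_gb(9e9, "bfloat16")    # 18 GB of weights
fits_bf16 = fits_on_device(27e9, "bfloat16", H100_MEMORY_GB)
fits_fp32 = fits_on_device(27e9, "float32", H100_MEMORY_GB)
```

Under these assumptions the 27B model fits in bfloat16 but not in float32, and the 9B model leaves plenty of headroom, which is also why quantized versions are within reach of consumer hardware.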

Whether on cloud, desktop, or even gaming laptops, Gemma 2 runs at impressive speeds. It’s available for testing on Google AI Studio, and can also be deployed locally via Hugging Face Transformers or gemma.cpp for CPU-optimized inference.

Gemma 2 isn’t just an upgrade; it sets a new standard for open models, combining performance, efficiency, and accessibility like never before. If you have already tested it yourself, let me know what you think about its performance!

SAM 2 is Meta’s next-gen object segmentation

Meta continues to lead in object segmentation with the release of SAM 2, an AI model capable of identifying and tracking objects across video frames in real time.

By open-sourcing this model and the extensive dataset used for its training, Meta is empowering the community to explore new creative and practical applications, from video editing to more efficient computer vision systems.

Meta has published a wealth of resources that could easily fill an entire weekend, so be sure to set aside some quality time to dive into them!

NVIDIA’s Project GR00T is scaling robot data collection

NVIDIA’s Project GR00T is exploring new ways to scale robot data collection using the Apple Vision Pro.

By overcoming the physical and operational limitations of humanoid teleoperation, this project could pave the way for more efficient and scalable robotic systems, unlocking new possibilities in automation and data-driven robotics.

Midjourney V6.1: Elevating AI Image Generation

Midjourney has rolled out an update to its AI image generator, V6.1, bringing significant improvements in image quality, coherence, and text rendering.

The update also introduces new upscaling and personalization models, enhancing the tool’s capabilities and making it a more powerful resource for artists and designers looking to push the boundaries of visual creativity.

Wrap Up

The future is unfolding before our eyes. So many great products will be built, and they will change the way we live our lives.

Are you building the future? Drop a comment.

Thank you for stopping by and being an integral part of our community.

Happy building!
