
AI Meets the Real World: Geospatial Models, IBM Z17, and Delivery Robots

April 14, 2025 · 6 min read

Rapid Advancements Across Video, Models, and Enterprise Integration

The past few weeks have witnessed a whirlwind of developments in artificial intelligence, with breakthroughs spanning generative video, model improvements, hardware innovations, and deeper software integration. Let's explore these transformative changes reshaping our digital landscape.

Generative AI for Video Takes a Leap Forward

Amazon's Nova Reel has made significant strides in AI video generation, extending the maximum video length to two minutes. That added runtime is more consequential than it sounds, enabling users to create content with a complete narrative arc (a beginning, middle, and end), which is particularly valuable for small businesses needing quick product demonstrations with minimal production overhead.

Nova Reel 1.1 introduces multi-shot functionality, addressing the critical challenge of maintaining visual consistency across different scenes generated from the same prompt. A single comprehensive prompt of up to 4,000 characters can produce approximately 20 six-second shots that maintain visual coherence throughout the full two-minute duration. For those requiring even more precision, the multi-shot manual mode offers finer control using reference images and shorter prompts, ideal for maintaining brand consistency.
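Under the hood, a multi-shot job is an asynchronous request to Amazon Bedrock. Here is a minimal Python sketch of what that could look like, assuming the boto3 start_async_invoke API Bedrock uses for video models; the model ID, payload field names, and limits are taken from AWS's published Nova Reel examples and should be verified against current documentation:

```python
import boto3

# Bedrock video generation is asynchronous: you submit a job and the
# finished video lands in the S3 bucket you specify.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

# One comprehensive prompt (up to ~4,000 characters) drives all shots.
prompt = "A day in the life of a small pottery studio, from wedging clay to the final glaze firing"

response = client.start_async_invoke(
    modelId="amazon.nova-reel-v1:1",  # assumed Nova Reel 1.1 model ID
    modelInput={
        "taskType": "MULTI_SHOT_AUTOMATED",            # one prompt, many shots
        "multiShotAutomatedParams": {"text": prompt},
        "videoGenerationConfig": {
            "durationSeconds": 120,  # two minutes = roughly 20 six-second shots
            "fps": 24,
            "dimension": "1280x720",
        },
    },
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": "s3://my-bucket/videos/"}},
)

# Poll get_async_invoke() with this ARN until the job reports completion.
print(response["invocationArn"])
```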

Meanwhile, Google has unveiled Google Vids, powered by its Veo 2 model, democratizing video creation by allowing users to generate original clips directly within the application. For more advanced users, Veo 2 on Google's Vertex AI platform offers sophisticated editing capabilities, including inpainting (removing objects and having AI fill in the background), outpainting (extending the frame), and advanced camera controls like pans, time-lapses, and simulated drone shots.
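To give a flavor of the developer-facing side, here is a hedged sketch of generating a Veo 2 clip through the google-genai SDK on Vertex AI. The model ID and config fields follow Google's published samples at the time of writing and are assumptions to verify; the inpainting and outpainting features take additional inputs not shown here:

```python
import time

from google import genai
from google.genai import types

# google-genai SDK pointed at Vertex AI; project and location are placeholders.
client = genai.Client(vertexai=True, project="my-project", location="us-central1")

operation = client.models.generate_videos(
    model="veo-2.0-generate-001",  # assumed Veo 2 model ID on Vertex AI
    prompt="Simulated drone shot rising slowly over a coastal village at sunrise",
    config=types.GenerateVideosConfig(
        aspect_ratio="16:9",
        number_of_videos=1,
        output_gcs_uri="gs://my-bucket/veo/",  # where finished clips are written
    ),
)

# Video generation runs as a long-running operation; poll until it completes.
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

print(operation.response.generated_videos[0].video.uri)
```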

These tools are already demonstrating real business impact. L'Oréal is using them to accelerate global marketing content creation, while Kraft Heinz's Tastemaker platform has reportedly reduced content creation time from eight weeks to just eight hours, a dramatic efficiency improvement that translates to significant cost savings and faster market responsiveness.

Google's Expanding AI Ecosystem

Google continues evolving its Gemini platform with the experimental Gemini 2.5 Pro Deep Research, enhancing the system's ability to synthesize and analyze information, now integrated directly into Google Workspace. This focus on desktop-based research capabilities suggests Google is prioritizing deeper analysis tools within workflows people already use daily.

Google's Vertex AI platform stands out for its comprehensive approach, offering generative models across multiple modalities: video (Veo 2), image (Imagen 3), speech (Chirp 3), and music (Lyria, in preview). This multi-modal integration streamlines multimedia content creation, allowing users to generate videos, add AI voiceovers, and create background music all within one ecosystem.

The speech component, Chirp 3, now features instant custom voice capabilities, creating voice clones from just 10 seconds of audio—opening possibilities for personalization, accessibility, and branded voice experiences. It also offers improved transcription with diarization, separating different speakers in recordings, and works across over 35 languages.
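For the basic synthesis path, a minimal sketch using the Google Cloud Text-to-Speech client might look like the following; the specific Chirp 3 HD voice name is an assumption, and client.list_voices() will show what a given project can actually use:

```python
from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()

response = client.synthesize_speech(
    input=texttospeech.SynthesisInput(text="Welcome back to the show."),
    voice=texttospeech.VoiceSelectionParams(
        language_code="en-US",
        name="en-US-Chirp3-HD-Aoede",  # assumed Chirp 3 HD voice name
    ),
    audio_config=texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3,
    ),
)

# The API returns raw audio bytes ready to write to disk.
with open("welcome.mp3", "wb") as f:
    f.write(response.audio_content)
```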

Safety remains a priority in Google's approach, with digital watermarking (SynthID), safety filters, and data governance built into their offerings. Perhaps most notably, Google is providing indemnification for businesses using these tools, effectively backing users if they face IP-related legal challenges—a significant trust-building measure.

Specialized AI Applications Emerge

One fascinating development is Google's work on geospatial foundation models, applying generative AI to location-based data like satellite imagery and aerial photos. These models can understand natural language prompts to analyze complex spatial data, for example identifying houses with solar panels in a specific area or mapping flood damage in a region. Organizations like WPP, Airbus, and Maxar are already testing these capabilities, with pilot programs integrating Gemini within Google Earth for creating data layers and discovering insights.

Hardware Advances Power AI Growth

The AI revolution depends on hardware innovation, and recent announcements highlight this critical relationship. IBM's new z17 mainframe delivers 7.5 times the AI performance of its predecessor, thanks to the new Telum II processor and the dedicated Spyre AI accelerator card. This advancement enables mainframes to handle LLMs and agents alongside high-volume transactions, proving mainframes aren't obsolete but are evolving to support AI-enhanced business operations.

Google's hardware advancements are equally impressive with Ironwood, its seventh-generation TPU, designed specifically for AI inference. The scale is immense: configurations can link up to 9,216 chips, each rated at roughly 4,614 teraflops, for a combined 42.5 exaflops of compute power, exceeding the capabilities of today's largest supercomputers. Available in different configurations on Google Cloud, these systems provide flexible options for organizations with varying needs.

AI Integration in Everyday Tools

AI is increasingly becoming embedded in the software tools we use daily. Google Workspace now reports over 2 billion AI assists monthly, demonstrating rapid adoption and normalization of AI in everyday workflows. New features include audio generation in Google Docs (creating podcast-style summaries), "Help Me Refine" writing assistance that goes beyond grammar to improve argument strength and clarity, and "Help Me Analyze" in Sheets, which provides on-demand data analysis and visualization.

Even Reddit is embracing AI, partnering with Google Cloud to implement the "Reddit Answers" chatbot powered by Gemini on Vertex AI. The company reports improved search relevance and increased traffic to its homepage through this integration.

Microsoft continues enhancing Copilot with mobile vision capabilities, turning smartphone cameras into interactive visual search tools that can identify plants, translate signs, or help troubleshoot problems in real-time. Currently available to Copilot Pro subscribers in the US, this feature highlights how AI is extending into mobile-first experiences.

Canva has significantly expanded its AI capabilities with image generation, interactive coding via Canva Code (in partnership with Anthropic), and AI-powered spreadsheets featuring "Magic Insights" and "Magic Charts." With integrations for HubSpot, Statista, and Google Analytics, Canva is positioning itself as an all-in-one creative platform enhanced by AI.

The democratization of web development continues with WordPress.com's AI website builder, which transforms prompts into complete websites that users can then refine manually—dramatically lowering barriers to web presence.

Creative Industries and AI

Even noted AI skeptic James Cameron has shifted his perspective somewhat, suggesting AI could be essential for reducing visual effects costs—potentially cutting them in half for visually intensive productions like "Dune" or his own "Avatar" franchise. While still skeptical about AI-written scripts, Cameron has joined the board of Stability AI, signaling his recognition of AI's growing role in creative production.

Google's collaboration with The Sphere in Las Vegas demonstrates AI's transformative potential for visual media. Using fine-tuned Gemini, Veo 2, and Imagen 3 models, the teams enhanced "The Wizard of Oz" for The Sphere's enormous screen by increasing resolution, extending backgrounds, and digitally recreating characters; AI reportedly touched over 90% of the film for this project.

Looking Ahead: Agent Collaboration and Physical Understanding

DeepMind CEO Demis Hassabis has hinted at plans to combine Gemini with Veo, allowing the language model to learn from video data (potentially leveraging YouTube's vast library) to improve its understanding of the physical world. This integration could give AI a more intuitive grasp of reality by combining language processing with visual understanding.

The emergence of agent-to-agent protocols represents another frontier. Google has announced the Agent2Agent (A2A) protocol as a standard for AI agent collaboration, focusing on semantic interoperability: ensuring agents understand each other's meaning and intent rather than simply exchanging data. Similarly, Anthropic's Model Context Protocol (MCP) is emerging as an open standard for connecting models to tools and data, with Google planning to support it for Gemini models and their SDK.
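To make the MCP side concrete, here is a minimal, illustrative tool server built with the official Python SDK's FastMCP helper; the tool itself is hypothetical, but any MCP-capable agent host can discover and call it:

```python
from mcp.server.fastmcp import FastMCP

# Name the server; agent hosts see this when they connect.
mcp = FastMCP("order-lookup")

@mcp.tool()
def get_order_status(order_id: str) -> str:
    """Return the shipping status for an order (stubbed for illustration)."""
    return f"Order {order_id}: out for delivery"

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default, for local agent hosts
```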

Real-world agent deployment is already happening, as demonstrated by DoorDash's expanding program for sidewalk robot delivery, in partnership with Coco in Los Angeles and Chicago. This multi-modal approach—combining human drivers, robots, and potentially drones—shows how autonomous agents are moving beyond experimental phases into practical deployment.

Conclusion

The pace of AI innovation is relentless, with developments across generative video, model improvements, specialized hardware, software integration, creative applications, and autonomous agents. What was once future technology is rapidly becoming present reality, reshaping industries and creating new possibilities daily. Staying informed and thinking critically about these developments has never been more important as AI continues its integration into every aspect of our digital lives.
