Most enterprise AI teams are solving the wrong problem first. They’re optimizing storage speed for data that was never safe or ready to use.
At NVIDIA GTC 2026, Jacob Liberman, NVIDIA’s Director of Enterprise Product, put it plainly: “People talk about data as a lake, but it’s more like a river. It’s flowing, it’s changing – constantly changing. So the first time you prepare your data is not going to be the last time. You have to continuously prepare your data for AI as it changes.”
That single statement captures the most underestimated problem in enterprise AI today: not the GPU shortage, not model selection, not infrastructure cost, but data preparation.
Fast Storage Fixes the Wrong Thing First
The GTC session – “Accelerating the Path to Production: The Evolution of Enterprise Storage to Deliver AI-Ready Data” – covered NVIDIA’s BlueField-4 STX architecture, partnerships with IBM, Dell, and NetApp, and the shift toward storage systems built for AI inference rather than human retrieval.
All of that matters. Fast, AI-native storage is a genuine requirement.
But Liberman also laid out what has to happen before data reaches storage infrastructure: extraction, enrichment, classification, embedding, indexing, and semantic search. Each step is resource-intensive. Each step runs continuously, or it fails.
And here’s the step most enterprise AI programs treat as an afterthought: classification. Not embedding. Not indexing. Classification – understanding what the data actually is, what it contains, and whether it belongs in the pipeline at all.
The Step Most Teams Skip
When enterprises build AI data pipelines, they focus on what goes in: volume, recency, and format. Can the storage system retrieve it fast enough? Can the vector database handle the query load?
The more important question rarely comes up: should this data be in the pipeline at all?
Enterprise data estates contain a mix of everything. Sensitive customer records from three years ago sit in a shared folder. Unreleased product documents sit in locations never flagged as restricted. Clinical trial data lives alongside general research files. ITAR-controlled engineering specs share a storage bucket with public documentation.
When an AI agent, RAG pipeline, or fine-tuning dataset pulls from that estate, it doesn’t discriminate. It retrieves what it can reach. It processes what it finds.
The result is both a security problem and a data quality problem. Unclassified, ungoverned, mixed-sensitivity data produces AI outputs that can’t be trusted, audited, or explained. And as Liberman noted, inference is increasingly where the value is created; put bad data into inference and you get bad decisions out.
Why “Continuous” Is the Critical Word
The most important thing Liberman said at GTC wasn’t about storage speed. It was about time.
Data doesn’t stay still. New files get created daily. Documents are modified, copied, and moved across systems. A dataset that was clean and appropriate last month may contain sensitive records this month, because someone added a new data source to the pipeline.
Static data preparation doesn’t solve this. A classification scan conducted when the pipeline was first built goes stale within weeks. New data arrives unclassified. Sensitive content drifts into storage locations where it shouldn’t exist.
Continuous data preparation means the classification layer runs alongside the data, not just once before it moves. It means new files are understood before they’re retrieved, and when a document is modified or moved, its classification updates in near real time, not on a quarterly audit cycle.
That’s exactly what NVIDIA’s storage partners are building toward on the infrastructure side. The question for enterprise AI teams is whether their data intelligence layer keeps pace.
When the Pipeline Doesn’t Know What It’s Retrieving
Traditional enterprise applications were built around specific, bounded data. A CRM system holds CRM data. An ERP holds ERP data. Classification was simple because the data was already contained.
AI agents don’t work within those boundaries. They pull from file shares, SharePoint, S3 buckets, SaaS platforms, data lakes, and internal knowledge bases, simultaneously, at scale, in real time. Each of those sources carries a different mix of data: some appropriate for AI pipelines, some sensitive or regulated, most of it never classified.
When the pipeline doesn’t know what it’s retrieving, two things happen. Inappropriate data reaches AI systems, bringing compliance and security exposure with it. And the model works with noisy, low-quality data mixed alongside high-quality data; its outputs reflect that.
The enterprises that successfully move AI from pilot to production aren’t the ones with the fastest storage. They’re the ones who know what’s in their data before it moves.
What the Data Preparation Layer Actually Needs to Do
The classification and intelligence layer sits between raw data storage and the AI pipeline, and it’s where Secuvy operates. Using self-learning AI rather than pattern matching or manual rules, Secuvy continuously discovers and classifies enterprise data across cloud, on-premises, and SaaS environments. It understands what data is, where it lives, and identifies sensitive, regulated, or inappropriate content before it enters any AI pipeline, RAG system, or LLM prompt.
Critically, it does this continuously. As new data arrives, as files are modified, as pipelines expand to new sources, the classification stays current. The first scan isn’t the last.
This delivers two outcomes simultaneously. First, protection: sensitive and regulated data is identified and filtered before it reaches AI systems. The pipeline isn’t just fast; it’s safe. Second, optimization: duplicate files, outdated records, ROT data, and low-value content are removed from the pipeline. The data that reaches the model isn’t just safe, it’s the right data, improving accuracy and reducing wasted GPU compute.
Both outcomes depend on the same foundation: knowing what your data is, continuously, before it moves.
The Production Gap Isn’t a GPU Problem
The path to production runs through data preparation. And data preparation isn’t a one-time project; it’s an ongoing function that has to keep pace with a data estate that never stops changing.
That’s the river problem Liberman described. The answer isn’t a better bucket. It’s a system that understands what’s in the water and keeps understanding it as the water flows.
Secuvy continuously discovers, classifies, and prepares enterprise data for AI pipelines, protecting what shouldn’t go in and surfacing the high-value data that should. See how the intelligence layer works at secuvy.ai.