Quick Answer
Batch processing prevents system crashes on large datasets. The problem: Loading 50,000 records at once = 50K × 5MB = 250GB RAM = crash. The solution: Process in batches of 1,000-5,000 records, clear memory after each batch (self.env.clear()), repeat until done = memory bounded to a single batch = success. Key patterns: (1) CSV Imports: Use _batch_generator() with try/except per batch and create(vals_list) for batch inserts. (2) Reports: Use read_group() for aggregations instead of loading all records. (3) Heavy Operations: Use the queue_job module for background processing (with_delay()). (4) Memory Management: Call self.env.clear() and del records after processing. Optimal batch sizes: simple records = 1,000-2,000, complex records = 50-500. Wrong approach = loading all records = catastrophic crash. Right approach = chunking + memory clearing = smooth completion. Prevents $40k-$100k in data recovery costs.
The Batch Processing Problem
Your D2C brand needs to import 50,000 orders from an old system. Staff clicks "Import." Odoo tries to load all 50,000 records into memory. 30 seconds later:
- MemoryError: unable to allocate 8 GB
- Worker killed
- Odoo crashes
- Import fails
- Data corrupted
The problem: Loading 50,000 records at once = 50,000 × 5MB per record = 250GB RAM needed.
With Batch Processing
- ✓ Process 1,000 records
- ✓ Save to database
- ✓ Clear memory
- ✓ Process next 1,000
- ✓ ...repeat 50 times
- Memory used: only one batch at a time
- Time: ~2 minutes
- Success rate: 100%
The difference: catastrophic crash vs. smooth completion, and it comes down to a small change in how the code iterates.
We've implemented 150+ Odoo systems. The ones where developers understand batch processing? They handle 100K+ record imports, 1M+ row reports, bulk operations without breaking a sweat. The ones that don't? They crash on large operations, staff can't use the system for bulk work, and they need emergency consulting to rebuild failed data. That's $40,000-$100,000 in lost productivity and data recovery costs.
Part 1: The Batch Processing Mindset
Wrong Approach (Causes Crashes)
```python
# ❌ DON'T DO THIS
orders = self.env['sale.order'].search([])  # Load ALL 50,000
for order in orders:
    order.process_order()  # Memory = 250GB
# Result: CRASH
```
Right Approach (Safe)
```python
# ✅ DO THIS
batch_size = 1000
offset = 0
while True:
    orders = self.env['sale.order'].search(
        [],
        offset=offset,
        limit=batch_size,
    )
    if not orders:
        break
    for order in orders:
        order.process_order()
    # Memory released between batches
    offset += batch_size
# Result: SUCCESS
```
Key Principles
✓ Process small chunks (1,000-5,000 records)
✓ Release memory after each batch
✓ Show progress to user
✓ Handle failures gracefully
✓ Use transactions properly
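A minimal sketch that puts these five principles together; the model, the process_order() method, and the logger name are illustrative placeholders, not a fixed API:

```python
import logging

_logger = logging.getLogger(__name__)

def process_in_batches(self, domain=None, batch_size=1000):
    """Illustrative batch loop: chunk, process, commit, log progress, free memory."""
    domain = domain or []
    total = self.env['sale.order'].search_count(domain)
    offset = 0
    processed = 0
    while True:
        batch = self.env['sale.order'].search(domain, offset=offset, limit=batch_size)
        if not batch:
            break
        try:
            for order in batch:
                order.process_order()      # hypothetical per-record method
            self.env.cr.commit()           # one transaction per batch
            processed += len(batch)
        except Exception as e:
            self.env.cr.rollback()         # drop only the failed batch, keep going
            _logger.error("Batch at offset %s failed: %s", offset, e)
        _logger.info("Progress: %s / %s records", min(offset + batch_size, total), total)
        self.env.clear()                   # release ORM caches between batches
        offset += batch_size
    return processed
```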
Part 2: Chunking Strategies
Calculate Optimal Batch Size
```python
# Formula: batch_size = 5,000 / record_complexity
# record_complexity = number of related records per order

# Simple record (order with 5 lines):
#   complexity = 5
#   batch size = 5,000 / 5 = 1,000 records per batch

# Complex record (order with 50 lines, 30 taxes, 10 documents):
#   complexity = 90
#   batch size = 5,000 / 90 ≈ 50 records per batch
```
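As a rough helper, the formula can be written out in code; the clamping bounds here are illustrative defaults, not values from any Odoo API:

```python
def compute_batch_size(record_complexity, base=5000, minimum=50, maximum=2000):
    """Heuristic batch size: divide a memory budget by per-record complexity.

    record_complexity ~ related records touched per record (lines, taxes,
    attachments, ...). The result is clamped to reasonable bounds.
    """
    complexity = max(record_complexity, 1)
    return max(minimum, min(maximum, base // complexity))

# compute_batch_size(5)   -> 1000  (simple order with 5 lines)
# compute_batch_size(90)  -> 55    (complex order: 50 lines + 30 taxes + 10 documents)
```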
Standard Batch Sizes
| Operation Type | Record Complexity | Batch Size |
|---|---|---|
| Simple imports (CSV) | Low | 1,000-2,000 |
| Orders with lines | Medium | 500-1,000 |
| Complex records (multiple relations) | High | 50-500 |
| Bulk updates/writes | Low | 1,000-2,000 |
Part 3: Real D2C Examples
Example 1: Safe Batch Import
Scenario: Import 100,000 orders from CSV.
WRONG (Crashes)
```python
# ❌ WRONG
def import_orders(self):
    """Crashes on large imports."""
    data = read_csv_file('/tmp/orders.csv')  # Load all 100K rows
    for row in data:
        self.env['sale.order'].create({
            'partner_id': row['customer_id'],
            'amount_total': row['amount'],
            # ...
        })
    return "Imported %d orders" % len(data)
```
RIGHT (Safe)
```python
import logging
from itertools import islice

_logger = logging.getLogger(__name__)


def import_orders(self):
    """Safely imports a large CSV."""
    batch_size = 1000
    imported = 0
    failed = 0
    try:
        data = read_csv_file('/tmp/orders.csv')
        # Process in batches
        for batch in self._batch_generator(data, batch_size):
            try:
                # Build values for this batch only
                vals_list = [
                    {
                        'partner_id': int(row['customer_id']),
                        'amount_total': float(row['amount']),
                        'date_order': row['date'],
                    }
                    for row in batch
                ]
                # Batch create (one create() call per batch, far faster than a loop)
                self.env['sale.order'].create(vals_list)
                self.env.cr.commit()   # persist this batch so a later rollback can't undo it
                imported += len(vals_list)
            except Exception as e:
                # Log the batch error and continue with the next batch
                failed += len(batch)
                self.env.cr.rollback()
                _logger.error("Batch failed: %s", e)
                continue
            finally:
                self.env.clear()       # release ORM caches between batches
        return f"Imported: {imported}, Failed: {failed}"
    finally:
        # Always flush any remaining work
        self.env.cr.commit()


def _batch_generator(self, iterable, batch_size):
    """Yield successive batches from an iterable."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        yield batch
```
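read_csv_file() above is a placeholder, not an Odoo or standard-library function; a minimal version using Python's csv module could look like this (column names assumed from the example):

```python
import csv

def read_csv_file(path):
    """Yield CSV rows as dicts one at a time, so the file is never fully in memory."""
    with open(path, newline='', encoding='utf-8') as csvfile:
        for row in csv.DictReader(csvfile):
            # e.g. {'customer_id': '42', 'amount': '99.90', 'date': '2024-01-31'}
            yield row
```

Because the rows are yielded lazily, _batch_generator() only ever pulls one batch's worth of rows into memory.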
Example 2: Batch Report Generation
Scenario: Generate report for 500,000 order lines.
WRONG (Timeout)
```python
# ❌ WRONG - takes ~10 minutes, times out
def generate_sales_report(self):
    """Generates report for all orders."""
    # Load all 500K lines at once
    lines = self.env['sale.order.line'].search([
        ('order_id.state', '=', 'done'),
    ])
    report_data = []
    for line in lines:
        report_data.append({
            'product': line.product_id.name,
            'qty': line.product_uom_qty,
            'revenue': line.price_total,
        })
    return report_data
```
RIGHT (Fast)
```python
# ✅ RIGHT - uses SQL-level aggregation, completes in ~10 seconds
def generate_sales_report(self):
    """Generates report efficiently."""
    # Use read_group instead of loading all records
    data = self.env['sale.order.line'].read_group(
        domain=[('order_id.state', '=', 'done')],
        fields=[
            'product_id',
            'product_uom_qty:sum',
            'price_total:sum',
        ],
        groupby=['product_id'],
    )
    # Transform to report format
    report_data = [
        {
            'product': d['product_id'][1],
            'qty': d['product_uom_qty'],
            'revenue': d['price_total'],
        }
        for d in data
    ]
    return report_data
# Result: ~10 seconds, minimal memory
```
Example 3: Background Batch Processing with Queue Jobs
Scenario: Mass email campaign to 50,000 customers (can't do synchronously).
Setup queue_job Module
Install the OCA queue_job module from the OCA/queue repository (https://github.com/OCA/queue), add it to your addons path, and install it like any other Odoo module. It also needs to be loaded server-wide so its job runner starts with Odoo.
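A typical odoo.conf for queue_job looks roughly like this; the worker and channel counts are illustrative, so check the OCA queue_job README for your Odoo version:

```ini
[options]
; run in multi-worker mode and load queue_job server-wide so its job runner starts
workers = 4
server_wide_modules = web,queue_job

[queue_job]
; how many jobs may run concurrently on the root channel
channels = root:2
```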
Use queue_job in Module
```python
import logging

from odoo import models

_logger = logging.getLogger(__name__)


class MailCampaign(models.Model):
    _name = 'mail.campaign'
    _description = 'Mail Campaign'

    def action_send_emails(self):
        """Send emails to all subscribers (using background jobs)."""
        subscribers = self.env['res.partner'].search([
            ('email', '!=', False),
            ('opt_in', '=', True),   # custom opt-in field from this example
        ])
        # Enqueue one job per batch
        batch_size = 500
        for i in range(0, len(subscribers), batch_size):
            batch = subscribers[i:i + batch_size]
            # with_delay() enqueues the call as a background job
            # (recent queue_job versions need no @job decorator)
            self.with_delay(priority=10).send_batch_emails(batch)
        return "Enqueued %d emails for sending" % len(subscribers)

    def send_batch_emails(self, subscribers):
        """Send emails to a batch of subscribers (runs in a queue_job worker)."""
        template = self.env.ref('mail.email_template_campaign')  # your campaign template's XML ID
        for subscriber in subscribers:
            try:
                # Send email
                template.send_mail(
                    subscriber.id,
                    force_send=True,
                )
            except Exception as e:
                _logger.error("Failed to email %s: %s", subscriber.email, e)
                # Continue with next subscriber
                continue
```
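queue_job can also reschedule jobs that hit transient failures. A hedged sketch of a retrying variant of send_batch_emails(), assuming the RetryableJobError exception shipped with the OCA queue_job module:

```python
from odoo.addons.queue_job.exception import RetryableJobError

def send_batch_emails(self, subscribers):
    """Variant that asks queue_job to retry the whole batch on transient errors."""
    template = self.env.ref('mail.email_template_campaign')
    for subscriber in subscribers:
        try:
            template.send_mail(subscriber.id, force_send=True)
        except ConnectionError as exc:
            # Transient infrastructure problem (e.g. SMTP unreachable):
            # raising RetryableJobError makes the job runner retry the job later.
            raise RetryableJobError("Mail server unreachable, retrying later") from exc
```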
Result
- ✓ User clicks "Send"
- ✓ Jobs enqueued immediately (no waiting)
- ✓ Background worker processes 500 emails at a time
- ✓ If one email fails, others continue
- ✓ User can continue using Odoo
Part 4: Memory Management
Clear Memory Between Batches
```python
def process_large_dataset(self):
    """Process with explicit memory management."""
    batch_size = 1000
    offset = 0
    while True:
        # Fetch batch
        records = self.env['sale.order'].search(
            [],
            offset=offset,
            limit=batch_size,
        )
        if not records:
            break
        # Process batch
        for record in records:
            record.action_process()
        # CRITICAL: clear ORM caches to free memory
        self.env.clear()
        # Clean up local variables
        del records
        offset += batch_size
```
Monitor Memory Usage
```python
import logging
import os

import psutil

_logger = logging.getLogger(__name__)


def process_with_monitoring(self):
    """Process batches with memory alerts."""
    process = psutil.Process(os.getpid())
    max_memory = 1024 * 1024 * 1024  # 1GB alert threshold
    batch_size = 1000
    offset = 0
    while True:
        records = self.env['sale.order'].search(
            [],
            offset=offset,
            limit=batch_size,
        )
        if not records:
            break
        # Check resident memory before processing the batch
        memory_used = process.memory_info().rss
        if memory_used > max_memory:
            _logger.warning("High memory: %.0f MB", memory_used / 1024 / 1024)
        # Process
        for record in records:
            record.action_process()
        self.env.clear()
        offset += batch_size
```
Action Items: Implement Safe Batch Processing
For Imports (CSV, Excel, etc.)
❏ Use batch_size = 1000
❏ Wrap each batch in try/except
❏ Clear cache after each batch
❏ Log failures without stopping
For Updates/Writes
❏ Use batch_size = 1000-2000
❏ Use .write() on the entire batch, not a per-record loop (see the sketch after this list)
❏ Commit after each batch
❏ Show progress bar to user
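A minimal sketch of a batched write; the cleanup rule (archiving partners without an email address) is a hypothetical example, not a recommendation:

```python
def archive_partners_without_email(self, batch_size=2000):
    """Write once per batch instead of once per record."""
    Partner = self.env['res.partner']
    while True:
        batch = Partner.search([('email', '=', False)], limit=batch_size)
        if not batch:
            break
        # One write() on the whole recordset instead of a loop of single writes
        batch.write({'active': False})
        self.env.cr.commit()   # commit after each batch, per the checklist
        self.env.clear()       # free ORM caches
        # No offset needed: archived partners drop out of the default search
        # (active records only), so the next search() returns the next batch.
```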
For Heavy Operations
❏ Use queue_job for background processing
❏ Enqueue jobs instead of processing synchronously
❏ Allow user to continue using Odoo
❏ Retry failed jobs automatically
For Reports/Aggregations
❏ Use read_group() instead of loading all records
❏ Use search_read() for specific fields only
❏ Use SQL directly for complex aggregations (see the sketch after this list)
❏ Avoid loops over large resultsets
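For the raw-SQL item, a hedged sketch using self.env.cr; the table and column names follow the standard sale schema, but verify them against your Odoo version:

```python
def top_products_by_revenue(self, limit=20):
    """Aggregate in PostgreSQL and fetch only the summary rows."""
    self.env.cr.execute("""
        SELECT sol.product_id,
               SUM(sol.product_uom_qty) AS qty,
               SUM(sol.price_total)     AS revenue
          FROM sale_order_line sol
          JOIN sale_order so ON so.id = sol.order_id
         WHERE so.state = 'done'
      GROUP BY sol.product_id
      ORDER BY revenue DESC
         LIMIT %s
    """, (limit,))
    # e.g. [{'product_id': 7, 'qty': 120.0, 'revenue': 3400.0}, ...]
    return self.env.cr.dictfetchall()
```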
Frequently Asked Questions
What is the optimal batch size for processing large datasets?
Optimal batch size depends on record complexity. Formula: batch_size = 5,000 / (number of related records). Simple records (e.g., basic products, partners) = 1,000-2,000 per batch. Medium complexity (e.g., orders with 5-10 lines) = 500-1,000 per batch. High complexity (e.g., orders with 50+ lines, multiple taxes, attachments) = 50-500 per batch. For bulk updates/writes, use 1,000-2,000. For CSV imports, start with 1,000 and adjust based on memory usage. Always clear memory after each batch with self.env.clear().
How do I prevent memory crashes when importing large CSV files?
Use a batch generator to process the CSV in chunks, not all at once. Implementation: (1) Create _batch_generator() using itertools.islice to yield chunks of 1,000 rows. (2) For each batch, build vals_list and use create(vals_list) for a batch insert. (3) Wrap each batch in try/except so one bad batch does not stop the entire import. (4) Commit each successful batch; on failure call self.env.cr.rollback(), log the error, and continue with the next batch. (5) Call self.env.clear() after each batch to free memory. Result: Import 100K orders in 2-3 minutes with memory bounded to a single batch instead of the whole file. Track imported and failed counts for user feedback.
When should I use queue_job for batch processing?
Use queue_job for operations that: (1) take more than 10 seconds (the user can't wait), (2) process 10K+ records, (3) make external API calls (email, webhooks), (4) run as scheduled batch operations. Example scenarios: mass email campaigns (50K customers), bulk product sync with external systems, nightly report generation, large invoice batch creation. Benefits: the user clicks the action, jobs enqueue instantly, the user keeps working, a background worker processes 500-1,000 records per job, and failed jobs retry automatically. Setup: install the OCA queue_job module, load it server-wide, then enqueue with self.with_delay(priority=10).method_name(batch); older queue_job versions also required decorating the method with @job. Never use queue_job for operations under 5 seconds; the overhead isn't worth it.
How do I optimize report generation for 500K+ records?
Never loop over large datasets. Use read_group() for aggregations: it runs a SQL GROUP BY and returns summarized data. Example: instead of search() loading 500K order lines and then looping to sum quantities, use read_group(domain=[...], fields=['product_id', 'product_uom_qty:sum', 'price_total:sum'], groupby=['product_id']). This returns one row per product with totals, not 500K rows. Alternative: use search_read() to fetch only the needed fields rather than whole recordsets, as sketched below. For complex reports, write raw SQL with self.env.cr.execute() and fetchall(). Performance difference: the loop approach takes ~10 minutes and times out; the read_group() approach finishes in 5-10 seconds and cuts memory from gigabytes to megabytes.
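A short sketch of the search_read() alternative; the field list mirrors the report example above:

```python
# Fetch only the three needed columns as plain dicts, not full recordsets
rows = self.env['sale.order.line'].search_read(
    domain=[('order_id.state', '=', 'done')],
    fields=['product_id', 'product_uom_qty', 'price_total'],
    limit=5000,   # page through very large result sets with offset/limit
)
# rows -> [{'id': 1, 'product_id': (7, 'Desk'), 'product_uom_qty': 2.0, 'price_total': 590.0}, ...]
```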
Free Batch Processing Audit
Stop crashing on large operations. We'll identify all batch operations in your system, calculate optimal batch sizes, implement safe chunking, set up queue jobs for heavy operations, and test with large datasets. Most D2C brands don't have safe batch processing. Adding it prevents $30,000-$80,000 in emergency recovery costs.
