Binary Data (Blobs)¶
Blobs provide efficient binary data storage with typed layouts and zero-copy NumPy integration.
When to use: Use blobs for binary data such as images, meshes, and raw buffers. Use BlobArray for typed arrays and BlobPack for structured multi-region data.
Blob vs BlobId¶
Viper offers two DSM types for binary data:
| Type | Use Case | Storage |
|---|---|---|
| blob | Small data (thumbnails, icons) | Inline in document |
| blob_id | Large data (textures, meshes) | Database blob API |
blob (Inline)¶
A blob field stores binary data directly in the document. No special API needed.
blob_id (Reference)¶
A blob_id is a SHA-1 hash referencing a blob managed by the Database blob API. This
requires the dedicated blob API:
# Store blob → get blob_id
>>> blob_id = db.create_blob(layout, content)
# Retrieve blob by blob_id
>>> content = db.blob(blob_id)
# List all blob_ids in database
>>> db.blob_ids()
# Get blob metadata
>>> info = db.blob_info(blob_id)
>>> info.size()
1024
Content-addressable: The blob_id is computed from layout + content (SHA-1).
Identical content always produces the same blob_id.
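The content-addressing property can be illustrated with Python's standard hashlib. Note that how dsviper actually combines layout and content before hashing is an assumption here; the sketch only demonstrates that identical input always yields the identical ID:

```python
# Content-addressable IDs: SHA-1 over layout + content (illustrative only;
# the exact input encoding dsviper uses is not specified here).
import hashlib

def blob_id_of(layout: str, content: bytes) -> str:
    h = hashlib.sha1()
    h.update(layout.encode())   # layout string, e.g. 'float-3'
    h.update(content)           # raw blob bytes
    return h.hexdigest()        # 40-char hex ID

a = blob_id_of("uchar-1", bytes([1, 2, 3, 4]))
b = blob_id_of("uchar-1", bytes([1, 2, 3, 4]))
# a == b: same layout + content always produces the same ID
```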
Constraint: A document referencing a blob_id cannot be committed unless the blob
exists in the database. Always call create_blob() before using the
blob_id in a document.
ValueBlob¶
A ValueBlob holds inline binary data:
>>> from dsviper import *
# Create blob from bytes
>>> data = bytes([1, 2, 3, 4, 5])
>>> blob = ValueBlob(data)
# Access bytes
>>> blob.bytes()
b'\x01\x02\x03\x04\x05'
>>> len(blob)
5
ValueBlobId¶
A ValueBlobId references external binary data. The ID is computed from the blob’s layout
and content:
# Create blob_id from layout and content
>>> layout = BlobLayout()
>>> content = ValueBlob(bytes([1, 2, 3, 4]))
>>> blob_id = ValueBlobId(layout, content)
>>> blob_id
'6b8f3ca756046be29244d9bdb6b5ca5c00468ad5'
# Parse blob_id from string
>>> blob_id = ValueBlobId.try_parse("6b8f3ca756046be29244d9bdb6b5ca5c00468ad5")
BlobLayout - Metadata Everywhere¶
A BlobLayout describes how to interpret blob bytes. This is the Metadata Everywhere
principle applied to binary data: the layout is metadata that gives meaning to raw bytes.
>>> from dsviper import *
# Default layout: array of unsigned bytes
>>> BlobLayout()
'uchar-1'
# 3D positions: array of (float, float, float)
>>> BlobLayout('float', 3)
'float-3'
# Triangle indices: array of (uint, uint, uint)
>>> BlobLayout('uint', 3)
'uint-3'
# UV coordinates: array of (float, float)
>>> BlobLayout('float', 2)
'float-2'
The layout enables:

- Type-safe interpretation of binary data
- Cross-platform compatibility (endianness handled)
- Validation at decode time
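These layout strings map naturally onto NumPy dtypes and shapes. A sketch with plain NumPy (not the dsviper decoder) shows how a 'float-3' layout gives meaning to raw bytes:

```python
import struct
import numpy as np

# 2 vec3 positions serialized as little-endian float32
raw = struct.pack("<6f", 1.0, 2.0, 3.0, 4.0, 5.0, 6.0)

# A 'float-3' layout says: interpret the bytes as rows of 3 float32 values
positions = np.frombuffer(raw, dtype="<f4").reshape(-1, 3)
# positions.shape is (2, 3); positions[0] is [1.0, 2.0, 3.0]
```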
BlobView (Read-Only)¶
A BlobView interprets an existing blob with a given layout (read-only):
>>> from dsviper import *
# Existing blob from database
>>> blob = db.blob(blob_id)
# Interpret as vec3 positions
>>> view = BlobView(BlobLayout('float', 3), blob)
>>> view.count()
100
# Read elements
>>> view[0]
(1.0, 2.0, 3.0)
>>> view[99]
(10.0, 20.0, 30.0)
Use BlobView when you need to read blob data without copying.
BlobArray (Read-Write)¶
A BlobArray is a typed array backed by a blob:
# Create array of 100 vec3 positions
>>> layout = BlobLayout('float', 3)
>>> array = BlobArray(layout, 100)
# Write elements as tuples
>>> array[0] = (1.0, 2.0, 3.0)
>>> array[1] = (4.0, 5.0, 6.0)
# Read back
>>> array[0]
(1.0, 2.0, 3.0)
# Properties
>>> array.count() # 100 elements
>>> array.data_count() # 300 floats (100 × 3)
>>> array.byte_count() # 1200 bytes (300 × 4)
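The three counts are related by simple arithmetic (elements × components per element × bytes per component), which the NumPy equivalent of the same array makes explicit:

```python
import numpy as np

# NumPy equivalent of BlobArray(BlobLayout('float', 3), 100)
array = np.zeros((100, 3), dtype=np.float32)

count = array.shape[0]     # 100 elements
data_count = array.size    # 300 floats (100 × 3)
byte_count = array.nbytes  # 1200 bytes (300 × 4)
```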
BlobPack - Structured Binary Data¶
A BlobPack groups multiple named regions with different layouts into a single blob. This
is ideal for complex structures like 3D meshes.
Example: 3D Mesh Storage¶
A mesh has positions, normals, UVs, and triangle indices—each with a different layout:
>>> from dsviper import *
# Define the mesh structure
>>> descriptor = BlobPackDescriptor()
>>> descriptor.add_region('positions', BlobLayout('float', 3), 4) # 4 vec3
>>> descriptor.add_region('normals', BlobLayout('float', 3), 4) # 4 vec3
>>> descriptor.add_region('uvs', BlobLayout('float', 2), 4) # 4 vec2
>>> descriptor.add_region('indices', BlobLayout('uint', 3), 2) # 2 triangles
# Create the pack
>>> mesh = BlobPack(descriptor)
>>> len(mesh)
4
Fill the Mesh Data¶
# Quad vertices (4 corners)
>>> mesh['positions'][0] = (-1.0, -1.0, 0.0)
>>> mesh['positions'][1] = (1.0, -1.0, 0.0)
>>> mesh['positions'][2] = (1.0, 1.0, 0.0)
>>> mesh['positions'][3] = (-1.0, 1.0, 0.0)
# Normals (all facing +Z)
>>> for i in range(4):
...     mesh['normals'][i] = (0.0, 0.0, 1.0)
# UVs
>>> mesh['uvs'][0] = (0.0, 0.0)
>>> mesh['uvs'][1] = (1.0, 0.0)
>>> mesh['uvs'][2] = (1.0, 1.0)
>>> mesh['uvs'][3] = (0.0, 1.0)
# Two triangles forming the quad
>>> mesh['indices'][0] = (0, 1, 2)
>>> mesh['indices'][1] = (0, 2, 3)
Serialize and Restore¶
# Serialize to a single blob
>>> blob = mesh.blob()
# Store in database
>>> blob_id = db.create_blob(BlobLayout(), blob)
# Later: restore from blob
>>> restored = BlobPack.from_blob(blob)
>>> restored['positions'][0]
(-1.0, -1.0, 0.0)
>>> restored['indices'][1]
(0, 2, 3)
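The pack-and-restore round trip can be sketched with only the standard library. This json-header format is purely illustrative of the self-describing idea; it is not dsviper's actual wire format:

```python
# Illustrative pack format: a length-prefixed JSON header describing named,
# typed regions, followed by the raw region bytes. NOT the dsviper format.
import json
import struct

def pack_regions(regions):
    """Serialize {name: (struct_fmt, [tuples])} into one self-describing buffer."""
    header, body = [], b""
    for name, (fmt, values) in regions.items():
        data = b"".join(struct.pack("<" + fmt, *v) for v in values)
        header.append({"name": name, "fmt": fmt,
                       "offset": len(body), "size": len(data)})
        body += data
    head = json.dumps(header).encode()
    return struct.pack("<I", len(head)) + head + body

def unpack_regions(blob):
    """Restore the {name: [tuples]} mapping from a packed buffer."""
    (hlen,) = struct.unpack_from("<I", blob, 0)
    header = json.loads(blob[4:4 + hlen])
    body = blob[4 + hlen:]
    out = {}
    for region in header:
        fmt = "<" + region["fmt"]
        step = struct.calcsize(fmt)
        chunk = body[region["offset"]:region["offset"] + region["size"]]
        out[region["name"]] = [struct.unpack_from(fmt, chunk, i)
                               for i in range(0, len(chunk), step)]
    return out

packed = pack_regions({
    "positions": ("3f", [(-1.0, -1.0, 0.0), (1.0, -1.0, 0.0)]),
    "indices":   ("3I", [(0, 1, 2)]),
})
restored = unpack_regions(packed)
# restored["positions"][0] == (-1.0, -1.0, 0.0)
```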
Region Access¶
# Check if region exists
>>> 'positions' in mesh
True
>>> 'colors' in mesh
False
# Get region info
>>> vertices = mesh['positions']
>>> vertices.name()
'positions'
>>> vertices.count()
4
>>> vertices.blob_layout()
'float-3'
Why BlobPack?¶
Benefit |
Description |
|---|---|
Single blob |
All mesh data in one blob_id |
Typed regions |
Each region has its own layout |
Self-describing |
Layout metadata embedded in header |
Efficient |
Direct memory mapping, no parsing |
NumPy Integration¶
BlobArray implements the Python Buffer Protocol, enabling zero-copy interoperability
with NumPy and other array libraries.
Zero-Copy View¶
>>> import numpy as np
>>> from dsviper import *
# Create a BlobArray of vec3 positions
>>> layout = BlobLayout('float', 3)
>>> positions = BlobArray(layout, 100)
# Get NumPy view (no copy!)
>>> np_view = np.array(positions, copy=False)
>>> np_view.shape
(100, 3)
>>> np_view.dtype
dtype('float32')
Bidirectional Modifications¶
Changes through NumPy affect the original BlobArray:
# Modify via NumPy
>>> np_view[0] = [10.0, 20.0, 30.0]
# Original is updated
>>> positions[0]
(10.0, 20.0, 30.0)
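The same sharing behavior can be demonstrated with any object that exposes the buffer protocol; here a plain bytearray stands in for the BlobArray:

```python
import struct
import numpy as np

buf = bytearray(12)                       # room for 3 float32 values, all zero
view = np.frombuffer(buf, dtype="<f4")    # zero-copy: shares buf's memory
view[0] = 1.0                             # write through the NumPy view...
# ...and the underlying bytes change too: buf[:4] now holds float32 1.0
```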
Direct Memory Access¶
# memoryview for low-level access
>>> mv = memoryview(positions)
>>> mv.nbytes
1200 # 100 × 3 × 4 bytes
# Bulk copy from bytes
>>> source = b'\x00\x00\x80\x3f...' # binary data
>>> positions.copy(source)
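The memoryview route works for any writable buffer; a stdlib-only sketch of an in-place bulk copy:

```python
import struct

dest = bytearray(8)                    # room for 2 float32 values
source = struct.pack("<2f", 1.0, 2.0)  # 8 bytes of binary data
memoryview(dest)[:] = source           # overwrite in place, no reallocation
# dest now holds the little-endian bytes of (1.0, 2.0)
```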
BlobPack Regions¶
BlobPackRegion also supports the Buffer Protocol:
# Access mesh regions as NumPy arrays
>>> positions_np = np.array(mesh['positions'], copy=False)
>>> normals_np = np.array(mesh['normals'], copy=False)
>>> uvs_np = np.array(mesh['uvs'], copy=False)
>>> positions_np.shape
(4, 3)
>>> uvs_np.shape
(4, 2)
# Transform all positions at once
>>> positions_np *= 2.0 # Scale mesh
>>> mesh['positions'][0]
(-2.0, -2.0, 0.0)
Why Zero-Copy Matters¶
Scenario |
Without Zero-Copy |
With Zero-Copy |
|---|---|---|
1M vertices |
Copy 12MB |
Share pointer |
GPU upload |
Python → copy → C++ |
Direct access |
Scientific compute |
Data duplication |
In-place ops |
Blob in Attachments¶
Blobs can be used in attachments:
// Small data inline
struct Thumbnail {
    uint16 width;
    uint16 height;
    blob data;       // Inline binary
};

// Large data by reference
struct Texture {
    uint16 width;
    uint16 height;
    blob_id pixels;  // External reference
};
Storing Blobs in Database¶
Inline Blobs¶
Inline blobs are stored directly in the document:
>>> thumb = Value.create(t_thumbnail)
>>> thumb.width = 64
>>> thumb.height = 64
>>> thumb.data = ValueBlob(image_bytes)
>>> mutating.set(attachment, key, thumb)
Referenced Blobs (blob_id)¶
Use the database blob API to store and retrieve:
# Store blob via Database blob API
>>> layout = BlobLayout('float', 3)
>>> content = ValueBlob(mesh_bytes)
>>> blob_id = db.create_blob(layout, content)
# Use blob_id in document
>>> texture = Value.create(t_texture)
>>> texture.width = 1024
>>> texture.height = 1024
>>> texture.pixels = blob_id
>>> mutating.set(attachment, key, texture)
Retrieving Blobs¶
# Get blob by ID
>>> content = db.blob(blob_id)
>>> content.bytes()
b'...'
# Partial read (for large blobs)
>>> chunk = db.read_blob(blob_id, size=1024, offset=0)
BlobStream (Large Blobs)¶
For very large blobs, use streaming to avoid loading everything into memory.
Required for blobs > 2GB: The standard create_blob() API has a 2GB size limit. Use
BlobStream for larger data:
# Create a stream for a 100MB blob
>>> layout = BlobLayout('uchar', 1)
>>> stream = db.blob_stream_create(layout, size=100_000_000)
# Write in chunks
>>> for chunk in read_file_in_chunks("large_file.bin"):
...     db.blob_stream_append(stream, ValueBlob(chunk))
# Close stream → get blob_id
>>> blob_id = db.blob_stream_close(stream)
This is essential for:

- 3D meshes with millions of vertices
- Video/audio data
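The read_file_in_chunks helper used above is not a dsviper API; one plausible stdlib implementation is:

```python
def read_file_in_chunks(path, chunk_size=4 * 1024 * 1024):
    """Yield successive fixed-size chunks from a binary file."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            yield chunk
```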
When to Use Each Type¶
| Scenario | Recommendation |
|---|---|
| Thumbnails (< 64KB) | Use blob (inline) |
| Textures (> 1MB) | Use blob_id |
| Mesh geometry | Use blob_id with BlobPack |
| Icons, small images | Use blob (inline) |
| Audio/video | Use blob_id with BlobStream |
What’s Next¶
Serialization - JSON and binary encoding