# DataFrame Backends
xbbg is Arrow-native internally. Endpoint helpers return a Narwhals DataFrame by default. When PyArrow is installed, that Narwhals frame is backed by a real `pyarrow.Table`, preserving the legacy Narwhals/PyArrow behavior; otherwise xbbg falls back through the installed dataframe libraries and finally to the minimal `xbbg.ArrowTable` native carrier from `xbbg._core`.

Use `backend="native"` / `Backend.NATIVE` when you want the raw `xbbg.ArrowTable` carrier, or `backend="pyarrow"` / `Backend.PYARROW` for an actual `pyarrow.Table`. If the default Narwhals output has to fall back to the limited native plugin because no PyArrow/pandas/Polars backend is installed, xbbg emits a one-time `RuntimeWarning` with install guidance.
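The one-time warning behavior follows Python's standard `warnings` machinery. A minimal standard-library sketch of the warn-once pattern (the message text and function name here are illustrative, not xbbg's actual code):

```python
import warnings

def emit_fallback_warning(_seen=[]):
    """Warn once that the limited native fallback is in use (illustrative).

    The mutable default argument acts as a module-level "already warned" flag.
    """
    if not _seen:
        _seen.append(True)
        warnings.warn(
            "falling back to the native carrier; install pyarrow for full support",
            RuntimeWarning,
            stacklevel=2,
        )

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    emit_fallback_warning()
    emit_fallback_warning()  # second call is silent
print(len(caught))  # 1
```

If the fallback is expected in a deliberately minimal install, the warning can be silenced with `warnings.filterwarnings("ignore", category=RuntimeWarning)` scoped as narrowly as possible.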
## Supported Backends

### Eager Backends

Eager backends return a fully materialized object immediately.
| Backend | Output type | Best for |
|---|---|---|
| `default` / `narwhals` | Narwhals DataFrame | Backwards-compatible dataframe ergonomics over PyArrow when installed, with a warned native fallback for minimal installs |
| `native` | `xbbg.ArrowTable` | Explicit native carrier, PyCapsule interop, memory-conscious workflows |
| `pyarrow` | `pyarrow.Table` | Full PyArrow table functionality and Arrow ecosystem integrations |
| `pandas` | `pd.DataFrame` | Traditional workflows, ecosystem compatibility |
| `polars` | `pl.DataFrame` | High performance, large datasets |
| `modin` | Modin DataFrame | Pandas API with parallel execution |
| `cudf` | cuDF DataFrame | GPU-accelerated processing (requires an NVIDIA GPU) |
### Lazy Backends

Lazy backends defer execution. The query graph is built when you call xbbg functions and evaluated only when you explicitly trigger execution (e.g. `.collect()` for Polars, `.execute()` for DuckDB).
| Backend | Output type | Best for |
|---|---|---|
| `polars_lazy` | `pl.LazyFrame` | Deferred execution, query optimization |
| `narwhals_lazy` | Narwhals LazyFrame | Library-agnostic lazy evaluation through xbbg's native plugin |
| `duckdb` | DuckDB relation | SQL analytics, OLAP queries |
| `dask` | Dask DataFrame | Out-of-core and distributed computing |
| `ibis` | Ibis Table | Unified interface to many backends |
| `pyspark` | Spark DataFrame | Big data processing (requires Java) |
| `sqlframe` | SQLFrame DataFrame | SQL-first DataFrame operations |
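The deferred-execution model these backends share can be illustrated with a minimal stand-in: operations are recorded into a plan when called and nothing runs until an explicit collect. This toy class is illustrative only; it is not how any of these libraries are actually implemented:

```python
class ToyLazyFrame:
    """Records operations as a plan; nothing runs until collect()."""

    def __init__(self, rows):
        self._rows = rows
        self._plan = []  # deferred steps: (name, function) pairs

    def filter(self, predicate):
        self._plan.append(("filter", lambda rows: [r for r in rows if predicate(r)]))
        return self  # chaining extends the plan; no work is done yet

    def select(self, key):
        self._plan.append(("select", lambda rows: [{key: r[key]} for r in rows]))
        return self

    def collect(self):
        # Execution happens here, in one pass over the recorded plan
        rows = self._rows
        for _, step in self._plan:
            rows = step(rows)
        return rows

lf = ToyLazyFrame([{"px": 100.0, "tk": "AAA"}, {"px": 50.0, "tk": "BBB"}])
lf = lf.filter(lambda r: r["px"] > 60).select("tk")
print(lf.collect())  # [{'tk': 'AAA'}]
```

Real lazy engines go further: because the whole plan is visible before execution, they can reorder and fuse steps (predicate pushdown, projection pruning) before touching any data.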
## Selecting a Backend

### Global default

`Backend.NARWHALS` is already the default. Set it explicitly only when you want to restore the default dataframe wrapper after using another backend.
```python
import xbbg
from xbbg import Backend, blp

xbbg.set_backend(Backend.NARWHALS)

# Calls now return a Narwhals DataFrame
df = blp.bdp('AAPL US Equity', 'PX_LAST')
print(type(df))
```

You can also pass a string:

```python
xbbg.set_backend('narwhals')
```

### Per-call override

Pass `backend` as a keyword argument to any data function. This overrides the global default for that call only.
```python
from xbbg import blp

# Native carrier result for this call
table = blp.bdp('AAPL US Equity', 'PX_LAST', backend='native')

# Explicit conversion when a library object is required
df = blp.bdp('AAPL US Equity', 'PX_LAST', backend='pandas')
```

## Checking Availability

Not all optional backends are installed in every environment. Use these utilities to inspect what is available before writing code that assumes a specific backend. The default `narwhals` backend and the explicit `native` carrier are provided by xbbg itself and are always available.
```python
from xbbg import get_available_backends, is_backend_available, print_backend_status

# Print installed backend names
print([backend.value for backend in get_available_backends()])
# ['native', 'narwhals', 'pyarrow', 'pandas', 'polars', ...]

# Check a specific backend
if is_backend_available('polars'):
    print("Polars is installed")

# Print a detailed status table for all backends
print_backend_status()
```
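Availability checks like these usually reduce to probing whether the backing library is importable. A hedged sketch of that idea using only the standard library (this is not xbbg's actual implementation; `backend_importable` is a name invented for this example):

```python
import importlib.util

def backend_importable(module_name: str) -> bool:
    """Return True if the backing module could be imported, without importing it."""
    return importlib.util.find_spec(module_name) is not None

# 'json' stands in for a backend's backing module; 'no_such_backend' is fake
print(backend_importable("json"))             # True
print(backend_importable("no_such_backend"))  # False
```

Using `find_spec` rather than a bare `import` avoids paying the import cost (and any import-time side effects) just to answer a yes/no question.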
## Backend Examples

```python
from xbbg import blp, Backend

table = blp.bdh(
    'SPX Index',
    'PX_LAST',
    start_date='2024-01-01',
    end_date='2024-12-31',
    backend=Backend.NATIVE,
)
# Returns xbbg.ArrowTable
print(table.column_names)
print(table.to_pylist()[:1])
```

```python
from xbbg import blp, Backend

table = blp.bdp('AAPL US Equity', 'PX_LAST', backend=Backend.PYARROW)
# Returns pyarrow.Table
print(type(table))
```

```python
from xbbg import blp, Backend

df = blp.bdp('AAPL US Equity', 'PX_LAST', backend=Backend.PANDAS)
# Returns pd.DataFrame
print(type(df))  # <class 'pandas.core.frame.DataFrame'>
```

```python
from xbbg import blp, Backend

df = blp.bdp('AAPL US Equity', 'PX_LAST', backend=Backend.POLARS)
# Returns pl.DataFrame
print(type(df))  # <class 'polars.dataframe.frame.DataFrame'>
```

```python
from xbbg import blp, Backend

relation = blp.bdh(
    'SPX Index',
    'PX_LAST',
    start_date='2024-01-01',
    end_date='2024-12-31',
    backend=Backend.DUCKDB,
)
# Returns a DuckDB relation: not yet executed
result = relation.fetchdf()  # trigger execution, returns pd.DataFrame
```

## Arrow PyCapsule Interop

`xbbg.ArrowTable` implements the Arrow PyCapsule stream protocol (`__arrow_c_stream__`) and schema protocol (`__arrow_c_schema__`). `xbbg.ArrowRecordBatch` implements the Arrow PyCapsule array protocol (`__arrow_c_array__`) for raw streaming batches. Arrow-aware libraries can consume these objects directly, without xbbg exposing a separate public backend for each library's Arrow table type.
```python
from xbbg import blp

table = blp.bdp('AAPL US Equity', 'PX_LAST', backend='native')

import polars as pl
polars_df = pl.from_arrow(table)

import duckdb
relation = duckdb.from_arrow(table)
```

Prefer `backend='native'` when downstream code can consume xbbg's native carrier or the Arrow PyCapsule protocol directly. Request `backend='pyarrow'`, `backend='pandas'`, `backend='polars'`, `backend='duckdb'`, or another explicit backend when you want xbbg to materialize that library's object at the boundary.
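These protocols are duck-typed: a consumer checks for the dunder methods rather than for a specific class, which is why any producer can participate. A toy producer sketch (the capsule construction itself is elided; a real producer returns a PyCapsule wrapping an Arrow C stream, and these class/function names are invented for illustration):

```python
def supports_arrow_stream(obj) -> bool:
    """Duck-typed capability check of the kind Arrow-aware consumers perform."""
    return hasattr(obj, "__arrow_c_stream__")

class ToyStreamProducer:
    """Stands in for an object like xbbg.ArrowTable; not a real implementation."""

    def __arrow_c_stream__(self, requested_schema=None):
        # A real producer returns a PyCapsule containing an ArrowArrayStream.
        raise NotImplementedError("toy producer: capsule construction elided")

print(supports_arrow_stream(ToyStreamProducer()))  # True
print(supports_arrow_stream(object()))             # False
```

This is the same mechanism that lets `pl.from_arrow` and `duckdb.from_arrow` accept xbbg's carrier above without either library knowing anything about xbbg.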
## Performance Considerations

**Default / Narwhals** is the compatibility path. It keeps dataframe-style behavior over xbbg's native carrier without requiring pandas or PyArrow as hard dependencies. `df.to_pandas()` imports pandas only when you call it.
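The deferred pandas import just described is a common pattern: the heavy dependency is imported inside the conversion method, not at module import time. A sketch using a standard-library module as the stand-in dependency (illustrative only; this is not xbbg's code, and `colorsys` merely plays the role of pandas):

```python
import importlib
import sys

sys.modules.pop("colorsys", None)  # ensure the stand-in dependency is not loaded

class Result:
    """Carrier whose conversion imports its dependency only when called."""

    def __init__(self, value):
        self.value = value

    def to_rgb(self):
        # 'colorsys' stands in for pandas: imported on first use,
        # so merely creating Result objects never pays the import cost
        colorsys = importlib.import_module("colorsys")
        return colorsys.hsv_to_rgb(*self.value)

r = Result((0.0, 0.0, 1.0))
print("colorsys" in sys.modules)  # False: defining and constructing imported nothing
print(r.to_rgb())                 # (1.0, 1.0, 1.0)
print("colorsys" in sys.modules)  # True: imported by the conversion call
```

The payoff is that users who never convert to pandas never import it, keeping minimal installs fast and light.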
**Native carrier** (`native`) is the lowest-overhead explicit result. Use it when you will inspect columns, call `to_pylist()`, split into `to_batches()`, or hand data to an Arrow-aware library through the PyCapsule protocol.

**PyArrow** is the full Apache Arrow Python table backend. Use `backend='pyarrow'` when code expects `pyarrow.Table` methods or PyArrow ecosystem behavior.

**Polars** is the best choice for pure computation on large datasets. Its columnar engine and lazy execution model handle datasets that would be slow or impractical in pandas. The `polars_lazy` backend lets you chain additional query steps before triggering evaluation.

**DuckDB** is useful when your next step is SQL analytics. Use `backend='duckdb'` for a relation, or pass the native carrier to DuckDB yourself when you want to control connection scope.

**pandas** remains the widest-compatibility option. Use it when integrating with libraries that only accept `pd.DataFrame`, or when working with existing pandas-based pipelines.

**Lazy backends** (DuckDB, Polars lazy, Dask, etc.) are useful when you want to compose queries across multiple xbbg calls before materializing any data, or when the result set is too large to hold in memory.
## Related

- Output Formats: control the shape of returned data (`LONG`, `LONG_TYPED`, etc.)
- API Reference: full function documentation with all parameters