# Data Pipeline & Market Scanning

{% hint style="info" %}
Operator and jurisdiction: BASIS is operated by BASIS DIGITAL INFRASTRUCTURE LTD, a Seychelles IBC (LEI: [254900IX2F2KCWNSSS64](https://lei.bloomberg.com/leis/view/254900IX2F2KCWNSSS64)).

Research Partner: Base58 Labs contributes execution research, systems modeling, and risk design.
{% endhint %}

In structural alpha capture, the first failure mode is accepting a price that is not truly executable.

The data pipeline converts raw venue signals into a validated signal stream for the execution precision layer. That stream feeds BHLE, the BASIS execution engine, which is built for sub-50μs routing and 100K+ OPS throughput on proprietary routing infrastructure.

{% hint style="warning" %}
A visible spread is not automatically tradable. A signal is eligible only if it passes price validation, venue health checks, cost modeling, and deterministic execution constraints.
{% endhint %}

## 1. Signal sources

{% tabs %}
{% tab title="Market data" %}
BASIS ingests market-state inputs such as:

* top-of-book quotes
* order book snapshots and deltas
* trade prints
* mark prices and index prices
* funding rates and open interest where relevant

{% endtab %}

{% tab title="Operational data" %}
Operational signals are treated as first-class inputs:

* withdrawal status
* deposit status
* API latency and error rates
* throttling and rate-limit conditions
* maintenance notices
* settlement or transfer interruptions

{% endtab %}

{% tab title="On-chain data" %}
Where strategy modules require it, BASIS also evaluates:

* gas conditions
* block congestion
* confirmation latency
* bridge or settlement state
* wallet and contract interaction health

{% endtab %}
{% endtabs %}

A venue registry defines which feeds are eligible and how much confidence each source receives.
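
As a rough illustration of such a registry, the sketch below keeps one record per venue with its eligible feed types and a confidence weight. The class name, field names, feed labels, and weights are all assumptions made for this example; the actual BASIS registry schema is not described on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VenueEntry:
    """Hypothetical registry record; field names are illustrative only."""
    venue: str
    feeds: frozenset[str]   # feed types this venue may supply
    confidence: float       # weight in [0, 1] assigned to this source


# Example entries with made-up venues and weights.
REGISTRY = {
    "exchange_a": VenueEntry("exchange_a", frozenset({"quotes", "trades"}), 0.9),
    "exchange_b": VenueEntry("exchange_b", frozenset({"quotes"}), 0.6),
}


def eligible_sources(feed: str, min_confidence: float = 0.0) -> list[VenueEntry]:
    """Return venues allowed to supply `feed`, highest confidence first."""
    matches = [
        entry for entry in REGISTRY.values()
        if feed in entry.feeds and entry.confidence >= min_confidence
    ]
    return sorted(matches, key=lambda entry: entry.confidence, reverse=True)
```

Calling `eligible_sources("quotes")` returns both example venues ordered by confidence, while `eligible_sources("trades")` returns only `exchange_a`.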

## 2. Normalization

Each venue exposes data differently. To make signals comparable, BASIS transforms all inputs into a canonical internal format.

| Input difference           | Normalization action                 |
| -------------------------- | ------------------------------------ |
| Symbol naming              | Canonical symbol mapping             |
| Quote currency conventions | Unified quote handling               |
| Precision and tick size    | Scaled numeric normalization         |
| Timestamp format           | Clock alignment and drift monitoring |
| Depth representation       | Standardized depth ladder format     |
| API semantics              | Common event schema                  |

Example canonical event:

```json
{
  "venue": "exchange_a",
  "symbol": "BTC-USD",
  "timestamp_ns": 1731045600000000000,
  "best_bid": 68250.10,
  "best_ask": 68250.45,
  "bid_size": 1.42,
  "ask_size": 0.98,
  "sequence": 184220991,
  "health_score": 0.97
}
```

Normalization reduces semantic mismatch before any opportunity model is applied.
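
A minimal sketch of this mapping step is shown below. The raw payload keys (`sym`, `ts_ms`, `bid`, `ask`, `bid_sz`, `ask_sz`, `seq`) and the venue-native symbols are hypothetical; real venue payloads differ. The sketch parses prices into `Decimal` to avoid binary floating-point rounding, which is a choice made for this example rather than something the page specifies. `health_score` is left out because it comes from venue scoring, not from the raw quote.

```python
from decimal import Decimal

# Hypothetical mapping from (venue, venue-native symbol) to canonical symbol.
SYMBOL_MAP = {
    ("exchange_a", "XBTUSD"): "BTC-USD",
    ("exchange_b", "btc_usd"): "BTC-USD",
}


def normalize_quote(venue: str, raw: dict) -> dict:
    """Map an assumed raw top-of-book payload onto the canonical event fields."""
    return {
        "venue": venue,
        "symbol": SYMBOL_MAP[(venue, raw["sym"])],       # KeyError if unmapped
        "timestamp_ns": int(raw["ts_ms"]) * 1_000_000,   # milliseconds -> nanoseconds
        "best_bid": Decimal(raw["bid"]),                 # string -> exact decimal
        "best_ask": Decimal(raw["ask"]),
        "bid_size": Decimal(raw["bid_sz"]),
        "ask_size": Decimal(raw["ask_sz"]),
        "sequence": int(raw["seq"]),
    }
```

Raising on an unmapped symbol, rather than passing it through, keeps unknown instruments out of the comparison set.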

## 3. Cross-validation and outlier rejection

A single venue can publish stale, lagged, or erroneous prices. BASIS therefore applies multi-source validation before any signal reaches execution.

{% stepper %}
{% step %}
Collect comparable observations across eligible venues.
{% endstep %}

{% step %}
Estimate fair reference levels using robust statistics such as medians and trimmed means.
{% endstep %}

{% step %}
Reject observations outside dynamic deviation thresholds.
{% endstep %}

{% step %}
Require temporal consistency across successive updates.
{% endstep %}

{% step %}
Promote only validated signals to the execution queue.
{% endstep %}
{% endstepper %}

This process reduces the probability of trading on a ghost gap (an apparent dislocation that is not actually executable) or a stale book.
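
The reference-level and rejection steps can be sketched as follows. This example uses the median as the robust reference and a threshold that scales with the median absolute deviation (MAD), with a floor in basis points. The function name, the parameters `k`, `floor_bps`, and `min_sources`, and their default values are assumptions for illustration. The temporal-consistency step is not covered here.

```python
from statistics import median


def validate_mids(mids, k=5.0, floor_bps=1.0, min_sources=3):
    """Split venue mid prices into (accepted, rejected) around a robust reference.

    mids: dict mapping venue name -> mid price.
    The rejection limit is max(k * MAD, floor_bps of the reference), so it
    widens when venues disagree more and never collapses to zero.
    """
    if len(mids) < min_sources:
        # Too few sources for a robust reference: reject everything.
        return {}, dict(mids)
    ref = median(mids.values())
    mad = median(abs(m - ref) for m in mids.values())
    limit = max(k * mad, ref * floor_bps / 10_000)
    accepted, rejected = {}, {}
    for venue, mid in mids.items():
        target = accepted if abs(mid - ref) <= limit else rejected
        target[venue] = mid
    return accepted, rejected
```

With mids of 68250.0, 68251.0, 68249.5, and 68900.0, the first three cluster within the limit and the fourth is rejected as an outlier.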

## 4. Venue health scoring

A large spread can indicate opportunity, but it can also indicate operational stress. BASIS scores venues continuously and uses those scores as part of the eligibility gate.

| Health input            | Why it matters                       |
| ----------------------- | ------------------------------------ |
| Withdrawal availability | Determines settlement realism        |
| Deposit availability    | Affects inventory mobility           |
| API latency             | Impacts execution precision          |
| Error rate              | Indicates feed stability             |
| Throttling conditions   | Limits order placement reliability   |
| Maintenance windows     | Can invalidate live pricing          |
| Sequence integrity      | Detects missing or corrupted updates |

Low health scores can down-rank or fully exclude a venue from signal generation.
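
One way to combine these inputs is a score with hard exclusions plus capped penalties, sketched below. The field names, penalty weights, and caps are invented for this example and are not BASIS parameters. Throttling is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class VenueHealth:
    """Hypothetical snapshot of the health inputs for one venue."""
    withdrawals_open: bool
    deposits_open: bool
    in_maintenance: bool
    api_latency_ms: float
    error_rate: float      # fraction of failed requests, 0..1
    sequence_gaps: int     # missed sequence numbers in the scoring window


def health_score(h: VenueHealth) -> float:
    """Return a score in [0, 1]; 0.0 means the venue is fully excluded."""
    # Hard gates: no settlement path or no reliable pricing.
    if h.in_maintenance or not h.withdrawals_open:
        return 0.0
    score = 1.0
    if not h.deposits_open:
        score -= 0.2                                 # inventory cannot be replenished
    score -= min(h.api_latency_ms / 1000.0, 0.3)     # latency penalty, capped
    score -= min(h.error_rate * 2.0, 0.3)            # error-rate penalty, capped
    score -= min(h.sequence_gaps * 0.05, 0.2)        # feed-integrity penalty, capped
    return max(score, 0.0)
```

Capping each penalty keeps any single soft input from excluding a venue on its own, while the hard gates handle the conditions that make pricing or settlement unrealistic.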

## 5. Market scanning and opportunity detection

After normalization and validation, the signal engine scans for executable structural alpha, including:

* cross-venue price dislocations
* funding and basis differentials
* spot and derivative mispricings
* on-chain versus off-chain valuation gaps where relevant

Detection alone is not sufficient. Every candidate must also pass:

* depth sufficiency checks
* transfer and settlement feasibility checks
* fee and slippage modeling
* route construction checks
* state-machine risk controls

{% hint style="info" %}
Trust in the signal engine comes from deterministic execution rules, mathematical constraints, and explicit state transitions. A candidate either satisfies the full rule set or it does not enter execution.
{% endhint %}
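
The depth sufficiency and fee and slippage checks listed above can be illustrated with a small sketch: walk each side of the book to get the average fill price for the intended size, then subtract fees from the gross edge. The function names, the `(price, quantity)` level format, and the flat per-leg fee are assumptions for this example. Transfer costs and funding are ignored.

```python
def vwap_for_size(levels, size):
    """Average fill price for `size` across (price, qty) levels, best level first.

    Returns None when the visible depth cannot fill the full size.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    remaining, cost = size, 0.0
    for price, qty in levels:
        take = min(remaining, qty)
        cost += take * price
        remaining -= take
        if remaining <= 0:
            return cost / size
    return None  # depth insufficient


def net_edge_bps(buy_asks, sell_bids, size, fee_bps_per_leg):
    """Edge in basis points after slippage and fees, or None if depth is short."""
    buy = vwap_for_size(buy_asks, size)
    sell = vwap_for_size(sell_bids, size)
    if buy is None or sell is None:
        return None
    gross_bps = (sell - buy) / buy * 10_000
    return gross_bps - 2 * fee_bps_per_leg  # one fee per leg
```

A candidate whose top-of-book spread looks positive can still return a negative value here once the size walks deeper levels, which is the case the checks are meant to catch.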

## 6. Execution handoff

The validated signal stream is handed to the orchestration layer only when all required constraints are satisfied:

* data freshness is within tolerance
* venue health is above threshold
* executable depth is sufficient
* modeled edge remains positive after costs
* routing path is stable
* risk state permits action

This architecture is designed to prioritize determinism over headline spread size.
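
The handoff gate can be sketched as a pure function over a candidate: every constraint is evaluated, and the candidate is eligible only when none fail. The `Candidate` fields, the threshold constants, and the `"ACTIVE"` risk state label are hypothetical and chosen only to mirror the six constraints listed above.

```python
from dataclasses import dataclass

# Hypothetical thresholds; real tolerances are not stated on this page.
MAX_AGE_MS = 50.0
MIN_HEALTH = 0.9


@dataclass(frozen=True)
class Candidate:
    age_ms: float              # time since the newest input update
    min_venue_health: float    # lowest health score among venues on the route
    executable_depth: float
    required_depth: float
    net_edge_bps: float        # modeled edge after fees and slippage
    route_stable: bool
    risk_state: str


def failed_constraints(c: Candidate) -> list[str]:
    """Return the names of all constraints the candidate violates."""
    checks = {
        "freshness": c.age_ms <= MAX_AGE_MS,
        "venue_health": c.min_venue_health >= MIN_HEALTH,
        "depth": c.executable_depth >= c.required_depth,
        "edge": c.net_edge_bps > 0,
        "routing": c.route_stable,
        "risk": c.risk_state == "ACTIVE",
    }
    return [name for name, ok in checks.items() if not ok]


def eligible_for_handoff(c: Candidate) -> bool:
    """A candidate is handed off only when every constraint holds."""
    return not failed_constraints(c)
```

Returning the list of failed constraints, rather than a bare boolean, makes each rejection explainable, and the same inputs always produce the same decision.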

## 7. Why this matters

The data pipeline is the first control surface for execution quality. If inputs are inconsistent, stale, or operationally compromised, even fast infrastructure will route bad decisions quickly. BASIS therefore treats market scanning as a constrained systems problem, not a simple spread detector.

Next: read Execution Orchestration.

