Elasticsearch connector

Push your operational data into Elasticsearch and let search, observability and semantic retrieval run on the same index.

Data Panda lifts records from your CRM, ERP, ecommerce, product and log sources into Elasticsearch on a known schedule. Once it lives in one cluster, your search experiences, your Kibana dashboards and your vector queries all read indices that match the rest of the business instead of each app shipping its own crawler.

About Elasticsearch

The distributed search and analytics engine on top of Apache Lucene.

Elasticsearch was started in 2010 by Shay Banon, built on the Apache Lucene search library and exposed through a JSON-over-HTTP API. The company behind it, Elastic NV, listed on the NYSE in October 2018 under the ticker ESTC and is headquartered in Mountain View. The engine is the core of the Elastic Stack, where Logstash and Beats handle ingest and Kibana handles the UI, the combination most people still call ELK.

Inside the cluster, data lives in indices, which are split into shards and copied into replicas across nodes. Documents are JSON, mappings define how fields are analysed, and the Query DSL covers everything from a basic match to fuzzy, geo and nested queries. Aggregations turn the same indices into an analytics surface for log, metric and event data, which is why the ELK stack became the default observability pattern for a generation of engineering teams. Since 8.x, dense-vector fields and a kNN search API have made Elasticsearch a credible store for semantic and hybrid search next to BM25. The licensing story is worth knowing: in 2021 Elastic moved from Apache 2.0 to a dual SSPL plus Elastic License v2 model, which prompted AWS to fork the engine as OpenSearch, and in 2024 Elastic added AGPLv3 as a third option, making the source available under an OSI-approved licence again. We treat the cluster as a destination, land the data on a cadence the cluster can absorb, and shape mappings and shards to the workload instead of the default.
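As a sketch of what shaping mappings and shards to the workload means, here is a hypothetical index body for a "products" index: explicit shard and replica counts, an analysed text field for BM25, a keyword field left unanalysed, and a dense_vector field for kNN. The field names and the 384-dim embedding size are illustrative assumptions, not part of any real deployment.

```python
import json

# Hypothetical index settings and mapping for a "products" index:
# shard counts chosen for the workload rather than left at the default,
# one analysed text field for BM25, one dense_vector field for kNN.
index_body = {
    "settings": {
        "number_of_shards": 3,       # sized to expected data volume
        "number_of_replicas": 1,
        "refresh_interval": "30s",   # relaxed for bulk-heavy ingest
    },
    "mappings": {
        "properties": {
            "sku":         {"type": "keyword"},   # exact match, no analysis
            "description": {"type": "text"},      # BM25 full-text search
            "embedding": {
                "type": "dense_vector",
                "dims": 384,                      # must match the embedding model
                "index": True,
                "similarity": "cosine",
            },
        }
    },
}

print(json.dumps(index_body, indent=2))
```

With the official Python client this body would go to the create-index call (for example `es.indices.create(index="products", settings=..., mappings=...)` on 8.x clients); the point is that analysis, shard count and vector dims are decided up front, not inherited from defaults.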

What your Elasticsearch data is for

What you get once Elasticsearch is connected.

Kibana on tied data

Kibana dashboards read indices that match the rest of the business, not a side copy of it.

  • Log, metric and event indices share keys with operational records
  • One mapping per business object, not one per ingestion script
  • Saved searches and Lens visualisations rest on stable field names

Indexing on a known cadence

Operational records land in Elasticsearch on a schedule the cluster can absorb without queue backlog.

  • Bulk API loads sized so refresh interval and merges stay healthy
  • Index lifecycle policies move hot data to warm and cold tiers on time
  • Failed bulk batches surface before the search experience goes live
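The lifecycle side of the bullets above can be sketched as an ILM policy body: rollover in the hot phase, shrink and force-merge in warm, then cold and delete. The phase timings and the 50 GB rollover threshold are assumptions to tune per source, not fixed recommendations.

```python
import json

# Hypothetical ILM policy: hot with size/age-based rollover, warm after 7 days
# (shrunk to one shard and merged), cold after 30 days, deleted after 90.
ilm_policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {"max_primary_shard_size": "50gb", "max_age": "7d"}
                }
            },
            "warm": {
                "min_age": "7d",
                "actions": {
                    "shrink": {"number_of_shards": 1},
                    "forcemerge": {"max_num_segments": 1},
                },
            },
            "cold": {"min_age": "30d", "actions": {}},
            "delete": {"min_age": "90d", "actions": {"delete": {}}},
        }
    }
}

print(json.dumps(ilm_policy, indent=2))
```

A body like this is what sits behind "hot data moves to warm and cold tiers on time": the policy is registered once and every matching time-series index follows it automatically.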

Vector and semantic search next to BM25

Dense vectors live next to text fields so hybrid retrieval runs on the same indices a search box queries.

  • kNN search on dense_vector fields paired with BM25 for hybrid ranking
  • Embeddings refreshed in step with the source records that produced them
  • RAG pipelines read Elasticsearch as the retrieval layer over governed text
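To make the hybrid pattern concrete, here is a sketch of a single search body that pairs a BM25 match with a kNN clause, the shape Elasticsearch 8.x accepts in one request. The field names, boosts and the placeholder query vector are illustrative assumptions.

```python
import json

# Hypothetical hybrid search body: BM25 match on the text field plus kNN on
# the dense_vector field in one request. The boosts weight the blend of the
# two scores; query_vector is a stand-in for a real query embedding.
hybrid_search = {
    "query": {
        "match": {"description": {"query": "waterproof hiking boots", "boost": 0.6}}
    },
    "knn": {
        "field": "embedding",
        "query_vector": [0.1] * 384,   # placeholder for the embedded query text
        "k": 10,
        "num_candidates": 100,
        "boost": 0.4,
    },
    "size": 10,
}

print(json.dumps(hybrid_search)[:120])
```

Because both clauses run against the same index, filters and access rules apply to the vector side exactly as they do to the keyword side, which is the point of keeping embeddings next to the text.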

Search inside the products you ship

Internal apps, customer portals and back-office tools query one Elasticsearch cluster instead of three.

  • Product, customer and content search served from the same cluster
  • Role-based access at index level keeps tenants and teams apart
  • App search latency stays steady as record volume grows
Use cases

Use cases we deliver with Elasticsearch data.

A list of concrete reports, automations and AI features we have built on Elasticsearch data. Pick the one that matches your situation.

  • Product search in the webshop: catalogue, variant and stock indices powering site search and filters.
  • Customer 360 search: one indexed record across CRM, billing, support and product usage.
  • ELK observability: logs, metrics and traces in Elasticsearch with Kibana on top.
  • SIEM and security analytics: threat detection on event indices with retention tuned per source.
  • Hybrid search for RAG: dense-vector and BM25 in one index for retrieval-augmented generation.
  • Internal knowledge search: wiki, ticket and document search behind a single search bar.
  • Index lifecycle and tiering: hot, warm and cold tiers sized to retention and query cost.
  • Geo and location queries: distance and polygon queries on store, asset or fleet data.
  • Search load off the OLTP database: move LIKE-heavy queries off the transactional database into a search index.
  • Cluster cost containment: right-sized shards and ILM policies to flatten heap and storage spend.
  • Self-hosted or Elastic Cloud: same engine, deployed where data residency and ops fit best.
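For the geo use case in the list above, the query shape is a filter rather than a scored clause. A sketch, assuming a hypothetical "location" field mapped as geo_point and an example point near Ghent:

```python
import json

# Hypothetical geo filter: everything within 25 km of a point, assuming a
# "location" field mapped as geo_point. Coordinates and radius are examples.
geo_query = {
    "query": {
        "bool": {
            "filter": {
                "geo_distance": {
                    "distance": "25km",
                    "location": {"lat": 51.05, "lon": 3.72},
                }
            }
        }
    }
}

print(json.dumps(geo_query, indent=2))
```

Running it as a filter (inside bool.filter) skips scoring and lets Elasticsearch cache the clause, which matters when the same radius check runs on every store-locator request.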
Real business questions

Answers you will finally get.

Why is our Elasticsearch cluster slow even though we keep adding nodes?

Almost always shard sizing or mapping. Default mappings analyse fields nobody searches, and default index settings scatter data across thousands of tiny shards that pin heap. Reviewing mappings against the queries that actually run, consolidating shards toward the recommended size and putting an ILM policy on time-series indices usually buys more headroom than a bigger node.

Should we run Elasticsearch ourselves or move to Elastic Cloud or OpenSearch?

Self-hosted suits teams with platform engineering capacity and a residency case for keeping the cluster on their own infrastructure. Elastic Cloud takes the operations cost away and tracks new releases first. OpenSearch makes sense when AWS lock-in and the AGPL/SSPL conversation matter more than the latest Elastic features. Most BE/NL clients decide based on residency, ops appetite and the AI feature roadmap they want to follow.

Can we use Elasticsearch as the retrieval layer for our RAG or AI search use case?

Yes, and the hybrid story is one of its stronger arguments. Dense-vector fields with kNN search live next to BM25 in the same index, so RAG and AI search return results that already pass the same filters as the keyword query. We make sure embeddings refresh in step with the source records, so the model is reading the same record version your business is.

Value for everyone in the organisation

Where each function gets value.

For finance leaders

Finance gets a stable view of cluster spend per index family, so observability, app search and AI retrieval each carry their own cost line. Tier transitions and shard cleanup show up in the bill instead of as a surprise overage.

For sales leaders

Sales and customer-facing teams hit one search bar that returns product, customer and ticket records on the same ranking. Catalogue search on the webshop, internal lookup in the back-office and the support agent view all read the same indices.

For operations

Platform and SRE leads track shard count, heap pressure, refresh interval and ILM transitions in one Kibana view. Elasticsearch stops being the cluster nobody understands and becomes a sized, tiered system with a known cost per index.

Data model

Tables we make available.

This is the table we currently pull from Elasticsearch into your warehouse. Query it directly in SQL, join it to the rest of your stack, or build reports on top.

  • Search List

Missing a table you need? We can extend the sync. Tell us what is missing and we will build it for you.

Your existing tools

Your data lands in a warehouse. Your BI tools read from it.

You keep the reporting tool you already have. We connect it to the warehouse where your Elasticsearch data lives.

Power BI Microsoft
Fabric Microsoft
Snowflake Data warehouse
BigQuery Google
Tableau Visualisation
Excel Sheets & pivots
Three steps

From Elasticsearch to answers in three steps.

01

Connect securely

OAuth authentication. Read-only by default. We sign a DPA and your admin keeps the keys.

02

Land in your warehouse

Data flows into your warehouse on your schedule. Near real time or nightly, your call. You own the data.

03

Reporting, automation, AI

We build the first dashboard, workflow or AI feature with you, then hand over the keys. Or we stay on for ongoing delivery.

Two ways to work with us

Pick the track that fits how you work.

Track 01

Self-serve

We set up the foundation. Your team builds on top.

  • Elasticsearch connector configured and running
  • Warehouse set up in your cloud account
  • Clean access for your Power BI, Fabric or Tableau team
  • Documentation on what's in the data model
  • Sync monitoring so you're warned before reports break

Best fit Teams that already have a BI analyst or data engineer and want to own the build.

Track 02

Done for you

We build the whole thing, end to end.

  • Everything in Self-serve
  • Dashboards built to the questions your team actually asks
  • Automations between your systems
  • AI workflows scoped to real tasks your team runs
  • Custom apps where a dashboard does not cut it
  • Ongoing delivery at a pace that fits your team

Best fit Teams without in-house BI or dev capacity. You tell us what you need and we deliver it.

Before you book

Frequently asked questions.

Who owns the data?

You do. It lands in your warehouse, on your cloud account. We don't resell or aggregate it. If you stop working with us, the warehouse stays yours and keeps running.

How fresh is the data?

Near real time for most operational systems. For heavier sources we schedule hourly or nightly. You pick based on what the reports need.

Do I need a warehouse already?

No. If you don't have one, we help you pick one and set it up as part of the first delivery. Common starting points are Snowflake, Microsoft Fabric, or a small Postgres instance.

How do you keep our Elasticsearch heap and shard count from running away?

We size shards toward the practical 10 to 50 GB range Elastic itself recommends, consolidate the small ones, and put an ILM policy on time-series indices so old data moves to warm or cold tiers on time. Mappings get reviewed against the queries that really run, so you are not paying for analysers and stored fields nobody uses. Most clusters that have been growing for a year or two see heap pressure drop in the first cleanup cycle without adding a node.
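The shard-count arithmetic behind that answer is simple enough to show. A back-of-envelope helper, assuming the 10 to 50 GB per-shard guidance and an illustrative 30 GB target:

```python
import math

def primary_shards(index_size_gb: float, target_shard_gb: float = 30.0) -> int:
    """Round up so no primary shard exceeds the target size; at least one shard."""
    return max(1, math.ceil(index_size_gb / target_shard_gb))

# A 400 GB index lands at 14 primaries of roughly 30 GB each, instead of
# the one-index-per-day pattern that piles up hundreds of tiny shards.
print(primary_shards(400))   # 14
print(primary_shards(5))     # 1
```

The same arithmetic explains why size-based rollover in an ILM policy keeps shards in range automatically: each rollover caps the primary shard size instead of leaving it to however much a day or a week happens to ingest.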

Should we stay on Elasticsearch or move to OpenSearch?

Both are credible. Elasticsearch under SSPL or AGPLv3 keeps you on the engine Elastic ships first, with the latest vector and AI search features. OpenSearch under Apache 2.0 fits teams who want the AWS-managed path or who have a procurement reason to keep an OSI-style permissive licence. We have clients on each, and the migration in either direction is workable when you have shaped the indices and ingest properly to begin with.

GDPR-compliant
Data stays in the EU
You own the warehouse

A first deliverable live in four to six weeks.

We review your Elasticsearch setup and the systems around it. Together we pick the first thing worth building.