Modern data systems often need to integrate several data technologies that serve different purposes. It quickly becomes non-trivial to integrate and maintain these technologies – indeed, that’s the essence of the data engineering discipline.
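Beneath aims to eliminate data engineering overhead for the majority of production data science projects. To provide a seamless experience, it bundles three of the most common kinds of data technologies and integrates them automatically.
The three kinds of data technologies Beneath bundles are a streaming log (for reacting to new data as soon as it arrives), a data warehouse (for running complex analytical queries), and an operational data index (for quickly looking up individual records).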
When you write data to Beneath, it consistently becomes available in an instance of each of these three systems.
By automatically configuring and integrating these different technologies, Beneath allows you to get started working on your data sourcing and transformations right away. Not only do you save months of wiring data technologies together, you also get a stable data system that’s continuously updated with new best practices.
To illustrate how these systems work in tandem, imagine you’re building a weather forecasting website. Every time you get new weather data, you write it to Beneath and it becomes available in every system. The streaming log instantly pushes the data to your weather prediction model, which uses it to compute an updated forecast that it writes back into Beneath. Every time someone visits your website, you serve them the most recent forecast from the operational data index. Once a day, you re-train your weather prediction model with a complex query that runs in the data warehouse.
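The weather example can be sketched as a small in-memory model. This is only an illustration of the data flow, not Beneath's API: the `ToyStream` class, its methods, the field names, and the averaging "model" are all hypothetical stand-ins chosen for this sketch.

```python
from collections import defaultdict
from statistics import mean


class ToyStream:
    """Hypothetical in-memory stand-in: one write lands in three systems."""

    def __init__(self, key_field):
        self.key_field = key_field
        self.log = []          # streaming log: append-only, in arrival order
        self.index = {}        # operational index: latest record per key
        self.warehouse = []    # warehouse: full history for analytical scans
        self.subscribers = []  # callbacks that the log pushes new records to

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def write(self, record):
        # A single write becomes available in every system.
        self.log.append(record)
        self.index[record[self.key_field]] = record
        self.warehouse.append(record)
        for callback in self.subscribers:
            callback(record)

    def lookup(self, key):
        # Fast point read, as a website would do on each page view.
        return self.index.get(key)


observations = ToyStream(key_field="city")
forecasts = ToyStream(key_field="city")


def update_forecast(observation):
    # Toy "model" (an assumption): forecast the mean of all readings so far.
    city = observation["city"]
    history = [r["temp_c"] for r in observations.warehouse if r["city"] == city]
    forecasts.write({"city": city, "forecast_c": mean(history)})


def training_set():
    # Stand-in for the daily warehouse query: all readings grouped by city.
    grouped = defaultdict(list)
    for record in observations.warehouse:
        grouped[record["city"]].append(record["temp_c"])
    return dict(grouped)


# The log pushes every new observation to the forecasting model.
observations.subscribe(update_forecast)

observations.write({"city": "Oslo", "temp_c": 4.0})
observations.write({"city": "Oslo", "temp_c": 6.0})
observations.write({"city": "Lima", "temp_c": 20.0})

print(forecasts.lookup("Oslo"))  # latest forecast served to a visitor
print(training_set())            # input for the daily re-training job
```

In a real deployment the three stores are separate services, so the point of the sketch is only the shape of the flow: a single write, a push to subscribers, a point lookup for serving, and a full scan for re-training.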
Under the hood, the cloud version of Beneath uses Google Pub/Sub for log streaming, Google BigQuery as its data warehouse and Google Bigtable as an operational data index. If you self-host Beneath, we provide drivers for a variety of other technologies. While the choice of underlying technologies has certain implications, Beneath generally abstracts away many of the differences.
In addition to the three categories mentioned above, there are some rarer categories that are also worth mentioning. They include stream processing engines (for running your data transformation code), graph databases (for querying networks of data), and full-text search systems (for advanced search). Beneath doesn’t currently bundle them, but we’re working on changing that!