We’re excited to announce Beneath! Beneath is a full data system out of the box. In this post, we’ll tell you what it does and why we’re building it.
Remember that time your boss was really excited about a cool graph you created in a couple of hours? And then grew a lot less impressed when, one month later, you were still struggling to create an API for it?
Or you know how your data (spanning Postgres, BigQuery, some SaaS apps, an API server, and maybe a message queue) is connected by Python data pipelines that are, like, super robust and bulletproof and resilient to change?
You can do some neat data science with Python and a dataset in CSV or Excel format. But nothing beats deploying a live, data-centred application with real-time insights and its own API. Where the charts are still up-to-date tomorrow. Where you can apply your models to your users' data and instantly give them insights.
As data scientists and data engineers, we want to spend our time on the unique things – like developing models and creating an intuitive user experience. From day one. Not “once the systems are in place.”
And here’s the real downer: the data system is never quite in place. The number of deployed resources in AWS keeps growing. Your Python pipeline spaghetti grows more and more unappetizing. The whole thing is hard to monitor, and you lose your sense of control. The system feels brittle.
We want a data system that’s up and running right away. We want to see our data move in real-time. We want to easily add processing logic to derive new data. We want to have confidence in our data quality. It should be just as easy to create an API for our data as it is to create that cool graph. It should be trivial to share our data and analytics with others. We shouldn’t have to worry about our system breaking. The time-to-production for new analytics should be much shorter.
Our ambition is to make Beneath the stable and transparent data system in a box. It currently packs a log streaming system, a data warehouse and a low-latency indexed data store. When you write data to Beneath, it becomes available in all three.
You get a full overview through one user interface, the Beneath console. Beneath automatically generates REST and WebSocket APIs for your data. You can read data into notebooks with the Python library or directly into your frontend with the JS library. It also integrates with most business intelligence software through SQL.
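To make the "write once, read everywhere" idea concrete, here is a toy in-memory sketch in Python. It is not Beneath's API or implementation: the class `ToyDataSystem` and its methods `write`, `replay`, `lookup` and `scan` are names we made up purely for illustration. It only shows how a single write can feed a replayable log, a key-based index and a warehouse-style scan at the same time.

```python
class ToyDataSystem:
    """Hypothetical, in-memory illustration only. Not Beneath's API."""

    def __init__(self, key_field):
        self.key_field = key_field
        self.log = []          # append-only log: supports replay from any offset
        self.index = {}        # latest record per key: supports fast lookups
        self.subscribers = []  # callbacks invoked on every new record (streaming)

    def subscribe(self, callback):
        """Register a callback that receives every record written from now on."""
        self.subscribers.append(callback)

    def write(self, record):
        """One write updates the log, the index and all live subscribers."""
        self.log.append(record)
        self.index[record[self.key_field]] = record
        for callback in self.subscribers:
            callback(record)

    def replay(self, offset=0):
        """Log-style read: every record from the given offset, in write order."""
        return list(self.log[offset:])

    def lookup(self, key):
        """Index-style read: the most recent record for a key, or None."""
        return self.index.get(key)

    def scan(self, predicate):
        """Warehouse-style read: filter over the full history of records."""
        return [record for record in self.log if predicate(record)]


system = ToyDataSystem(key_field="user_id")
seen = []
system.subscribe(seen.append)

system.write({"user_id": "a", "amount": 10})
system.write({"user_id": "b", "amount": 25})
system.write({"user_id": "a", "amount": 40})
```

After these three writes, `system.replay()` returns all three records in order, `system.lookup("a")` returns the latest record for user `a`, and `system.scan(lambda r: r["amount"] > 20)` returns the two larger purchases. A real system would of course back each read path with dedicated storage rather than Python lists and dicts.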
Ultimately, we aim to bundle all the integrations, helper libraries, and other tooling necessary for it to serve as the entire backend of a real-time data-centred application.
In Beneath, it’s a flip of a switch to share streams with the public or select parties. And shared data isn’t just exposed through one inadequate API. When you open up a data stream, everyone gets all the same functionality that you enjoy. That means log replay and streaming, data warehouse queries, low-latency index lookups, access in the Beneath console, etc.
And since all users on Beneath are accountable for their own data consumption (and get a free monthly quota), you can freely share your streams without worrying about running up a planet-scale bill.
We love this feature because it allows everyone to work from one copy of the data, a single source of truth. Goodbye to integrating with an API just to copy data into your own systems for processing. Hello to transparently seeing the insights others derive from your data.
Currently, Beneath handles data storage, streaming and access. It handles the state of your data system. But to facilitate transformations and analytics, you still have to run your own compute. Our documentation has tutorials to help you, but this is still not the user experience we really want.
Ultimately, Beneath will host transformations and analytics models on behalf of users. So if you’re an e-commerce business and want to generate product recommendations, you could write customer actions into a stream, deploy your recommendation model to Beneath, and load the recommendations, which materialize and update in real-time, into your store. All without wrestling with deployments.
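As a rough picture of what "recommendations that materialize and update in real-time" could mean, here is a minimal Python sketch. It is an assumption-laden toy, not a Beneath feature or API: the class `CoViewRecommender`, its methods, and the simple "viewed together" counting model are all hypothetical and stand in for whatever model you would actually deploy.

```python
from collections import Counter, defaultdict


class CoViewRecommender:
    """Hypothetical toy model: recommends products often viewed by the same customers."""

    def __init__(self):
        self.viewed = defaultdict(set)         # customer -> products viewed so far
        self.co_views = defaultdict(Counter)   # product -> counts of co-viewed products

    def on_action(self, customer, product):
        """Process one customer action and update the materialized counts."""
        if product in self.viewed[customer]:
            return  # repeat view by the same customer: nothing new to count
        for other in self.viewed[customer]:
            self.co_views[product][other] += 1
            self.co_views[other][product] += 1
        self.viewed[customer].add(product)

    def recommend(self, product, n=3):
        """Read the current top-n co-viewed products for a product."""
        counts = self.co_views.get(product, Counter())
        return [other for other, _ in counts.most_common(n)]


model = CoViewRecommender()
actions = [
    ("alice", "shoes"),
    ("alice", "socks"),
    ("bob", "shoes"),
    ("bob", "socks"),
    ("bob", "hat"),
]
for customer, product in actions:
    model.on_action(customer, product)  # in a live system, this would be fed by the stream
```

Each incoming action updates the counts immediately, so a call like `model.recommend("shoes", n=2)` always reflects every action processed so far. In the scenario described above, the store would read these results from a stream instead of calling the model directly.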
Hosting transformations and models automatically on Beneath is our top priority.
We’re two data scientists who have grown frustrated with wiring data systems. We have worked in data roles in both large enterprises and startups, which has convinced us the world needs a better way to create stable data systems.
If you’re excited about Beneath, please get in touch! We’d love to hear about your experience or any features you’re interested in. The entire codebase for Beneath is also source-available if you’re interested in how it works under the hood.