December 12, 2023
What is the secret sauce for creating agile data apps, AI apps or any other kind of data product, then? To dig into it, let's review Plotly's capabilities and grasp the key concepts:
First, iterations in the data structure must not be breaking changes. Agile data development has to allow us to introduce new features (columns) to our dataset, and our back end and front end have to be ready for that without requiring extra work.
We have to be able to change the data. If the x-axis and y-axis have been abstract concepts since Descartes (17th century), why does our engineering still need to hard-code them as "billing" and "date"? See how Plotly does the whole job just by picking the column variables to plot.
And making changes is smooth and trivial.
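A minimal sketch of that idea, assuming a pandas DataFrame with hypothetical "date" and "billing" columns: Plotly Express only needs the column names to do the whole job.

```python
import pandas as pd
import plotly.express as px

# Hypothetical dataset; the column names are illustrative.
df = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=6, freq="MS"),
    "billing": [120, 135, 128, 150, 162, 158],
})

# Plotly picks up everything from the column names; changing what is
# plotted means changing a string, nothing more.
fig = px.line(df, x="date", y="billing")
fig.show()
```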
We do not want to go to our Back-end and ask, “Hey, we thought that we'd better show a comparison in this dataset, so we want to add two arrays. Could you create another column in that table? By the way, we will need to retrieve both from the API endpoint as well”. A pandas DataFrame doesn't require an IT department to add or remove columns, so why do we need one when we move to production systems for data apps?
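As a sketch of that frictionlessness at the DataFrame level (column names again hypothetical), adding the two comparison arrays and dropping a column are one-liners, with no migration and no new endpoint:

```python
import pandas as pd

df = pd.DataFrame({"date": ["2023-01", "2023-02"], "billing": [120, 135]})

# Adding the comparison columns is a single assignment each...
df["billing_last_year"] = [98, 110]
df["billing_forecast"] = [130, 142]

# ...and removing a column is just as cheap. No schema migration,
# no new API endpoint, no ticket to the IT department.
df = df.drop(columns=["billing_forecast"])
```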
A back-end architecture that doesn’t need to be touched every time there is a column renaming, a new column or one is removed is a cornerstone for creating agile data products.
Why can't current IT architectures sustain changing a chart fast? No data storage on the market allows this flexibility, and no front-end system can read a liquid back end and show whatever it contains.
A data app has to be capable of switching component types, such as a chart or an indicator. This line chart? Let's change it to a bar chart. In Plotly it is a matter of changing a single word: swap “line” for “bar” and boom! Magic! Beautiful!
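For instance, a sketch with the same hypothetical columns as above; the only difference between the two charts is one word:

```python
import pandas as pd
import plotly.express as px

df = pd.DataFrame({"date": ["2023-01", "2023-02", "2023-03"],
                   "billing": [120, 135, 128]})

fig = px.line(df, x="date", y="billing")  # a line chart...
fig = px.bar(df, x="date", y="billing")   # ...swap "line" for "bar" and it is a bar chart
fig.show()
```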
We do not want to go to the Front-end and say, “Hey, sorry, we talked to our client and we believe this is now better understood as a bar chart”. IT developers get angry and we get frustrated asking for permission at every little step.
A product is about iterations. A good product is about fast iterations. This also applies to a data product, whether it is a data app or an AI app. If we cannot create new pages, remove others, rename, change the order of the charts, we will be slaves of our project management.
While good project management is key, data apps are about showing the right data in the right place in the right order, and being in the market is the best way to reach that knowledge. When you are in the market you either fit fast or perish. Have you tried to recompose a whole PWA with a full-stack team? It could take decades; tech teams are pushed to create vast amounts of legacy that eventually become a drain on resources, motivation, money and talent.
In Plotly, changing the order of a chart is just a matter of changing its row and column values. Why can't data apps work like that?
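A sketch of that mechanism with Plotly's subplot API (the data is again illustrative): every trace is placed by its `row` and `col` arguments, so reordering the page means editing two integers.

```python
import plotly.graph_objects as go
from plotly.subplots import make_subplots

fig = make_subplots(rows=2, cols=1)

# Swapping the row values below swaps the charts' positions on the page.
fig.add_trace(go.Scatter(y=[120, 135, 128], name="billing"), row=1, col=1)
fig.add_trace(go.Bar(y=[98, 110, 105], name="last year"), row=2, col=1)
fig.show()
```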
Why can't the engineering be fast enough to recompose and boost the product-market fit of data apps? The problem is clear, but no solution exists to date (except Shimoku).
BI tools and Streamlit are quite limited when you think about the user experience. In fact, it is not possible to create a smooth page flow and a rich architecture of pages with them. A data app has to be a product like any other: it has to be capable of serving thousands of users and of providing a rich experience, meaning links from page to page so that navigation is engaging. Have you tried to go from one page to another in Tableau or Qlik? It is messy and uncomfortable; you cannot sell a product like that.
Current technologies allow you to create a 90s-style single-page data app, but what is required are rich products with pages, subpages and buttons to jump wherever you want in a single click (think of Google Analytics, Sentry or Datadog, for instance). Filters are not enough, and those tools have less-than-powerful filters anyway (Streamlit does not even have them).
What we need is a framework to build data apps as fast as with Streamlit, one that lets you generate a rich variety of pages and subpages and easy ways to connect them, creating a unified experience that engages your users. We need a data product to be like any other product that puts the user at the center to make their life easier.
To grow in number of users you either need to build a solid infrastructure or pay for expensive services such as Dash Cloud or Tableau Server. In both cases it will take a large amount of time and effort to handle even a few thousand users (I do not think you can serve a few thousand users in a BI tool even by buying a whole datacenter).
Services that charge by server size or by the number of users are far from the real solution we need. We need a technology that is independent of how many users we have: limiting the users limits the potential of our data apps. Can you imagine Facebook limiting its service to 1000 accounts? Or Google not allowing more than 1000 searches per day worldwide? Why are data apps thought of as limited services for the few?
Part of the answer lies in the fact that most data apps of the past were designed for analysts. But that is no longer true. Data literacy is more and more important for companies of any size, and individuals and professionals are increasingly used to making decisions based on KPIs or on a prediction that supports a hypothesis or potential scenario. The emergence of the data mesh is the latest proof of this.
Data apps & AI apps have to be open to any number of users, as the number of people who can read a chart or a KPI and act on it has grown in recent years to cover almost any office role.
Personal access control: each user having their own email and password. It is common in the industry for half a dozen or more users to share a single password, because (surprise!) the industry charges based on the number of users. This is a tax that is blocking the unstoppable emergence of data products. We have treated data products as a VIP service for capable users who have gone through courses on data analytics so that they can turn the information into insights. But this era is over; nowadays creating insights from analytics is becoming a common skill in any office anywhere.
Hence, we need to open up these data apps so that anyone can access them and boost their work performance thanks to them. To do so, it is key to guarantee free, independent access for every user. A few months ago Tableau unblocked this possibility: they are late; the market has been demanding this for years. With others, such as Plotly and Streamlit, one needs to build the access-control infrastructure oneself, which usually takes weeks of developer time and a lot of money.
We are talking about products here. For data products to succeed, an independent login for every individual that uses a data app is as important as it is for Facebook.
We need an architecture like the following: in the end, what we are proposing is a general data infrastructure that saves a lot of time and resources in code development to create these data domains and data products (data apps & AI apps), following the data mesh principles:
· Access control. For any number of users.
· Security. Guaranteeing that all security standards are met.
· Data availability. So that data can be extracted programmatically and linked to other domains or data products.
· Versioning. Each release of the data app (and the data that accompanies it) has to have its own version so that we can roll back easily.
· Discovery and exploration. Having a professional UX is key for any product nowadays.
We do believe this is the future of the data industry.