A team's view of data management maturity and challenges

Listen to the podcast from a data management and engineering leader focused on enabling AI for enterprises who has worked at leading tech companies like Google:

Avishek Panigrahi

Shikhin Agarwal · Data Management Podcast - Feb 2023

1. Some data exists somewhere

  • Management reporting through spreadsheets
  • Manual data processing
  • Lack of trust in data; incomplete projects due to missing data

2. A few data sets in a data warehouse

  • Limited analytics and dashboards (e.g., sales funnel analytics)
  • A few employees can access data for special projects
  • No central data governance, quality, and compliance

3. Aggregation of data from multiple sources/tools

  • A few functional reports available; still looking at the rearview mirror
  • High cost of teams & tools for data, analytics, and visualization
  • Partial understanding of and agreement on data definitions & quality

4. Data analytics, visualization & intelligence

  • Pre-designed reports, analytics and visualization are available
  • A few predictive models exist (CX management, revops, etc.)
  • Data engineering in place, but a data-driven culture is not yet established

5. Data Management: Vision & goals

  • Data helps generate revenue
  • Positive ROI on data investments
  • Highly reliable predictive decisions (revops, personalization, AI/ML, etc.)
  • Data-driven culture; employees focused on high-value tasks

Full script of the podcast on data management

Hi everyone. This is Shikhin Agarwal here. I'm the founder and CEO of StatsLateral. I have Avishek Panigrahi with me today. Hi Avishek. Hi Shikhin. How are you? Doing well. Avishek, we would like to learn more about your background. We believe you have some strong experience in data management, and that's one of the topics that we would like to discuss today.

00:46

Okay, great. So I'll start off with a bit of my background. Before starting Logarithm Labs, I was at Google, working on the machine learning accelerator team. I was responsible for creating the data analytics and infrastructure stack that was being used within the team; it was called the TPU team, the TPU being the machine learning accelerator.

01:15

Before that, I worked at a variety of companies in hardware and semiconductors. And a lot of my technical work was focused on automation and infrastructure for these companies. Right now, we provide data management solutions to companies of various sizes. That's awesome.

01:45

So you not only have deep technical experience with large technology companies like Google, but you have also been running a consultancy specifically focused on data management. Give us a couple of examples from your experience:

02:12

the types of projects you work on, the types of challenges you've solved, and your recent work experience. So, most of my customers tend to be early in their data journey. So part of my role is not just on the technical side, but also guiding these companies through what their data journey looks like.

02:37

Starting from: how do you collect data? How do you organize data? Which kinds of data fall within the realm of data management? How do you avoid things like premature scaling of your data infrastructure before you start to get value? We are also very much focused on how we enable our customers to get to value quickly.

03:02

That is a big part of it. And it can be challenging: in some cases, the data is not available, or it's sitting in multiple systems. Also, one of my customers is a large manufacturing company; they have a lot of data, very interesting data, but it's siloed in many different systems.

03:27

And to run anything advanced, like machine learning or statistical analysis, it takes a significant amount of engineering and data engineering effort to bring all this data together: making sure the columns of your tables are correct, that you are not missing data, etc. So we focus a lot on the quality of our data

03:51

before we start building predictive models and things like that. Of course, many companies have a goal of doing predictive analytics and other advanced work; people want to build deep learning models. But often it comes down to getting data into shape and ensuring the quality of the data is good.

04:17

Those are very important things in real-life scenarios. Right, so if I summarize based on what I heard, the top three things that come to mind are the value of data, the quality of data, and the efficiency of taking data and converting it into some value.

04:42

Correct. Yes, exactly. One thing to think about is: you have data, and you want to answer some questions based on that data. What is the best way? Also, are you, in reality, able to answer those questions using the data that you have?

05:02

If not, then what is the process you need to follow to be able to answer the questions in a more data-driven way? Those are some of the things, at a high level, that we focus on very hard. Right. So, based on this recently published viewpoint

05:23

that you and I have discussed before, it seems fair to say that most companies lie somewhere between stage two, where companies have a few data sets in a data warehouse with some limited analytics but generally lack central data governance, quality, and compliance, and stage three, where companies can aggregate data from multiple sources

05:51

and have a few functional reports available, but largely still lack a broad data-driven culture. So, if that's correct, let's simplify that and try to understand: in your experience, who is the right team or person or group in a company that can decide on data quality or data availability?

06:24

Is it the CIO, the chief data officer, or one of the business users? Yes, so this is a very collaborative sport, right? So there are the infrastructure aspects of data, like what is the data warehouse?

06:44

What are we spending on the data warehouse? How are people accessing it? Then there are the governance aspects of it, like who has access to what tables, what columns of a table we are using, and audit logging; those are things that you expect a data platform to provide, and I would say that

07:07

falls somewhere within the CIO organization of a company: making sure that the infrastructure for data consumers is good, has good security practices, things like that. From the chief data officer or business user perspective, there are certain questions that business users or business functions need to answer to drive revenue or sales, or to improve quality; all of these are business-oriented goals.

07:40

And the goal of the chief data officer is to make sure that business users, whether it's a marketing function or a sales function, have access to the data that they are using to drive day-to-day business. From that perspective, it means working with the CIO organization to make sure that the data platform fulfills the need, the data pipelines are in place, and you have the right set of data professionals: analytics engineers, data engineers, data scientists, and data analysts.

08:21

There's a lot of overlap in these specific roles, but the goal is the same: how do we enable business users to make data-driven decisions? And again, keeping the focus on who the end customer is and what their use case is falls within the chief data officer organization.

08:43

Now, that role is actually fairly new, and many companies need a chief data officer. So sometimes there's an overlap with the CIO organization, but the primary thing to keep in mind is that the infrastructure should support the use cases, with an iterative loop with the end users from any part of the organization to make sure that the data projects are delivering value to them

09:20

on a continuous basis. Right. So this is really interesting. There's a lot to unpack here. First of all, I agree that the chief data officer, or any kind of dedicated data role, has to be highly collaborative across business teams and technology teams.

09:45

It is not only a new role; historically, I have worked with CDOs or CDO organizations only in Global 1000 companies. For any mid-size enterprise, let's say with revenue from 50 to 250 million dollars annually, I generally haven't seen a fully dedicated data officer; a senior manager of data role usually lies within the CIO organization.

10:14

Having said that, somebody eventually plays that role, even if part-time, either on the business side or on the technology organization side. And so we cannot deny the fact that data is a capability that requires that collaboration, which is what you just highlighted; that makes absolute sense. And this person or team is also responsible for

10:41

recording the use cases through which, just like you said, sales, marketing, customer success, and all the other business teams can derive the most value out of the data. And then this team can collaborate with the CIO organization to create the right infrastructure,

11:05

teams, and processes to make it happen. Is that correct? Yeah, so there's a lot to unpack there. So yes. As for whether the end consumers should have a say in what sort of infrastructure is possible: sometimes, in larger organizations, the infrastructure has already been selected by the IT organization.

11:34

And for the most part, modern data infrastructure is good; you can't go wrong with picking a data warehouse, data lake, or any of those solutions from any of the major vendors. Getting caught up in

11:54

"Will my use case be supported by this particular piece of data infrastructure that we've chosen?" is not a productive angle to take. Now, having said that, there could be certain use cases for which the data team might need to write custom code, deploy some low-code solutions, or deploy connectors, and all of those fall within this realm. If a particular team needs a connector to, let's say, Stripe:

12:27

what is the best connector to Stripe? On those sorts of things, you want to make decisions quickly and make sure that those connectors are functional, but also not spend months or a year trying to figure those decisions out. For the most part, things work.

12:50

And as long as the needs of the business users are met, most infrastructure will work fine for most companies. Of course, when you get to the scale of Google or Amazon, companies like that, that's a different scenario, where the scale really is so large that they have to build custom solutions. But what we are seeing in the marketplace is that a lot of the effort that these companies, the hyperscalers, have put in is now available for everybody to use. And specialized data platforms like Snowflake or Databricks have come either out of research or just from years of experience building these data systems, so they have mostly solved the problems of scale for end consumers, the kind of companies that you and I probably interact with. Great.

13:51

So, just to add to that: our target audience for this talk, for example, is what we are calling medium-size enterprises, let's say in the annual revenue range of 25 million to 150 million. And so those companies, obviously, are not going to emulate the infrastructure that the Amazons and Googles of the world have created.

14:19

And you just highlighted some of the architectural components. It seems like most of the data management architecture for medium enterprises, which are still growing and want to move towards a data-driven culture, is going to be

14:41

a combination of third-party tools, and we don't endorse specific tools. But I know you are from Google, and historically have worked a lot with Amazon/AWS and Microsoft Azure.

15:00

There are a lot of mature, stable, and scalable tools. So, beyond the tools, is there one key component of this architecture that you want to highlight to our listeners, one that is absolutely, let's call it, mission-critical in today's business? Yes, yeah. Good question.

15:23

The main thing is getting an inventory of your data. What data do you have in your systems? What system does it live in? What are the access patterns? I mean almost doing an assessment of the current state of your data, even if it's just in raw form.

15:44

Perhaps there is some data living in other tools, like Salesforce or some other system such as HubSpot. Their dashboards might not show you exactly what you need to see, so you might need to bring the data into another system. What are the connectivity components?

16:02

How do they map to your data warehouse or your database? And then, let's say we have access to this information, or this raw data: how do we turn it into the insights that we need? Some of those requirements come directly from the business function.
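The inventory questions above (what data exists, which system it lives in, how it is accessed, and whether it already maps to the warehouse) can be captured in a very small structure before any tooling is bought. The following is a minimal sketch, not a description of any product mentioned in the conversation; the `DatasetRecord` fields, the dataset names, and the access-pattern labels are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """One row of a data inventory (all fields are illustrative assumptions)."""
    name: str            # e.g. "opportunities"
    source_system: str   # where the data lives today
    access_pattern: str  # how it is reached: "api", "csv_export", "sql", ...
    in_warehouse: bool   # already mapped into the warehouse or database?


def group_by_system(inventory):
    """Return {source_system: [dataset names]} for a quick overview."""
    grouped = defaultdict(list)
    for record in inventory:
        grouped[record.source_system].append(record.name)
    return dict(grouped)


def missing_from_warehouse(inventory):
    """Datasets that still need a connector or pipeline into the warehouse."""
    return [record.name for record in inventory if not record.in_warehouse]


# Hypothetical inventory entries, for illustration only.
inventory = [
    DatasetRecord("opportunities", "Salesforce", "api", True),
    DatasetRecord("email_campaigns", "HubSpot", "csv_export", False),
    DatasetRecord("payments", "Stripe", "api", False),
]

print(group_by_system(inventory))
print(missing_from_warehouse(inventory))
```

Even a list this small answers the first assessment questions: which systems hold data, and which datasets still need a connector before they can be analyzed together.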

16:26

Now, if the data itself doesn't exist, then there should be an effort to put the telemetry in place to actually be able to gather the data. And of course, once the raw data is there, the data can undergo transformation; that usually requires a data engineer or a data analyst to make those transformations.

16:53

And then we go into the realm of automation. Let's say you're running some transformations and new data is coming in. Is this pipeline healthy? Does this data pipeline run on schedule? Have we put data checks into place, so that when you're taking, let's say, raw data from source A and creating tables in system B

17:13

so that a marketing manager can look at their campaign performance, we know whether we have pushed bad data into this middle system B, which could cause them to make decisions with either incomplete or inaccurate data? All of this falls within the realm of data engineering.
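The idea of putting data checks between source A and system B can be sketched as a simple gate that refuses to publish a batch with missing or implausible values. This is an illustrative sketch under assumptions: the field names (`campaign_id`, `spend`, `clicks`), the specific rules, and the function names are hypothetical, and a real pipeline would write to a reporting table rather than return the rows.

```python
def check_rows(rows, required_fields, min_rows=1):
    """Return a list of readable problems; an empty list means the batch passed."""
    problems = []
    if len(rows) < min_rows:
        problems.append(f"expected at least {min_rows} rows, got {len(rows)}")
    for index, row in enumerate(rows):
        # Completeness: every required field must be present and non-empty.
        for field in required_fields:
            if row.get(field) in (None, ""):
                problems.append(f"row {index}: missing {field}")
        # Plausibility: spend should never be negative (an assumed business rule).
        spend = row.get("spend")
        if isinstance(spend, (int, float)) and spend < 0:
            problems.append(f"row {index}: negative spend {spend}")
    return problems


def publish_if_clean(rows, required_fields):
    """Gate the load into system B: raise instead of pushing bad data."""
    problems = check_rows(rows, required_fields)
    if problems:
        raise ValueError("; ".join(problems))
    return rows  # placeholder for the actual write to the reporting table


# Hypothetical raw batch from source A.
REQUIRED = ["campaign_id", "spend", "clicks"]
raw_rows = [
    {"campaign_id": "c1", "spend": 120.0, "clicks": 40},
    {"campaign_id": "", "spend": -5.0, "clicks": 3},
]

print(check_rows(raw_rows, REQUIRED))
```

Failing loudly before the load is a design choice: the marketing manager sees yesterday's correct numbers rather than today's incomplete ones.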

17:36

Yeah. So those are things to keep track of, but there's also a process question, starting with: do we need to have all these components in place before we can do anything? Or can we incrementally build capability and improve our data posture as we make progress, all while keeping in mind that the goal of a data team is to further the needs of the business?

18:09

Yeah, that makes a lot of sense. Recently, for example, we worked with a company which has been very sales and marketing focused. To your point, Avishek, it seems like they are able to aggregate their customer data across the multiple channels that they have published their information on, and then map

18:39

the customer's journey, which is easier said than done, mainly because this company is a B2B company selling its products and services to other enterprises. There could be several people from the customer enterprise who might be visiting your sales and marketing material.

19:04

And connecting the dots, creating a comprehensive picture of an account, in the terminology of a B2B sales and marketing organization, is probably a really good visualization for the B2B market. So, having said that, suppose the company puts in all these efforts, and you can take any example like this.

19:33

What type of ROI can the teams expect after making a certain level of investment in data aggregation, enrichment, and, it seems, orchestration back to the business teams? And how do we calculate the ROI? The real question is: how does one get to that ROI? What would be your suggestion?

19:57

Yes. So, this is very relevant, and it's a very complex question: how do you calculate ROI on your data investments? Part of the reason it's complex is attribution: how do you attribute which effort led to which outcome

20:17

from your data efforts? Primarily, the way to think about it is to take incremental steps and have measurements at every point where you're making changes to current behavior. What I mean by that is that some of the ROI could come from

20:42

just marketing folks not having to spend time mucking around with Excel spreadsheets. That is a straightforward saving in time that comes from automation and from better data being readily available to them; they don't have to click through 20 different systems, download CSVs, pull them into Excel, and waste their time doing that.

21:12

They can think about how to run campaigns and which campaigns are performing. Now, when we go into the next part, which is "We've done all this; did it actually improve the efficacy of my marketing campaign, and how did this data infrastructure that I've invested in lead to that?",

21:37

part of that really is measuring the before and after. If there isn't a mechanism to measure the before, that mechanism does need to be put in. Is it worth putting in? The answer is usually yes. Do we need to invest in a huge amount of data infrastructure to measure the efficiency or efficacy of these efforts? Probably not; you can get pretty far with basic data infrastructure.
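The before-and-after measurement described here needs very little machinery. As a rough sketch, the time saved by not handling spreadsheets manually can be turned into a directional ROI figure. Every number below (hours, head count, hourly cost, monthly tooling cost) is a made-up assumption for illustration, and the function names are hypothetical; attribution of campaign outcomes is deliberately left out.

```python
def hours_saved_per_month(before_hours_per_week, after_hours_per_week,
                          people, weeks_per_month=4):
    """Manual-work hours saved per month, from a before/after measurement."""
    return (before_hours_per_week - after_hours_per_week) * people * weeks_per_month


def simple_roi(monthly_benefit, monthly_cost):
    """Return on investment as a ratio: (benefit - cost) / cost."""
    if monthly_cost <= 0:
        raise ValueError("monthly_cost must be positive")
    return (monthly_benefit - monthly_cost) / monthly_cost


# Assumed measurements: 5 marketers went from 6 to 1 hour per week of
# spreadsheet work, at an assumed loaded cost of 50 per hour.
saved = hours_saved_per_month(before_hours_per_week=6, after_hours_per_week=1, people=5)
benefit = saved * 50
roi = simple_roi(monthly_benefit=benefit, monthly_cost=2000)

print(saved, benefit, roi)
```

The point is direction rather than precision: a positive ratio from a measurement this basic is already a signal that the effort is paying off.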

22:07

So yeah. There isn't one answer to how you measure ROI; that's a very big, wide question, because what does the "I" mean, and what does the "R" mean? Both of those terms are themselves pretty vague. The method to actually get there is to create measurements at every step along the way,

22:33

about both the outcome and the process. Does that make sense? This is sort of a broad answer. I mean, every business has subjective measures. And ROI is nothing but simplifying some of those subjective measures to evaluate, let's say, which option is better between option A and option B.

23:03

So, having said that, it totally makes sense, especially since the scope of data management can be so broad and can positively impact so many different functions with a variety of competing KPIs. In a purely mathematical or statistical sense, the ROI calculation could be very complex, but essentially, for business leaders to make a decision about how much more to invest in data management,

23:35

it's a more simplified approach: Can I have a positive ROI? Who's my internal champion? And do they value the investment that we are making? So, fair point in terms of what one can expect from the ROI on data management investments. Yes, another thing to keep in mind is the directionality you alluded to: are we having a positive ROI?

24:10

It is for business leaders to think less about "Is there going to be a 20 percent improvement in my sales conversion funnel, or 15 percent?" and more about "How do I measure whether these efforts that I'm putting in are having a positive return?" Of course, quantifying that is a goal, but trying to quantify it at a very fine granularity early on can lead to trying to build infrastructure which might not directly impact the outcome.

24:45

And you don't want the heavy upfront investment. You want to keep the upfront investment light, but make sure that you're moving in the right direction. That makes a lot of sense. It seems like you are proposing that, in terms of how things get done,

25:05

the best methodology would be to take baby steps, basically that walk-around method, where you prioritize your business goals and work towards them. Keep your architecture flexible and scalable, but you don't have to make big decisions on day one; you can implement and build on it, correct?

25:36

Yes. That's awesome. So we are coming to the final topic of discussion here. As per the published article, most enterprises want the vision and goals of their data strategy and management to establish a direct correlation between data quality and revenue generation, where data can be the revenue driver,

26:07

provide more reliable decisions, and open up revenue optimization and marketing personalization types of opportunities. While I'm mentioning very specific use cases here, overall it is really just about creating that data-driven culture. So, as we move towards that vision, which most CEOs and enterprises want,

26:35

what are the most exciting trends that you're seeing, in terms of both capabilities and technologies, that we should be aware of and that give business leaders better confidence to invest in data management? Yeah. So there are a couple of things to talk about here: the new technologies.

27:01

So if you Google "modern data stack", there are a lot of new technologies. A lot of interesting new players are trying to solve problems on a holistic scale, like the big data platforms from Snowflake, Databricks, Google, or Amazon,

27:22

but there are also a lot of smaller players who are targeted at very specific things, like monitoring data quality. There are companies who focus just on monitoring data quality, but at large production scale. There are companies that talk about data drift; if you're building and deploying machine learning models on a regular basis, then data drift can become a problem.
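Data drift, as mentioned here, means that the data a deployed model sees has shifted away from the data it was trained on. One common way to put a number on that shift is the population stability index (PSI), which compares the share of values per bucket between a baseline sample and a current sample. The sketch below is an illustration under assumptions, not how any vendor named in the conversation does it: the bucket edges, the sample values, and the small floor `eps` used to avoid taking the logarithm of zero are all choices made for this example, and any alert threshold would need tuning per use case.

```python
import math


def bucket_shares(values, edges):
    """Fraction of values in each bucket defined by the sorted cut points in edges."""
    if not values:
        raise ValueError("values must not be empty")
    counts = [0] * (len(edges) + 1)
    for value in values:
        bucket = sum(1 for edge in edges if value >= edge)
        counts[bucket] += 1
    return [count / len(values) for count in counts]


def population_stability_index(baseline, current, edges, eps=1e-6):
    """PSI = sum over buckets of (current - baseline) * ln(current / baseline)."""
    base = [max(share, eps) for share in bucket_shares(baseline, edges)]
    curr = [max(share, eps) for share in bucket_shares(current, edges)]
    return sum((c - b) * math.log(c / b) for b, c in zip(base, curr))


# Hypothetical feature values: training-time sample vs. a recent sample.
baseline = [10, 12, 11, 13, 12, 11, 10, 12]
current = [18, 19, 20, 17, 21, 19, 18, 20]
edges = [11, 15]  # assumed cut points giving three buckets

print(population_stability_index(baseline, baseline, edges))
print(population_stability_index(baseline, current, edges))
```

Identical samples give a PSI of zero, and the index grows as the bucket shares diverge, which makes it a cheap first check before investing in a dedicated monitoring tool.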

27:49

And the main thing to watch is to acknowledge what stage of data maturity your organization is in. There are a lot of shiny tools out there. Chasing every shiny tool in the market can be counterproductive to this idea of ROI, because you're just stuck in the evaluation cycle of "Is this data product

28:13

good? Is this data product bad?" And then, of course, some of it comes from experience: what are the things for which you can delay making an investment, from a data management perspective, and what are the things that you cannot delay? Or, if you put a small-scale solution in place:

28:35

Is it going to last me six months, a year, five years? I work with some smaller companies who started off very, very small with their data efforts, but the return on investment on those small-scale efforts was huge for them, and that is a big deal. And it doesn't really depend

28:58

on the revenue of the company or the size of the company. What matters is what the company thinks about whether this data effort is giving it value, and culture. Again, culture is not something that changes overnight. Building that culture takes time, and a lot of it comes from incremental successes. It's not that one day, because we deployed

29:29

a big data solution from the hottest startup or the hottest big company in the market, our company will suddenly become data-driven. That's not the way to think about it. We mostly want to target small wins. We want to make sure that there are internal champions who believe that data

29:51

does drive the business forward, and also that there are successes with the end users and end teams. As long as we think of this data function as also being a customer success function for the business, things will move forward, and they'll move forward well. So those are some things to keep in mind.

30:12

Many companies can spend years, even five years, just doing digital transformations and asking, "How do we make our company or organization data-driven?" But the outcome is often that there wasn't enough thought put into how to incrementally improve outcomes for the end data consumers.

30:40

Those projects last too long, and it is unclear whether they will eventually deliver a good return on investment. There's plenty of evidence that large-scoped projects can sometimes fail, and they do. That's a great point. We are at the end of our time. I really appreciate your time, your viewpoint, and your expertise on data management.

31:13

Thank you so much. Well, thanks for all those interesting questions, Shikhin. It was great to chat.
