About Data Story Series
Join The Data Story podcast, where two veterans of the Data & Analytics industry cut to the chase and bring you the most relevant technology trends transforming the industry. James Serra and Khalil Sheikh have helped transform several Fortune 100 companies into data-driven enterprises. This fortnightly podcast will equip you with the best practices, tools, and frameworks to help you spearhead your business insights journey. Stick around for each new topic discussion and subscribe to this channel.
Guests on this episode
James Serra is a Data Platform Architecture Lead at EY, and previously was a big data and data warehousing solution architect at Microsoft for seven years. He is a thought leader in the use and application of Big Data and advanced analytics, including solutions involving hybrid technologies of relational and non-relational data, Hadoop, MPP, IoT,
Khalil Sheikh is the Executive Vice President of Saxon Global. Under his leadership, Saxon is transforming from an IT Staffing and services organization into a new age digital transformation partner and a strong brand. Khalil has extensive experience in the IT services industry and in turning around businesses by promoting growth and profitability.
Season: 01 Episode: 03
James Serra: [0:41]Good morning. Thanks for having me again.
Khalil Sheikh: [0:43]Let's start. What role do you see CDM playing in today's enterprises? What is CDM? And what are the various standard definitions of a common data model?
James Serra: [1:02]Sure. CDM, common data model. When I was at Microsoft and talked to customers, they would often have no idea what a common data model was. I would always say, well, think of it this way: if you're going to build a data warehouse, say you're in healthcare, you need to collect all this data (patient data, practitioner data, care plans, claims) and pull it from all the sources into a data warehouse. Well, okay, you've got to create a database and tables and fields, and somebody's got to create this data model. If you sit down with a piece of paper and start creating a data model to handle all that data, that could take a long time, probably months. You have to ask yourself: we're in the healthcare industry, has somebody already done this same exercise 1000 times? Well, yes, they have, and they've created a common data model. So you can go out and find somebody who's done the same thing. For healthcare, for example, you have patient, medication, procedure, encounter, episode of care, and so on. They have these data models built, so you're not starting from scratch and reinventing the wheel. That can save you months of time, because the model is also very thorough, it defines the relationships between all these different pieces of data, and you can make sure you're collecting all the right data. Now imagine they have these kinds of data models for all sorts of industries. Just about any type of service you can think of fits into an industry model, from an apparel company to an airline to car rental. There are tons of data models that people have built, and many times you can get access to them for free. So anytime you're going to collect massive amounts of data, and you're in an industry that's common, look for a common data model.
Khalil Sheikh: [3:09]Thank you, James. If you can go over the benefits, you know, semantics versus simplified. What do you see for the enterprise as who are the right audiences? Or what size organisation healthcare versus BFSI? Can you go a little further on benefits of having a unified common data model?
James Serra: [3:32]Yeah, I would say almost any company, no matter what the size, can get the benefits of the common data model. Again, same situation: if you're just a team of a handful of people and you're going to build out a solution, you can use a common data model to shortcut the process. Now, one of the blockers could be cost: are you going to use a model that's expensive? A lot of these are free. You can go on the internet and find some that have been published as open source. Another challenge could be if you're an organisation that is unique and doesn't have an industry data model, or if you don't have a lot of data for your industry. Some of these models, like the healthcare one, literally have hundreds of tables or entities, and that could be overkill, because you could say, well, I'm not going to use 90% of this, so I don't really need to go this far. But eventually you'll get to a point where you're collecting so much data that you're going to use maybe half of this common data model, and you can ignore the rest of it. And it can prevent you from going, after the fact, "Oh, I forgot to collect this data," because the model is so thorough that it has everything you can possibly name, and you can just map all the sources into the common data model. And the biggest thing is defining those relationships. Should patient data go along with claim data, or should it be separated out? Should I separate out addresses, or put them with the patients? This has all been thought through. So it's built out to be easy to understand, but also to perform well. When we get to querying things, and changing data and updating data, the model takes care of all that, because some of these models have been around for dozens of years. 
And all this has been thought through. So that's one of the big time savings that comes into play, even if you're a very small company.
Khalil Sheikh: [5:49]Thank you. Thank you, James. And how is a common data model related to open data initiative that, you know, got announced 2018 by Microsoft and SAP and Adobe? How, how is it related to open data initiative?
James Serra: [6:10]Yeah, the Open Data Initiative was announced, I think, about three years ago, as a joint venture between Microsoft, Adobe, and SAP to create a bunch of open-source entities. The problem is, I can, as Microsoft, come up with, oh, here's a great model for healthcare. But what happens if all these other companies have their own common data model for healthcare? Now you're kind of defeating the purpose of having a model that's supposed to be common, and people can argue over which of these models they should use. So can we get everybody together and agree on one common data model for each industry? I would say it's had mixed success, and they only have three companies involved in it. I don't know much about Amazon and Google, but they probably have their own common data models for healthcare. And you also have other third-party companies that deal with Master Data Management that have their own common data models. Profisee, for example, is a popular MDM solution, and they have their own common data models that are different from the Microsoft and Open Data Initiative ones. So that's where it's a little complicated now. You can go, hey, great, there are these healthcare common data models, and you do a search and find five or six of them. Which one do I choose now? Well, can we all agree on one? That's the idea of the Open Data Initiative, ODI, and it's had some success. I'd like to see everybody get together and agree on one standard, but that hasn't happened.
Khalil Sheikh: [7:52]Thank you, James. And why the common data model is key to modern data architecture, for driving actionable, operational insight, for any enterprise. What is the relevance?
James Serra: [8:10]Yeah, I would say the challenge is, if I'm going to try to get insights from this data, and have some cool reports and dashboards and such, I need to have the data laid out in a way that I can, say, create a star schema from it. Then it's easy to map how everything is related, and I can go into a workspace, just drag fields, and create a report or dashboard very easily, without having to understand so much about the relationships between all that data. That has always been the issue with customers. I've been in the industry a really long time and worked on a lot of data warehouses, and usually there was one person, when it came to reporting, who knew how everything connected. You always had to go to that person and say, okay, I need to collect all this data, and he'd say, oh, you do this and this and this, and he'd create something new for you. Well, if we have a common data model, it's a lot of tables, but a lot of common data models have a star schema built on top of them, which is easy to do because all the relationships are defined. You don't have to rely on that one person anymore. I can go and do self-service BI reporting on top of that data, because it's all laid out logically and the relationships are built. It also helps you avoid querying data that is just wrong because you've chosen the wrong connections. The common data model will prevent you from landing data that is not correct, that has wrong relationships, which can happen if you do it on your own. And sometimes that's very hard to find. Somebody down the road says, hey, man, this report turned out wrong. We've been using it for months and it's been wrong. 
You look at the data model you created and it's like, oh, wow, I have invalid relationships in here. Or take referential integrity: it's a lot easier to enforce referential integrity with a common data model than with one you create on your own, because the common data models define referential integrity and make sure that you don't land a state code in there that's not already in the reference table. So they've thought through all that, and it greatly reduces the number of errors you'll have in your data.
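The state-code example can be illustrated with a minimal sketch. It assumes a SQLite database and two hypothetical tables, `state_ref` and `patient`. These names are not taken from any published common data model; they only show how a foreign key stops a row from landing when its state code is missing from the reference table.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a reference table and a patient table.

    Table and column names are hypothetical, chosen only for illustration.
    """
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE state_ref (state_code TEXT PRIMARY KEY)")
    conn.execute(
        """
        CREATE TABLE patient (
            patient_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            state_code TEXT NOT NULL REFERENCES state_ref(state_code)
        )
        """
    )
    conn.executemany("INSERT INTO state_ref VALUES (?)", [("TX",), ("NY",)])
    return conn


def try_insert_patient(conn: sqlite3.Connection, patient_id: int,
                       name: str, state_code: str) -> bool:
    """Return True if the row landed, False if referential integrity rejected it."""
    try:
        conn.execute(
            "INSERT INTO patient VALUES (?, ?, ?)",
            (patient_id, name, state_code),
        )
        return True
    except sqlite3.IntegrityError:
        # The state code is not in the reference table, so the row is refused.
        return False
```

With this in place, a patient with state code `TX` is accepted, while one with an unknown code such as `ZZ` is rejected at load time instead of surfacing months later in a wrong report.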
Khalil Sheikh: [10:32]Thank you, James. So it seems like it's not a magical pill, there's a lot of work that needs to be done. So what are the expectations of a common data model, data producers, and data consumers? Because it's, it's sounds pretty complicated, right? building all the digital interconnects and portals and data and everything else. It's a lot of work. So it requires a lot of thought leadership. So what are the expectations that data producers and data consumers may have out of this design as well as implementation?
James Serra: [11:12]I wish there was a magic wand on eBay that you could purchase and just wave to make it easier. A common data model is a shortcut for the process. But when you look at the common data models, you may be overwhelmed at how large some of them are, especially in the medical field, where I think there are over 100 entities. So if this is your first endeavour in building a data warehouse and data models, get some experts. If you don't have them in house, this is where I would say a partner who has gone through this process before can be helpful, because the biggest challenge is: okay, I have the common data model, and I have all these source systems, and I have to map those source systems to this common data model. If I've never done it before, I can start having invalid mappings. That is the longest part of building the solution. I could have various SAP and CRM and medical databases and internal databases, and if I map them wrong, it kind of ruins what I've done. So have somebody who's actually done it, because it also gets challenging with the terminology. What do I mean by a patient? Do the addresses go in with the patients, or should they be separate? You have to have somebody who knows the lingo, so they make sure they're doing the mapping correctly. Another big challenge is cleaning the data, because a common data model is going to want it in certain formats, and the source system is not going to have it in that format in most cases. So how do I clean that, and am I cleaning it correctly? I've seen companies that have dozens of SAP systems, and unfortunately they've all stored data a little differently. Maybe one puts in the full name of the state, maybe another uses abbreviations. So you have to have the expertise and understanding to conform all of those and put them in there correctly. 
The mapping process can be very challenging. And that's where you need to have that expertise, and somebody who has done it before.
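The mapping and conforming step described here can be sketched as follows. This is a minimal illustration, not how any particular tool does it: the lookup table, the source field names (`PAT_NM`, `ST`), and the target field names (`name`, `state_code`) are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical lookup from full state names to the codes the target model expects.
STATE_CODES: Dict[str, str] = {
    "texas": "TX",
    "new york": "NY",
    "california": "CA",
}
VALID_CODES = set(STATE_CODES.values())


def conform_state(raw: str) -> Optional[str]:
    """Turn 'Texas', ' texas ' or 'tx' into 'TX'; return None if unrecognised."""
    cleaned = raw.strip()
    if cleaned.upper() in VALID_CODES:
        return cleaned.upper()
    return STATE_CODES.get(cleaned.lower())


def map_record(source_record: Dict[str, str],
               field_map: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Rename source fields to target-model fields and conform the state value.

    field_map maps source field name -> target field name.
    """
    target = {
        target_field: source_record.get(source_field)
        for source_field, target_field in field_map.items()
    }
    if target.get("state_code") is not None:
        # An unrecognised value becomes None so it can be sent for review
        # instead of silently landing in the model.
        target["state_code"] = conform_state(target["state_code"])
    return target
```

Each source system gets its own `field_map`, while `conform_state` is shared, so two SAP systems that store "New York" and "NY" both land the same value.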
Khalil Sheikh: [13:25]Thank you, James. So that poses another question, right? Taking healthcare, your example. You have clinical information, you have hospitals, you have pharmacies, you have care coordination, you have outpatient nursing care, hospices, you have multiple sources of ownership of data, what challenges you come across from an ownership governance compliance as you're building this common data model? Because if we look at traditional systems, there are multiple touchpoints, multiple ownership, how do you resolve it? It's much more complex than just technology implementation.
James Serra: [14:07]Yeah, that's a good point. Ownership is by far the biggest challenge I've seen in my experience of building data warehouses: who owns the data, and who is responsible for cleaning the data and verifying that it's correct. I've been in conference rooms where people almost got into fistfights over who owns the data. Some people are very reluctant to give up ownership. Say I want to copy this data into a common data model, and the IT department is in charge of the common data model. Do they now have ownership of this? Are they going to clean it? Are they going to clean it correctly? Or are they going to push back, say they won't clean it, and go to the source and ask them to clean it? This is where we get into some of the data mesh concepts from a previous conversation we had. That could be one solution for handling ownership. Now, data mesh has its own concerns, a whole other set of problems you can have. But when you get to centralising the data, that is the concern: who do I go to if there are problems with the data? If I have 10 different source systems that are all SAP and I'm putting them in one common data model, have I taken on that ownership? Or am I going to ask one of the people on SAP? Or is it going to be a combination of ownership, which can get really confusing? What I see most customers do is have some kind of Centre of Excellence, with representation from each of the groups. And they have, I won't say arguments, but polite conversations to agree on who owns this data. So this becomes, to your point, much more of a people and process conversation than a technical one. I've had companies come to me and we talk about a common data model. 
And then eventually they come back and say, can we just talk about the roles and responsibilities of everybody in the Centre of Excellence? And we've had conversations just on creating a COE, the people involved, and their responsibilities. Because I don't care how great the technology is, and how technically skilful the people building it are: if you can't get people to agree on certain things, it's just going to fail. So think through all the people that you want involved in this, and create some sort of forum for them. That way you make sure you're not having challenges when you're halfway through this and people start complaining: I don't like the ownership, or you're not doing it right, or you're messing it all up. So avoid those by thinking it through before you get started.
Khalil Sheikh: [17:07]Thank you, James. And, and how exactly does it accelerate the overall analytics journey of the Enterprise's, right? Because if you think in terms of it is easy said that you can fill the gap to common data model, you know, destroying a data repository, cross platform, applications, touchpoints and everything else. But how, and what does it take for an enterprise to accelerate it from an actionable insight perspective, right? Because ultimately, not just a transactional reporting, but, you know, operational real time predictive reporting perspective, what does it do? What kind of value or competitive advantage that it brings within the analytics journey?
James Serra: [18:00]Yeah, I would say it's a lot about the standardisation. Instead of having multiple reports interpret data differently, the common data model really forces you to have one particular truth of the data. And that's where the mapping becomes so critical, because it's all landing in one spot, that single version of the truth. Without the common data model, if you go to the source systems, you can have people define things differently when they go to query them. I've seen a lot of crazy things happen where people get different answers to the same questions because they've changed the queries or cleaned the data differently. And sometimes they clean it differently to make their organisation or department look better than others, and you have these arguments. That's where you sit down and go, well, wait a minute, we not only need to standardise on the data, we need to standardise on the formulas and calculations we're using. So that's, again, another part that's kind of outside of the CDM. We've got everything in there, but if I need to calculate the net, do I take the gross minus the cost and the taxes? Or do I not include the taxes? So what I think companies do is expand out from the common data model. That's their starting point, and then they build definitions around it of what all this means. What exactly do I mean by a patient? What do I mean by things like a conference room when I need to talk about costs? So what also comes out of the common data model is a glossary that gets put together: let's all agree on the definitions of all these terms and standardise on those. The common data model kind of forces you to do some of that, because its tables are only going to call things one way. But people can interpret those table and field names differently, so you have to have that agreement too. 
And so you can think of the common data model as forcing you to conform as a company, with a better understanding of your data, and to have that single version of the truth, both in where the data is stored and in all the reports that are generated. That's where you get into some of those arguments, because you may take away control from end users who were creating their own reports before. You find out people are using the same data but creating different formulas. Now I'm centralising this, and maybe IT is the one that gives the stamp of approval: this is the formula we're going to use, and everybody uses that same formula instead of doing their own thing. They may get answers that they don't like, because it's not the way they defined the formula, but at least you've got standardisation. I've seen people, at the extreme, take numbers and get different results and report numbers to Wall Street that were completely wrong, because somebody changed the formula to get a bigger bonus, since it looked better when certain things were excluded. And I've been yelled at by people who said, how can you use this particular formula? And I'd go, well, we have 10 people all using different formulas. I went to the CEO, and he said, use this one, and that's what I'm doing. That's the only thing that got me through it. But you need to resolve those issues, and you need to have that conformity, with everyone having representation on the data.
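The point about standardising formulas can be made concrete with a small sketch. It assumes the organisation has decided that net means gross minus cost minus taxes; the function name and the `include_taxes` switch are hypothetical and exist only to show how the two competing definitions from the conversation give different answers on the same data.

```python
from decimal import Decimal


def net_amount(gross: Decimal, cost: Decimal, taxes: Decimal,
               include_taxes: bool = True) -> Decimal:
    """Single agreed definition of 'net' that every report should call.

    The default (subtract both cost and taxes) stands in for whatever
    definition the business has approved; the switch is here only to
    show how far the competing definition diverges.
    """
    net = gross - cost
    if include_taxes:
        net -= taxes
    return net


# Same inputs, two definitions, two different numbers for the same question.
approved = net_amount(Decimal("100"), Decimal("60"), Decimal("10"))
variant = net_amount(Decimal("100"), Decimal("60"), Decimal("10"),
                     include_taxes=False)
```

Keeping the calculation in one shared definition, rather than re-implemented in each report, is one way to get the same answer to the same question across departments.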
Khalil Sheikh: [21:52]Yeah, very good point. Just a comment on what we have seen in healthcare and in banking and financial services: agreeing on that holistic view of the enterprise is very important, and so is having that steering committee of power users who agree on the overall outcomes before you start building the blocks. Otherwise, well, at one of the wealth management companies I was recently working with, I saw 500-some different views available, and none of them talked to each other. Right? Equity versus hedge funds versus various bonds and wealth management and all. We see the same thing in healthcare all the time. So the overall design and outcomes become very important, especially leadership speaking the same language, because there is ownership of data by a certain group of people or departments, but then there is cross-pollination, which is equally important. So, good conversation. Who are the major players? I have been working with IBM, essentially, on the body side of it, and they have a common data model. And Microsoft has a data model. So, number one, who are the major players, and how does Dataverse, Microsoft Dataverse, play into that space?
James Serra: [23:20]Yeah, the major players that I know of are Microsoft, in combination with SAP and Adobe. I don't know so much about what Amazon and Google are doing; you can probably comment better than I can. I do know of a company that Microsoft bought recently called ADRM that has a bunch of industry data models, I think over 100 of them, and Microsoft is incorporating all of them into its platform so people can have those standards. What I also find is that large companies, like the one I'm at, EY, have their own common data models. You can imagine they have theirs for tax platforms and accounting. What I find other large companies doing is that they go out and find all these common data models, then make their own adjustments and come up with their own common data models for their company. Sometimes they will look at multiple common data models, for finance as an example, pick the one they think is the best, and make some modifications to it. So it's unfortunate that there is not one end-all, be-all common data model. You also have to think of competition and whatnot that flows into it. But I think in the end, no matter what you pick, they're going to be probably 80% similar, because a lot of times they may call something by a different name, but it's all based on somebody else's work. And the biggest thing is that the relationships are usually very similar. The tables and the fields within them may be a little different, but they tend to stick to the same naming conventions for the most part. But again, there's no magic wand that makes this really easy.
Khalil Sheikh: [25:34]Where do you place Dataverse, Microsoft Dataverse? I have seen quite a few situations, even recently, when I was working with IBM on Tivoli and the common data models that IBM was building upon, versus SAP and Adobe. Everybody has its own context and set of applications that they are trying to build upon. Where do you see Dataverse playing out among all of them, in terms of people, process, methodology, and even the implementation tools and technology that come with it?
James Serra: [26:09]Yeah, Dataverse was a rename of the Common Data Service. Microsoft loves renaming things, and I can't say I'm very excited about the name Dataverse. It's very confusing what it actually means. I like to think of the Common Data Service, or Dataverse, as a combination of the common data model and the services on top of it that enable you to access that data. Dataverse sits inside of Dynamics, that product that Microsoft has. So if you're using Dynamics CRM, you can imagine you need to use various tables or entities that deal with people and addresses and such. Underneath the covers is the common data model, and to access the common data model when using Dynamics, you're going to use Dataverse as the technology. Microsoft is making that common data model in Dynamics something you can export out into a data lake and then start using, and then you can use Dataverse technology to access the data sitting in a common data model. Now, I would say it's more focused on CRM-type data inside of Dynamics. What I see customers asking is, well, could they do everything they need inside of Dynamics and use Dataverse? That's fine if you need a customer 360 view. But suppose you need to take that data and combine it with very different types of data that go beyond customers, say I need to join it with operational data. I don't want to take all that operational data, and maybe financial data, and try to ingest it into Dynamics. Instead, I'd just take that customer data from Dynamics, put it into a data warehouse, pull in that other data, and make that the area where I do all my analytics. 
And then you'll see that a lot of Microsoft products now have interfaces to Dataverse, like Data Factory. I can say I want to use Dataverse to pull that data out of Dynamics and land it in a data model, or maybe it's in a common data model already, and I'll use the Dataverse technology on top of the common data model in the data lake in order to pull it out and understand it all. So that's kind of how I view Dataverse. But it's Microsoft's way of doing things, and it's just dealing with Microsoft's common data model and Dynamics.
Khalil Sheikh: [29:15]Does it extend beyond dynamics? Because in Dynamics, you have CRM and ERP data, but Microsoft other products like SharePoint or your collaboration, you know, team and others? Does it extend this data model underlying data model? Does it extend to other Microsoft products? Or is it limited to dynamics?
James Serra: [29:40]I can't think of a use case where it would. Dataverse is just the one within Dynamics. Now, you can pull data from those other sources into Dynamics, as I mentioned, and use that as part of Dataverse. But at some point I would say, let's not jam everything into Dynamics, let's create that separate data warehouse for it. I think of Dataverse as sort of a mini data warehouse, and you can only have it do so much. If you start trying to extend the boundaries outside of customer data, it's probably better to create that separate data warehouse and use Dynamics as a source into that data warehouse.
Khalil Sheikh: [30:31]And then what is adoption like for common data model? I saw a lot of healthcare organisations, as well as some banking, larger financial institutions started to build upon it. In the last couple of years, what have you seen in terms of adoption rate, what percentage of customers are using common data models, especially in these two verticals? What has been your experience?
James Serra: [30:57]The adoption rate is 14.29%. No, I'm just kidding. I would say maybe one out of every 20 customers I talked to at Microsoft was using a common data model that they got from Microsoft or some other third party. ADRM was a popular one for people pulling models outside the handful that Microsoft had. I think Microsoft had maybe a dozen or so models, and now they're integrating most of the ADRM ones. I see a lot of use in any kind of service industry, because I guess that's more straightforward for common data models. And I think the adoption rate is not as great as I would have expected, because of a couple of things. One is Microsoft having a limited number of models. In the Microsoft world, you're waiting for the ADRM models to become available, and once that happens, you could see greater adoption. The other is that the common data model at Microsoft focuses mainly on having the data in a data lake. It's challenging if it's in the data lake, because it's in a common data model format that has its own way of laying out data, so all your tools have to understand it. And you may not want it in a data lake, you may want it in a relational database, in which case you were kind of out of luck until recently. Now Microsoft is working on that more, and I can talk about it. So it's about having it in relational format, and also about having an easier tool on top of the common data model so you can customise it, which Microsoft is working on making visual. What's coming up soon from them, integrated within Synapse, is a much easier way to take a common data model and adjust it, because it's going to have a lot more fields than you need. I need a way to delete some of those fields and maybe add a few custom ones. 
And maybe I don't need these particular tables or entities, so I delete them, and then come up with a model that I can just publish, and it goes right to the data lake or right to the relational database. So I think adoption has been slower because of the lack of an easy-to-use interface for customising the common data model. You can do that within Dynamics, but you have to be using Dynamics to get that, and most customers are not. And if they want to use it for the medical field, well, they're kind of out of luck. It's a lot more manual work until Microsoft comes out with new tools that make it more automated.
Khalil Sheikh: [34:03]Thank you, James. So how is if we talk about Microsoft, a specific common data model to Power BI data flows? How is it common? Is it pretty common for people to adopt in that order that Power BI data flows and common data model? And what are the advantages that you have seen there?
James Serra: [34:27]Yeah. Power BI dataflows allow you to take data from a source, clean it, and then land it somewhere. Power BI dataflows are what I'll call a producer and a consumer of the common data model, and that's something that's fairly new in terms of support. That's what is helping the common data model get more acceptance, because now I can use Power BI dataflows to take data from a source and write it into the data lake in a common data model folder format, and then I can use dataflows to read that data and pull it in along with data from other locations. So in short, Power BI dataflows can read and write the common data model as it sits in a data lake, and then down the road do the same thing when the common data model lands, in the future, in a relational database like a Synapse dedicated pool. A Power BI dataflow is an easy-to-use ETL tool to pull data, transform it, clean it, and land it somewhere, as opposed to an IT tool like Data Factory, which is also used to clean data. So dataflows can be a self-service ETL tool. I've seen a lot of use of that by end users who are not IT people. They want to take data and clean it, and they want to use it in a common data model. Now you can do all of that right in dataflows.
Khalil Sheikh: [36:16]Thank you, James. James, can you think of a use case, whether from a healthcare perspective or BFSI industry perspective, where you have implemented a common data model, and you were able to leverage the value of it? Because it's, it seems easy to talk about it, but it's more about people process ownership. And then ultimately, the tool part of it is easy. The technology implementation maybe not as difficult, but resolving and getting people to the same level of understanding evangelising, the value and concept. Can you talk about a use case that where you thought it was difficult from overall ownership perspective, you know, we talked about multiple healthcare and BFSI case studies in the past, pick anyone that is favourite to you from a complexity perspective.
James Serra: [37:16]I've seen probably more use in the healthcare field than in any other industry. There was one particular company in healthcare that was taking data from companies that would hire them as a service. They would pull all this data together and use machine learning on top of it to save costs. For example, they would take a company's employee data, and there were some privacy issues that they had to make sure they were in line with, but they could look at all that data, put the employees in risk categories, and then tell the company: hey, here's a list of your employees who are at higher risk, and you may want to do some preemptive communication, like offering them free health care screenings. If you know the healthcare world, it's a lot cheaper to catch something early than to have someone hospitalised. So they were all about preventive care and keeping people out of hospitals, and they needed a lot of information for that. I was a consultant engaged with them, and early on they were trying to build their own data models. They also needed to master this data, because the companies would give them their employee data from different source systems, and sometimes the same employee would be in multiple ones. So they needed to master it too. In their case, early on, I said, if you're going to build a data model, man, that's going to take you forever, and then you're going to roll your own mastering tool to clean all this data and create golden records. I said, you should start investigating some MDM solutions. Profisee was the one they wound up picking, because Profisee had MDM tools as well as data models for the healthcare industry, so it really shortcut the process for them. 
And they also hired consultants who knew mastering data very well, and I was kind of part of that. So they had experts who had gone through this before and helped them with the mapping process, and then helped them with customising and simplifying the common data model to their needs. I think they estimated saving at least three months of work, not just in building a common data model, but in having to go back and correct or fix things that they probably would have forgotten as they went along. Because if you roll your own common data model, you're going to forget things. You're going to build a system out and be like, oh, I forgot to collect that information. Or, we defined this relationship wrong. That was the biggest thing. They were going down a path where, oh, man, they were going to have some really wrong many-to-many relationships. The common data model got rid of all those problems and really shortcut the process. So paying for third-party products and consulting services wound up saving them a ton of money in the long run compared with trying to do this on their own.
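The mastering problem in this use case, where the same employee arrives from several source systems, can be sketched in a few lines. This is a deliberately naive illustration and not how Profisee or any other MDM product works: it assumes an exact match on a normalised name plus birth date, and the field names are hypothetical. Real tools typically add fuzzy matching and survivorship rules.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]


def match_key(record: Record) -> Tuple[str, Optional[str]]:
    """Build a crude match key: lower-cased, whitespace-normalised name plus birth date."""
    name = " ".join((record["name"] or "").lower().split())
    return (name, record["birth_date"])


def master_records(records: List[Record]) -> List[Record]:
    """Group records that share a match key and merge each group into one record.

    Merge rule (an assumption for this sketch): the first non-empty value
    seen for each field wins.
    """
    groups: Dict[Tuple[str, Optional[str]], List[Record]] = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)

    mastered: List[Record] = []
    for duplicates in groups.values():
        merged: Record = {}
        for record in duplicates:
            for field, value in record.items():
                if value not in (None, "") and field not in merged:
                    merged[field] = value
        mastered.append(merged)
    return mastered
```

Given two rows for "Ann  Lee" and "ann lee" with the same birth date, one missing a phone number, the function returns a single merged record carrying the phone number from whichever source had it.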