The Data Story First Episode | The Alphabet soup of data architectures

The Data Story

The Alphabet Soup of Data Architectures

Season: 01 Episode: 01




About Data Story Series

Join The Data Story podcast, where two veterans of the Data & Analytics industry cut to the chase and bring you the most relevant technology trends transforming the industry. James Serra and Khalil Sheikh have helped transform several Fortune 100 companies into data-driven enterprises. This fortnightly podcast will equip you with the best practices, tools, and frameworks to help you spearhead your business insights journey. Stick around for each new topic discussion and subscribe to this channel.

Guests on this episode

James Serra


James Serra is a Data Platform Architecture Lead at EY, and previously was a big data and data warehousing solution architect at Microsoft for seven years. He is a thought leader in the use and application of Big Data and advanced analytics, including solutions involving hybrid technologies of relational and non-relational data, Hadoop, MPP, IoT,

Khalil Sheikh


Khalil Sheikh is the Executive Vice President of Saxon Global. Under his leadership, Saxon is transforming from an IT Staffing and services organization into a new age digital transformation partner and a strong brand. Khalil has extensive experience in the IT services industry and in turning around businesses by promoting growth and profitability.


EPISODE TRANSCRIPT


Khalil Sheikh: [00:00:05] Hi, good morning, I'm Khalil Sheikh, EVP of Solutions at Saxon Global. Saxon Global Inc. is a data management and analytics company that helps organizations become more insightful and gain a competitive advantage by giving them access to actionable information for real-time decision-making. With me today is James Serra, an industry veteran in the data management space. Today we'll be talking about the alphabet soup of data architectures, which includes the modern data warehouse and the differences between data mesh and data fabric, and we'll also talk about the role they play in today's business intelligence. Let's get started. Welcome, James.

James Serra: [00:01:08] Thanks for having me. I like the term "industry veteran" instead of "this old guy." I have been a data architect for 35 years and am now a data platform architect lead at EY. I have been here for about six months, and before that I was at Microsoft for seven years. Before that, I have a long history as a DBA and as a consultant, using a lot of data warehouses and databases over many, many years.

Khalil Sheikh: [00:01:44] Thank you, James. So we'll get started. James, my first question is, how do you see the modern EDW versus the data lake, and how do you see the cloud playing into today's data warehouses and data lakes?

James Serra: [00:02:05] Yeah, sure. It's interesting how things have changed over time. It used to be that everybody had a data warehouse in a relational database, and that worked great for many years and at scale, especially once they got MPP technology, massively parallel processing, and things like that, until big data entered. Over time, when the Internet came around with a lot of different types of data, the definition of big data was completely reworked to mean a much higher scale than we had seen previously. Different types of data structure also came into play: semi-structured and unstructured data. So it was very challenging for companies that wanted to ingest all this data, transform it, and make better business decisions, but couldn't implement that. This is when the data lake concept came out. The idea is to land the data in the data lake with schema-on-read, then put compute on top of that and transform the data, so we can do a lot of analytics, machine learning models, and predictive analytics on the data that sits in the data lake. We also move it to a relational database, which offers additional functionality and features such as performance. Data security is another one.

James Serra: [00:03:43] Being schema-on-write means it takes more time to put the data to work, but it also keeps the metadata with the data. That makes it a lot easier for users to query the data, as opposed to having just a file system. It was really interesting to watch the data lake come around about 10 years ago, when people thought it would replace relational databases: a place of rainbows and unicorns that lands in your own backyard, and out pops just what you need. But when I was working for Microsoft at the time, I saw many companies fail when they tried to use just the data lake and replace their relational database. So then it became popular to have both the data lake and a relational database. There are many reasons for using a data lake, and I'll just pop up some of those reasons on this deck, which I will make available for everybody to view. It's disabled for me right now, if you want to change that; otherwise I can carry on and just talk through a couple of the reasons for creating a data lake. One of them you can think of this way: the problem you had with the relational database, when you got all this data, was that to clean the data you might need a maintenance window, and I'm going to knock people off the system.

James Serra: [00:05:11] I'm going to land the data, transform it, and bring everyone back on, which is very challenging, especially when the database needs to be up 24/7 or something happened and your maintenance window overran. So the data lake came out, and one big reason to use it is as a staging area where you can do all the transformations instead of doing them in the relational database. The data lake allowed me to do all those transformations outside of the database, without knocking off users, so the maintenance window may be no more than just a few minutes. That was a big reason to use a data lake along with a relational database. It's also much more cost-effective compared to compute on a relational database, and it allows the separation of data and compute. I can fire up as much compute as I want, much more than in a relational database. So if I'm willing to spend the extra money, I can fire up all these clusters of compute and run them on that data lake. That way I have the best of both worlds, where you can do things on the data lake for reporting, but also do them in the data warehouse for additional features. So that is how we got to the modern data warehouse that uses a data lake.
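To make the schema-on-write versus schema-on-read distinction concrete, here is a minimal Python sketch. It is not from the episode: the sample records, the `sales` table, and the function names are all assumptions chosen for illustration, and an in-memory SQLite database stands in for the relational warehouse while a list of raw JSON lines stands in for files in a data lake.

```python
import json
import sqlite3

# Hypothetical raw events as they might land in a data lake: one JSON document per line.
RAW_EVENTS = [
    '{"customer": "acme", "amount": 120.5, "channel": "web"}',
    '{"customer": "globex", "amount": "75"}',
    '{"customer": "acme", "amount": 30, "coupon": "SPRING"}',
]


def load_schema_on_write(lines):
    """Shape and type every record before it is stored (relational style)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer TEXT NOT NULL, amount REAL NOT NULL)")
    for line in lines:
        record = json.loads(line)
        # Fields outside the schema (channel, coupon) are dropped at load time.
        conn.execute(
            "INSERT INTO sales (customer, amount) VALUES (?, ?)",
            (record["customer"], float(record["amount"])),
        )
    conn.commit()
    return conn


def total_schema_on_read(lines, customer):
    """Leave the raw lines untouched and interpret them only at query time (lake style)."""
    total = 0.0
    for line in lines:
        record = json.loads(line)
        if record.get("customer") == customer:
            total += float(record.get("amount", 0))
    return total


conn = load_schema_on_write(RAW_EVENTS)
sql_total = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE customer = ?", ("acme",)
).fetchone()[0]
lake_total = total_schema_on_read(RAW_EVENTS, "acme")
print(sql_total, lake_total)
```

Both paths give the same total here. The difference is where the effort goes: the relational path pays for cleaning and typing at load time, while the lake path keeps every raw field and repeats the interpretation work on each query.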

Khalil Sheikh: [00:06:40] Excellent. Thank you, James. There's a lot of confusion right now around these data architectures. What is the value? How can enterprises make the best use of them? How do you describe them based on your experience? How do they differ, and what value can an enterprise get out of them?

James Serra: [00:07:03] Yeah, there are so many buzzwords going around now, and it's almost comical in that a lot of these buzzwords are not new technologies; they are often just ways to put a cool name, like data fabric, on technologies that we've had for quite some time. When I talk to customers and explain all these buzzwords, what I say is: think of what we described as the modern data warehouse, and then suppose we want to add additional features to that modern data warehouse. Some of those features are defining good access to data and some data policies, where we get into RBAC; creating a metadata catalog; using master data management and data virtualization; and maybe building things as building blocks that can be used by other applications. That's when it becomes a data fabric: when it has a lot more ability to pull in many more types of data and transform them. So I think of data fabric as a sort of glorified modern data warehouse. Now, I have to say that this doesn't mean my answer is completely right. There are different takes on what all the buzzwords mean, but I'm trying to put them in the context of what I've seen and what the customers I've talked to understand. So think of a data fabric as a kind of evolution of the modern data warehouse.

James Serra: [00:08:52] Now, the other buzzword that we hear is the data lakehouse, the combination of a data warehouse and a data lake. The idea, which came about through data lakes, is: well, maybe we don't need a relational database. Maybe we can add features to a data lake to make it do everything a relational database does. The technology that was put on top of the data lake for that to happen is called Delta Lake. It adds features to a data lake such as ACID compliance and faster speed. Then there is also time travel, where you can have snapshots of data, enabling developers to access and revert to earlier versions of data for audits or rollbacks, or to reproduce experiments. So this makes it more like a relational database. Some use cases are coming out where maybe we can now get away with just using the data lake and not have to have a relational database. I don't think as many of those cases are valid as some people believe, but I'm starting to see some instances, in particular when you look at some of the latest technologies that have come out. For example, Microsoft has Azure Synapse and its serverless SQL pool, which allows you to query data sitting in the data lake. You can put a view on top of the data sitting in the data lake, like metadata, and make that available, so to the end user it seems like they are interacting with a relational database when it's really the data lake.
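The time travel idea mentioned here, keeping snapshots so that earlier versions can be read or restored, can be illustrated with a toy Python class. This is only a sketch under stated assumptions: `VersionedTable` and its methods are hypothetical names, it keeps full in-memory copies rather than a transaction log, and it is not the API of Delta Lake or of any real lakehouse product.

```python
from copy import deepcopy


class VersionedTable:
    """Toy append-only table that keeps one snapshot per commit (illustrative only)."""

    def __init__(self):
        self._snapshots = [[]]  # version 0 is the empty table

    def commit(self, rows):
        """Write a new version holding the previous rows plus the new ones."""
        new_version = deepcopy(self._snapshots[-1]) + list(rows)
        self._snapshots.append(new_version)
        return len(self._snapshots) - 1  # the new version number

    def read(self, version=None):
        """Read the latest version, or an earlier one for audits or reproducibility."""
        if version is None:
            version = len(self._snapshots) - 1
        return deepcopy(self._snapshots[version])

    def rollback(self, version):
        """Revert by committing a copy of an earlier snapshot as the newest version."""
        self._snapshots.append(deepcopy(self._snapshots[version]))
        return len(self._snapshots) - 1


table = VersionedTable()
v1 = table.commit([{"id": 1, "status": "new"}])
v2 = table.commit([{"id": 2, "status": "new"}])
as_of_v1 = table.read(v1)     # what an auditor would have seen after the first commit
v3 = table.rollback(v1)       # undo the second commit without losing history
latest = table.read()
print(as_of_v1, latest)
```

Note that the rollback adds a new version instead of deleting one, so the history of what happened stays available, which is the property that makes audits and reproducing experiments possible.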

James Serra: [00:10:34] So there are some use cases now for that data lakehouse. Now, the other one is the data mesh, and this is more of an organizational change than just a technology change. Everything that I've been talking about so far centralizes the data: it copies it into one location, and that has its challenges. One of them is who owns the data. If I copy it into the central location, does IT now own it, or is it still the various domains that have data in there? So the idea of the data mesh is, instead of centralizing data, to keep it within all these domains. So maybe you have HR and payroll and some operational data, and for all of those, which we call domains, think of the data as a product. They each own that data, and they take that data and create an analytical portion of it for their own needs. That makes it easier to scale, because now each of those teams is responsible for creating its data in an analytical format, instead of central IT being the bottleneck. Now, what central IT does provide is some governance over all that data. It will ask each of those domains to follow a contract for how to clean, transform, master, and secure all that data.

James Serra: [00:12:15] And so now you have this mesh of all these domains handling their own data. If somebody needs to combine the data from multiple domains, that's what gets challenging: they would have to pull the data out of each domain, combine it, and get the very view that they're looking for. So there are some great ideas, at least in theory, on how the data mesh can help: by having a decentralized environment, by having less copying to a central location, by treating data as a product. And each of those domains has its own engineering team, so they are able to scale out. We're seeing more of data mesh coming into play, but it's only for specific use cases where you're a company that has a lot of data and a lot of domains. If you just have a handful of domains, then because it requires a lot of work and a lot of organizational change, it would be overkill. So if you're not a very big company, that doesn't make the case for data mesh. This is by far the biggest buzzword I'm seeing lately. I'm trying to educate people through presentations like this about the differences between all these buzzwords, and to give more guidance on what's best for your particular situation and your company.
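The data mesh description above (domains that own their data as a product, a central contract, and the difficulty of combining data across domains) can be sketched in a few lines of Python. Everything here is hypothetical: the `DataProduct` class, the `employee_id` key, and the sample rows are assumptions made for illustration, not part of any data mesh product or of the episode.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """An analytical dataset owned and published by a single domain team."""
    domain: str
    key: str                                  # column identifying an entity across domains
    rows: list = field(default_factory=list)

    def meets_contract(self, required_columns):
        """Central governance check: every row must expose the agreed columns."""
        return all(required_columns.issubset(row.keys()) for row in self.rows)


def combine(left, right):
    """Join two domains' products on their keys; unmatched left rows are dropped."""
    right_by_key = {row[right.key]: row for row in right.rows}
    return [
        {**row, **right_by_key[row[left.key]]}
        for row in left.rows
        if row[left.key] in right_by_key
    ]


hr = DataProduct("hr", "employee_id", [
    {"employee_id": 1, "name": "Ana"},
    {"employee_id": 2, "name": "Raj"},
])
payroll = DataProduct("payroll", "employee_id", [
    {"employee_id": 1, "salary": 5000},
])

hr_ok = hr.meets_contract({"employee_id", "name"})
payroll_ok = payroll.meets_contract({"employee_id", "name"})
joined = combine(hr, payroll)
print(hr_ok, payroll_ok, joined)
```

The join only works because both domains agreed on a shared key. In a real mesh that agreement is exactly what the governance contract has to enforce, which is part of why combining domains is described as the hard part.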

Khalil Sheikh: [00:13:48] Thank you, James. What do you see as the maturity level an organization needs for selecting one versus the other? It's not for everybody. You have to be a fairly large organization with complex data, maybe with multiple acquisitions and all that. What do you see as the maturity model? Suppose an organization wants to do an assessment because it wants to go beyond its existing data warehouse, which is somewhat limited in its capacity. Is there a way to assess maturity with the models that exist today? And what size should an organization be in order to even think through this complexity?

James Serra: [00:14:38] Yeah, I wish there was a great flowchart or some assessment where you could plug in a few numbers and it would tell you the best solution. It's interesting, because a lot of it comes down to the questions I ask the customer to help me guide them to the right architecture and products to use. One of them is: what are your current skill sets? If your team is focused mainly on a handful of skill sets, I'm going to propose an architecture that fits your current skill set best, unless you're willing to spend a lot of time training everybody in some other skill set. Then you have to ask other questions: what's the size, the speed, the type of the data? Do you have real-time data coming in? This is a big one, streaming data. What's the end goal? That's the biggest thing. Ask them, what are you trying to accomplish? Is it dashboards, reports, machine learning? And let's work backward from that. So you tell me what you need, and sometimes that involves presenting to the customer possible technologies that they're not aware of, because you don't know what you don't know. If all you do is work in Excel spreadsheets and I ask you what you would like, it's going to be some variation of an Excel spreadsheet. And then you go and show somebody Power BI and machine learning.

James Serra: [00:16:04] They have never seen it before, and their minds are blown. They go, wow, I didn't know this was possible. And then you start brainstorming different ways of taking this data and making it available to make better decisions and get more insights into your company. So there's a whole list of questions, and it'll be in the deck, so you can see some of the questions I usually ask, and that's going to guide me. I would spend the whole day with customers at the whiteboard, drawing out some of these concepts before I even put products to it, talking through every detail, which direction is appropriate for you, and how much data there is. And you've got to think of other things like: well, there's a lot of data and it's on-prem. How am I going to get it to the cloud? Do I have the pipeline to do that? Can I break it up into little chunks during the day? Those kinds of things, which will be fine for a lot of companies. When I talk about data maturity stages, I usually have four.

James Serra: [00:17:09] Stage one is kind of reactive. You're just trying to get this data and managing it locally, and then you have all these independent silos. So that's a big problem. The next stage, and it's where most companies are, is about centralizing all this data, whether they put it in a data lake, and whether it's on-prem or in the cloud, because now it's just scattered all over the place. That gives us this rearview-mirror look: we want to make decisions based on what's already happened, on historical trends and patterns that we've seen. The next stage is where we want predictive analytics. We want to be able to take that data and run advanced analytics on it, so we start doing machine learning and predictive analytics. So we see not just where we've been but where we're going: we predict things like when a part is going to fail or when we're going to lose a customer, and take proactive action instead of reactive action. And then we get to stage four, the transformative stage. When people talk about digital transformation, this is the stage: they're trying to build a solution where we can take any data, whatever the size and speed, scale it up, and do historical, real-time, and predictive analytics on that data to drive the outcomes we're looking for.

James Serra: [00:18:39] And that gets into building a solution that's going to last for a number of years, by spending a lot of time up front to understand what our data needs are and what data we can get out of it. It's a long journey and it never ends, and you're always thinking of better ways of doing it. Typically, with customers, once you give them that first taste of the new reports and dashboards they can have, they just go crazy and want more and more, because now they know the art of the possible and you're showing them that. And then they also want more and more data. Then you have a big pipeline of trying to fulfill all their needs and requests, which can be challenging. But that's great, because you wind up building solutions that can save companies millions of dollars by having predictive analytics in addition to the historical reporting.

Khalil Sheikh: [00:19:36] Thank you, James. So what are the challenges? Let's say you have a modern data warehouse and a data lake, and you even put a data fabric on top with third-party solutions and all. What are the biggest challenges that you see in enterprises when it comes to predictive analytics or actionable insights, or big data analytics for that matter? Everybody wants actionable, role-based insights that they can act upon, based on their definition of business intelligence. Where do you see the gaps? Even if you have this magic wand and have built all this data fabric, how many organizations are able to meet that challenge? That's number one. Number two is, what is the outcome? What percentage of organizations are able to get there?

James Serra: [00:20:34] Yeah, I sometimes tell customers to go to eBay and try to find a magic wand; that'll be the best way to build a solution. Because the challenge, I believe, is not so much in the technology. Now, there were many cases where I saw companies choose the wrong technology. Maybe all they know is SQL Server, and they try to build everything in SQL Server when a much simpler solution might exist, and they go, what is that? No idea. So I always felt that my role as an architect at Microsoft was to make people aware of all the products at a high level and their use cases. That's why I write my blog, too: if you know all the tools available, you're much more likely to pick the right tool for your use case. Where I've seen a lot of projects fail, having done data warehousing for about 25 of my 35 years, is in the people and the process. It's challenging to find people who know this stuff really well. That's why I would recommend consulting companies. Whether you're a big company, a small company, or a mid-size company, if you have never built anything like this, find people who've done it before and find those experts who can guide you along, whether you want that consulting company to do everything or just part of it. That is important, because companies will fail when they don't realize the effort needed, especially for data governance. That's the biggest challenge with customers. When I see a project plan and they only have a couple of weeks in there for data governance, I say you need a lot more for data governance. They find out there are always loopholes that people found in the entry systems that allowed data to get put in there when it shouldn't be. And then you've got to clean it. Well, you can tell them these are the loopholes and they'll fix them, but that doesn't help the data that's already in there. So there is a lot to data governance.
And then there's master data management, when you have multiple customer records in there, sometimes for the same customer, and you need to find those records and merge them together. The important point is to work with the end users. From the beginning they should be part of the process, to make sure that you are getting their input. So you're not just coming up with solutions on your own.
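The master data management step described above, finding records that belong to the same customer and merging them, might be sketched like this in Python. The matching rule (normalized name plus email), the field names, and the "first non-empty value wins" survivorship rule are all simplifying assumptions; real MDM tools use much richer matching, and nothing here reflects a specific product.

```python
import re
from collections import defaultdict


def match_key(record):
    """Build a crude matching key: lowercase name without punctuation, plus email."""
    name = re.sub(r"[^a-z0-9]", "", record["name"].lower())
    email = record.get("email", "").strip().lower()
    return (name, email)


def merge_duplicates(records):
    """Group records by matching key and fold each group into one master record."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)

    masters = []
    for group in groups.values():
        master = {}
        for record in group:
            for field_name, value in record.items():
                # Survivorship rule (assumed): keep the first non-empty value per field.
                if value and not master.get(field_name):
                    master[field_name] = value
        master["source_count"] = len(group)
        masters.append(master)
    return masters


customers = [
    {"name": "John Smith", "email": "JS@example.com", "phone": ""},
    {"name": "john smith.", "email": "js@example.com ", "phone": "555-0100"},
    {"name": "Jane Doe", "email": "jd@example.com", "phone": "555-0199"},
]
masters = merge_duplicates(customers)
print(masters)
```

Here the two John Smith entries collapse into one master record that picks up the phone number from the second source, which is the kind of "why are these two people listed separately?" problem that end users tend to spot immediately in a report.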

James Serra: [00:23:16] It isn't something IT is going to shove down their throats. Instead, they feel like they have some skin in the game, because you're taking their input. You can also find mistakes and inaccuracies early on. Otherwise, IT will go and build a solution and give a report to the customer, and the customer looks at it and says, wait a minute, why are these two people listed separately? They're actually the same person. It just pops out right away, and from that first impression they've lost confidence in what you built. So make them part of the process and spend a lot of time on governance early on, and say, look, we think this is accurate, we're testing it out, let us know. Then when it does come out, they are cheering for it instead of seeing it as something that's difficult to learn. People hate change. You have to understand that when you present a new way of generating reports, even though it may be a hundred times better, they're resistant to it. So get them into the process, and part of that is training them in the new way of doing things, because people are reluctant to change. So you take those steps.

James Serra: [00:24:33] I spent a lot of time with customers, sometimes just talking about the layout of their teams and the different roles and responsibilities they have, because many weren't sure what to do. Again, a consulting company can help with this. The idea is to make sure you put the infrastructure in place for the people and the process, so what you're building will be successful. That could be a center of excellence, or maybe you put together little teams that work on pulling in the data and understanding and learning the data. I've been in meetings where it has almost come to a fistfight when people are deciding who owns the data and who is responsible for cleaning it. So create those environments and those teams that avoid a lot of those conflicts. So when you hear people say 70 percent of technology projects fail, a lot of the time it's not the technology; it comes down to the people and the process. That's why I say data mesh is not a silver bullet. It's not going to suddenly prevent these projects from failing; if anything, it may increase the failures, because it's a lot of organizational change. Any concept you come up with is not going to be successful unless you put the right people and processes in place.

Khalil Sheikh: [00:26:00] It sounds like a lot of work, right? People, process, communication, and technology all have to come together in order to build a good outcome-based delivery. So what size do you think the organization should be? Large enterprises like the Googles or Facebooks of the world may have the resources. But from small to medium to large, what size of organization is a fit for people, process, communication, and technology as you describe them, versus just ad hoc delivery?

James Serra: [00:26:44] Yeah, it's interesting to see the extremes. What I'm working on now is a data fabric, and there are a few hundred people involved in building it. It's at a scale that I've never seen before, even inside Microsoft, and that has its own challenges. And then there are these very small companies, maybe just a few hundred people, that are looking to pull data from something like SAP and a CRM system. I think it goes back to what your skills are and who the people are that are building this out. Do you have an IT team, or is each part of the organization trying to do its own thing? This is where you get into the technology differences. You can look at products like Power BI, and you can do pretty much everything in Power BI. It can be a self-service ETL tool, so it can be used very easily by a power user to build a solution and get a quick win. I've seen smaller companies use Power BI for every bit of that. You can clean the data, do the transformations you like, and build dashboards and reports on top of that. It has automated ML; it's an amazing product. You never have to go outside of Power BI if you don't want to, but then you're not thinking at the enterprise level. So what I differentiate is this: if you're a very small company and don't really have any enterprise needs, with just a few hundred people, you can use something like Power BI and get something out of it very quickly.

James Serra: [00:28:10] It's always important, even at a large company, to get a quick win. Nobody wants the old days of the waterfall approach, where you go away for six or 12 months before you finally have something. Even the larger companies can use Power BI for a prototype to get a quick win. And fortunately, now you can reuse a lot of what you build in Power BI when you move to something like Azure Data Factory, so you're not wasting all that time. I have seen the larger companies now go to the end users and, instead of sitting them down, asking what the requirements are, and documenting all that, say: you build a Power BI prototype, so we can see it firsthand and better understand what you're trying to do. Maybe we'll help you build something very quickly, and that becomes our business requirements. I've dealt with this so many times: you gather requirements, go back, build it, and they go, well, that's not what I meant, or that's not what I wanted, and you start all over again. Now that they're doing it this way, you get a much clearer picture of what they're trying to accomplish.

James Serra: [00:29:14] As we get up to companies that are small and midsize, they may have their own IT department and may want to use enterprise tools. That's when, instead of using Power BI for ETL, they use something like Azure Data Factory, and to store the data they would use something like Azure Synapse. That's also when they start investigating whether they need outside help. A lot depends on what your timeline is, what your budget is, and what your current skill sets are; those are the three main questions that drive the approach. There are also cost trade-offs: as an architect I can lay out the best approach, and a customer may go, well, that's great, but we're willing to sacrifice some performance and save some costs. OK, let me come up with a modified architecture. That happens all the time with some of the smaller companies that are more cost-conscious. So you may come up with different solutions than you would for a larger organization's level of budget. It's hard to put on a chart where all these lines are; that's why I go and ask those questions of the customer, and that guides me to the kind of solution I think fits.

Khalil Sheikh: [00:30:32] Thank you. And how do you see AI, ML, and recommendation engines feeding back into the system of choice? How do you see that playing out in the outcome-based visualization that drives competitive differentiation for the organization? And beyond having operational dashboards, do you see organizations also feeding results back to the transactional system? How many organizations are doing it, how successful are they, and what are the challenges associated with it?

James Serra: [00:31:17] Yeah, that's an interesting topic, and I feel like we're in the early stages of machine learning starting to blossom. If you think about machine learning, you have to train the model, and to train it you need a lot of data. I mean, a lot of data. Until very recently, most companies didn't have that. So we're in stage two, where we're collecting the data, and that's where most companies are. Once they collect all the data, then they can say: let's get some data scientists, let's think of ways that we can build these models and do more predictive analytics on the data now that we've got it all together. So while only a small share of companies have collected all their data so far, I think down the line there will be a big wave of people using a lot more machine learning, because they will finally have the data and can build those models. That's where I see a huge amount of potential. So when I talk to customers, I say, here's the icing on the cake: it's Power BI dashboards and reporting, and we show you what those look like, and then machine learning. Let's brainstorm some things you can do to increase your bottom line. Sometimes they have thought of these things and other times they have not. At Microsoft we would go around and show demos of all these products, related to their industry.

James Serra: [00:32:45] And they would go, I never really thought of that. Or we would tell them what their competitors are doing with machine learning so that they can do the same thing. If you show them all the possibilities, you get more brainstorming sessions going where they can think of all the things they can do with predictive analytics. And then you have to say, well, that's the good news, you can do all that stuff; now, how do we do that? Even if you're at the point where you've collected a lot of the data, it's still going to be a lot of work. But there's going to be a huge payoff, because some of the machine learning models are just unbelievably accurate in how they can help improve the bottom line, save money, and be more cost-effective. So I see a lot of interest in data science and in finding the talent for that. While the products have automated machine learning and can let you do some cool things, if you don't really know what you're doing with machine learning, you can create a model that is totally inaccurate. So you want to make sure you get the right people in there who can help build those models. And then it comes down to whether I can create those models and do things with the data.

James Serra: [00:34:00] And then, to your question, I feed it back to the transactional system, so I can take that model there. Think of somebody like Amazon: when you go to buy something, there's a machine learning model behind the scenes going, well, here's the stuff you might like. And I always seem to buy extra stuff, because they seem to know what I want. That's a machine learning model that's been trained on what you've previously bought. Think of that as them moving the machine learning model into their operational system. Can you do that in your company? Can you be interacting with the customer and have it pop up things like: maybe this person will like this, or maybe you should ask them about this type of loan if you're a bank? Or maybe we're predicting that they're going to leave as a customer, and there's an 80 percent chance based on the historical trends we're tracking in their transactions, so you should offer them a discount coupon or something to keep their business. So we see a lot of machine learning models being put into the operational system, at the point of service, where you can take action right away and adjust the behavior of customers, for example to keep their business.
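As a rough illustration of putting a trained model at the point of service, the Python sketch below scores a customer for churn and returns a suggested action. It is an assumption-laden toy: the feature names, the weights, the bias, and the 0.8 threshold are invented for the example and stand in for a model that would really be trained offline on historical transactions.

```python
import math

# Hypothetical coefficients standing in for a model trained offline on historical data.
WEIGHTS = {
    "days_since_last_purchase": 0.04,
    "support_tickets": 0.6,
    "years_as_customer": -0.3,
}
BIAS = -2.0
OFFER_THRESHOLD = 0.8  # assumed cut-off for taking action


def churn_probability(customer):
    """Logistic-regression style score: sigmoid of a weighted sum of features."""
    z = BIAS + sum(WEIGHTS[name] * customer[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def next_best_action(customer):
    """Called by the operational system while the customer is being served."""
    if churn_probability(customer) >= OFFER_THRESHOLD:
        return "offer_discount"
    return "no_action"


at_risk = {"days_since_last_purchase": 90, "support_tickets": 3, "years_as_customer": 1}
loyal = {"days_since_last_purchase": 5, "support_tickets": 0, "years_as_customer": 6}

at_risk_action = next_best_action(at_risk)
loyal_action = next_best_action(loyal)
print(at_risk_action, loyal_action)
```

The scoring function is cheap to call, which is the point made in the conversation: training needs a lot of data, but once the model is deployed, the operational system only has to pass in a handful of variables to get an answer it can act on right away.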

James Serra: [00:35:15] So I would say that's even more advanced, to move that machine learning model into the operational system. People talk about that having some challenges, but that's the beauty of creating machine learning models: you just need a lot of data to train it. Once you train it, you deploy it, then pass in the variables that come from these systems, and you get instant results. So that's another big thing. That's a great thing to demo to customers, and it blows their minds: well, I think we can do something like that, and we definitely need to build this. And that's how you make them understand the value of that, and they're willing to unlock the budgets. And that's interesting, because we used to always talk just to IT people, but for them something like this is just another thing on their plate, and they may not be so high on doing it. But you demo that to the end user and they go, this is going to make our life so much easier, this allows us to save a lot more money, and we've got to have it. And so they find that budget, and they get actually involved or hire more people. It turned out that going to the end user was a lot more effective for making them understand the value, and I've seen that in a lot of cases.
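As an illustration of the flow James describes here (train on historical data, deploy, then pass in a customer's variables and get a result back at the point of service), the following is a minimal sketch only. The two features, the toy training rows, the 0.5 offer threshold, and every function name are assumptions made for the example, and a small hand-written logistic regression stands in for whatever model a real team would use.

```python
import math

def sigmoid(z):
    """Map a raw score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train(rows, labels, epochs=1000, lr=0.1):
    """Fit logistic-regression weights with plain stochastic gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss with respect to the raw score
            bias -= lr * err
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    return weights, bias

def churn_probability(model, features):
    """Score one customer with an already-trained model."""
    weights, bias = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, features)))

def next_best_action(model, features, offer_threshold=0.5):
    """Hypothetical point-of-service rule: offer a discount to likely churners."""
    if churn_probability(model, features) >= offer_threshold:
        return "offer_discount"
    return "no_action"

# Toy history (made up): [scaled months since last purchase, scaled complaint count]
history = [[0.1, 0.0], [0.2, 0.0], [0.3, 0.1], [0.8, 0.7], [0.9, 1.0], [1.0, 0.8]]
churned = [0, 0, 0, 1, 1, 1]

churn_model = train(history, churned)               # done once, offline
print(next_best_action(churn_model, [0.95, 0.9]))   # scored live, per customer
```

The expensive step is `train`; scoring a single customer is a handful of multiplications, which is why the result can come back while the customer is still on the line.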

Khalil Sheikh: [00:36:23] Thanks, James. You mentioned something interesting. You talked about Amazon's propensity-to-buy model, which is like, OK, what would you buy based on your demographic, based on your past purchases, and things like that? What are the three to five ML models that you're seeing in the industry that could be showcased? For example, churn, sentiment analytics; and while we are talking about propensity, we can also talk about the propensity to pay, based on that person's ability to buy at eight hundred thousand dollars versus five dollars. So what are the three to five models that you see as a showcase for the industry, the ones they're striving towards but where they're still somewhere in between?

James Serra: [00:37:08] Yeah, there are so many models, and a lot depends on the industry, whether you're in the retail industry or the health industry, and in each one of those they can be very effective. And in some ways it's kind of scary what data is collected that you're not aware of. If you look at the automotive industry, they can have sensors all over your car and they can go and make predictions. And this is great, this is helpful for a driver, to be able to have a message sent to them that says, we predict you're going to need not just an oil change but your tires replaced, or, we're getting sensor readings from your engine and you need to go and have something checked out before it breaks. And imagine applying that to an airplane. You see a lot of machine learning models that predict the end of life or breakage of parts, so you fix them before they break. Imagine that being an elevator, instead of having to fix something once it's broken. So that's become very popular: predicting the lifetime of individual parts. And so we're seeing more of the automotive industry try to build that in. Now, they also take that sensor data, and that could be used to help your auto insurance rates if you're more of a cautious driver. And it could be used in other ways, which was interesting in one case where they took the data and got to the point where they said, you need service.

James Serra: [00:38:43] Would you like us to schedule it at the nearest dealer? Because we know your location, we can book the appointment for you, and you can just get an email or text and say yes, and that's everything. And then they collect that data of you driving, and then you go into the automobile dealer. And beforehand the salesman is told, hey, we've analyzed all this data from this person, and they like to drive fast, so they may be interested in upgrading to a sports car. So when they walk in, talk to them about sports cars. So it's a way of upselling. And that happens a lot in the retail industry. They even have background displays that use cognitive services, which have the ability to sense whether you are male or female, what your age is, and whether you are happy or sad, and they adjust the background color schemes in retail, because they know males are more attracted to certain colors. And they use those colors in the background for the more expensive alcohol, for example, and for the less expensive one they change the color. That is a fact. So sometimes you feel like these are ways of using machine learning models to sell to you. And so a lot of that is trying to sell you products. But then there's some of the real value of machine learning models. You mentioned sentiment analysis. I can pull in Twitter feeds and look across the country and find spots where maybe a lot of people are catching the flu.

James Serra: [00:40:17] And you can judge from Twitter feeds whether the sentiment is happy or sad or neutral, based on what you're looking for: what do they think of your company? And you can see that maybe people are saying a lot of bad things about you, and take action on that. It could be that I'm a hospital, or I should say a supplier of medication, and I want to find out where I should disperse medication. And I see there are a lot of people talking about the flu in the Northeast, so I'm going to move more of it there. Maybe I can do predictive analytics on where the hurricane will hit, and, say I'm a Home Depot, I can move supplies to the five different areas where we think the hurricane is going to be. Or an area is getting a lot of snow: let's move the snow shovels there, because, again, we don't want to have no snow shovels. That happened when I lived in the Northeast. With one of these predictive models you know that snow is going to come and you have enough snow shovels there. I could come up with so many examples of it. And you can go online and look at examples for your own industry, which will give you some ideas.

James Serra: [00:41:35] But they're becoming very popular, as we've got so much data that can be collected. It's really unlimited what you can do to help sell, or to save costs, or to avoid depletion. I'll give you one more example. This was a company that sells cosmetics, and I didn't realize how much cosmetics they have to throw away. It was tens of millions of dollars every year, because it has a shelf life. And so they were using machine learning to predict what areas would use a particular makeup, maybe a lipstick. They know certain lipsticks are more popular in certain areas of the country. So let's get ahead of the curve, because a similar fashion trend goes on there, and supply that part of the country. So they're using machine learning to move the distribution around. It went all the way back to the manufacturing plants, to say you can increase or decrease the supply. And it meant, if we have the supply spread around the country, let's move it before the expiration date so we can sell more of it and not lose millions of dollars. Even if it was just a five percent decrease in the material that they throw away, that's some tens of millions of dollars. So that was one example where they were using machine learning to the extreme to save tons of money.

Khalil Sheikh: [00:43:03] Thank you, James. Which industries do you see as technology laggards when it comes to the overall BI play, versus the companies or verticals that are relatively advanced from an overall positioning perspective? Where do you see the hunger to have this kind of AI/ML play to advance their competitive differentiation?

James Serra: [00:43:32] Now, by industry, and this is where a lot of companies like Microsoft focus more on industry. A lot of companies have changed their model to be more industry-focused, because you can then put people in place that know a lot about the industry, so they can talk to you about the industry and about the technologies and understand the latest trends. And because of that change in the last few years, I've seen a lot of industries, really almost all of them, go to great lengths to start incorporating these technologies: data warehouses, machine learning, and additional reporting. I think some of the laggards, and I was in New York City, where there's a lot of finance, were finance and banking. A lot of that had to do with being worried about the cloud and security. And there had to be a lot of education to make them realize that the cloud is more secure than anything on-prem, and that the scale of these data centers that Microsoft has is much bigger than theirs. And so it took them a while to build that trust in the cloud. And then Microsoft has gone to the point where they have a government cloud, and they even have these secret clouds for the military and such. And so the level of security there is just tremendous, and the tools and technology are all there.

James Serra: [00:45:02] And so finance and banking have sort of come around on the cloud, but they were behind the curve because it took them a while. And so I feel like that industry is still trying to catch up. And then maybe some industries like oil and gas are still behind, and a lot of that has to do with the cycle of oil prices and such. And the ones that I think are leading are retail and health care, because they can see the value right away in how much money they can save, or, when it gets to health care, they can save a lot of lives with that kind of machine learning. That's predicting when somebody is likely to be readmitted, which is the biggest cost: they leave and come back. To prevent that there are so many models, whether it's drugs, or keeping them longer, or looking at other patients with similar conditions and what was done to prevent them from coming back. So they see it as kind of a no-brainer when there's a huge way to save costs and also a way to save lives. So health care would be the one leading the most, and then retail, because even saving a few pennies here and there, because of the volume, adds up to millions of dollars. And so, I mean, that's my take on it based on my own personal experiences and working with a lot of different industries.

Khalil Sheikh: [00:46:23] You mentioned health care. I've been working with a company on fraud, waste, and abuse within health care billing, Medicare, medical, where this happens. Do you see companies or health care organizations adopting these kinds of models? And what is the level of maturity there? Because in one use case I have seen, with Medicare, in just one billing organization, one medical network, ninety million dollars' worth of abuse, fraud, and waste emerged within one or two weeks of modeling. Right? It could be much bigger than that. So do you see that use case happening in health care, based on your experience, and what is the maturity level there?

James Serra: [00:47:14] Yeah, that's a good point. I'm having flashbacks, because one of the first machine learning models I did was just that, and that was more than eight years ago: it was to find fraud. And there was a lot of it, things being billed for that never happened, whether it was the patient doing it or the doctor doing it. I remember them saving tens of millions of dollars by being able to find claims that shouldn't be paid, because the machine learning model would tell them, look, this is very similar to something else. And with all these machine learning models you get a score, like, we think there's a 90 percent chance this is fraud. And then maybe, for claims, we automatically reject something that's, say, eighty percent or higher. And if it's in, say, the fifty-to-eighty percent range, we're going to have somebody look at it. And then somewhere below that we let it go through, whatever those numbers are. It's great to be saving on the time and expense of people manually looking at everything; they'll just kind of spot-check things like that.
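The score-band routing James outlines here can be sketched in a few lines. This is a hedged illustration, not the system he built: the function name, the band labels, and the default cut-offs (0.80 and 0.50) are placeholders chosen for the example, and as he says, the real numbers are whatever the business decides.

```python
def triage_claim(fraud_score, reject_at=0.80, review_at=0.50):
    """Route a claim by its model-assigned fraud probability (0.0 to 1.0).

    The cut-offs are illustrative defaults, not figures from a real system.
    """
    if not 0.0 <= fraud_score <= 1.0:
        raise ValueError("fraud_score must be a probability between 0 and 1")
    if fraud_score >= reject_at:
        return "auto_reject"           # high confidence of fraud
    if fraud_score >= review_at:
        return "manual_review"         # a person looks at the claim
    return "pass_with_spot_check"      # paid, with occasional sampling

for score in (0.93, 0.65, 0.12):
    print(score, triage_claim(score))
```

Only the middle band consumes reviewer time, which is where the saving over manually checking every claim comes from.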

James Serra: [00:48:14] So the machine learning models in that capacity save a ton. So, yeah, that's a big thing, fraud. And it's the same with the banking industry now. There are third-party products, and some of them detect fraud by looking at banking transactions, and they can detect when money laundering is going on by analyzing that data and having a machine learning model on it. So that's one thing that can save, with very little effort, really just by having the data, tens of billions of dollars. But in the health care industry, because it seemed to be a little bit simpler in the models based on the data, it was almost like a no-brainer to do that with claims, and that's pretty easy to do. A lot of what these models do is find duplication and things like that. And you just get the right data scientists in there and build that up within your own company a little bit. And then the level of savings and reduction of effort and fraud is tremendous.

Khalil Sheikh: [00:49:16] I know, thank you. How do you see RPA playing a role in larger business intelligence? I mean, there is so much robotic processing happening with respect to billing, customer care, and others. Do you see it playing a role in modern BI?

James Serra: [00:49:37] What is it again?

Khalil Sheikh: [00:49:40] RPA, robotic processing, for billing, claims, customer service, and all. Do you see it playing a role in modern BI by bringing in the data? Because you are trying to cut costs by implementing this robotic processing, how do you see it playing a role in modern BI?

James Serra: [00:50:03] Yeah, it's interesting, too, because the tools have gotten better, and they allow end users to do some of the things that were typically done by IT. And the whole idea, when I talk to customers about building this, is to make it so you can get self-service BI: IT does the upfront work and has the data presented to the users so they can click and drag fields over. So this is where you get into it: if I just dump it in a data lake and I tell somebody to use Hive and Spark SQL, you may have an end user who just doesn't understand any of the words coming out of your mouth. So somebody from IT needs to go and do the upfront work and put it in a relational database with the metadata and create a star schema. They're doing all the joining, cleaning all the data, and mashing it all together. And then you tell the end user, here's your dataset, click on that, there are all the fields, have at it. And they go, oh my goodness. IT did the homework, and maybe they still need a little training on that, but you've made it so easy. So that's what I would say when I look at this modern data warehouse or data fabric, and the data has been copied, and they go, well, it's too expensive, I'm making too many copies.

James Serra: [00:51:23] As the data moves along here, you're adding more value to the data. You're making it easier for the end user. So, yeah, I copied it from the data lake to a relational database, and then I maybe created a star schema, with multiple copies. But the end result is, oh, so easy for that user, who may be a mechanic on a service line and doesn't know anything about technology, but you presented it to them so that, with very little training, they can start creating those dashboards. So it's that balance. And that's why I think they failed: they didn't understand that the people using this have no understanding of the technology. Never make it too hard for them, because otherwise it's just a glorified platform, and you just walk off and expect people to make sense of it. So do that extra work. And so this is why we're seeing a lot of the responsibility for generating reports and dashboards now going to the end users. They are not waiting for IT to do that, because IT did the work of creating an environment where the end users in all these various organizations within the company can work with the data and build the reports themselves.
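To make the "IT does the joins once, the end user just picks fields" idea concrete, here is a small sketch using Python's built-in `sqlite3` module. The table names, columns, and sample rows are invented for illustration; the point is only the shape: one fact table, two dimension tables, and a pre-joined view that a non-technical user could query without knowing the joins exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes, one row per product or region
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE dim_region  (region_id  INTEGER PRIMARY KEY, region_name  TEXT);

    -- Fact table: one row per sale, pointing at the dimensions
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        region_id  INTEGER REFERENCES dim_region(region_id),
        amount     REAL
    );

    INSERT INTO dim_product VALUES (1, 'Oil filter'), (2, 'Brake pad');
    INSERT INTO dim_region  VALUES (1, 'Northeast'), (2, 'Southwest');
    INSERT INTO fact_sales  VALUES (1, 1, 20.0), (1, 2, 15.0), (2, 1, 40.0), (2, 1, 10.0);

    -- The upfront work: the joins are done once, behind a friendly name
    CREATE VIEW sales_report AS
    SELECT p.product_name, r.region_name, f.amount
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_region  AS r ON r.region_id  = f.region_id;
""")

# What the end user (or their BI tool) asks: no joins, just fields
totals_by_region = conn.execute(
    "SELECT region_name, SUM(amount) FROM sales_report "
    "GROUP BY region_name ORDER BY region_name"
).fetchall()
print(totals_by_region)
```

A view is used here instead of a physical copy to keep the example short; the trade-off James describes, paying for extra copies to get ease of use, applies when the joined result is materialized in a separate store.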

Khalil Sheikh: [00:52:33] Thank you, James. I want to wrap up this conversation, so thank you again, James, for the great insight. For the larger audience here: if you're seeking support with any data management and analytics project in the cloud, Saxon Global is the company to call upon. We'd love to support you. And thank you again for joining.

James Serra: [00:53:00] And thank you for having me to talk about this technology. Hopefully, I helped clarify some things.

Khalil Sheikh: [00:53:07] Yeah, absolutely you did. Thank you.
