Podcast

min read

James Dice

🎧 #169: Case Study: T-Mobile Scales Digital Metering Across Data Center Portfolio

November 26, 2024

"We have billions of dollars in investment in this network, how do we make sure we’re making the right investments in the right places? Our instrumentation, telemetry, automation, all of these tools we use are critical to meeting that fundamental mission.”

—John Coster

Welcome to Nexus, a newsletter and podcast for smart people applying smart building technology—hosted by James Dice. If you’re new to Nexus, you might want to start here.

The Nexus podcast (Apple | Spotify | YouTube | Other apps) is our chance to explore and learn with the brightest in our industry—together. The project is directly funded by listeners like you who have joined the Nexus Pro membership community.

You can join Nexus Pro to get a weekly-ish deep dive, access to the Nexus Vendor Landscape, and invites to exclusive events with a community of smart buildings nerds.

Sign-up (or refer a friend!) to the Nexus Newsletter.

Learn more about the Smart Building Strategist Course and the Nexus Courses Platform.

Check out the Nexus Labs Marketplace.

Learn more about Nexus Partnership Opportunities.

Episode 169 is a conversation with John Coster from T-Mobile and Mark Chung from Verdigris.

Summary

Episode 169 features John Coster from T-Mobile and Mark Chung from Verdigris and is our 13th episode in the Case Study series looking at real-life, large-scale deployments of smart building technologies. These are not marketing fluff stories, these are lessons from leaders that others can put into use in their smart buildings programs. This conversation explores T-Mobile, which has been deploying metering and analytics technology to help manage its data centers. Enjoy!

Mentions and Links

T-Mobile (0:00)
Verdigris (5:58)
An Inconvenient Truth (6:25)
NEON (9:21)

Highlights

Monologue from John (0:00)

Introduction (2:20)

Intro to John (2:53)

Long experience in data centers (3:48)

Intro to Mark (6:00)

Project scale (7:20)

Tech stack (21:20)

The buying process (30:43)

Meter data results (35:12)

Lessons learned (38:20)

Conclusions (48:21)

‍

Music credits: There Is A Reality by Common Tiger—licensed under an Music Vine Limited Pro Standard License ID: S644961-16073.

Full transcript

Note: transcript was created using an imperfect machine learning tool and lightly edited by a human (so you can get the gist). Please forgive errors!

‍

John Coster: [00:00:00] If you think about What is T Mobile's business? What is the core business? Some people think it's telephones, some people think it's, you know, IOT in the future. T Mobile's business is that we, we monetize Spectrum. We get licenses, we build a network that can carry traffic, and that traffic is what produces the revenue that feeds the company growth, right?

All kinds of, all kinds of traffic. And that network is, Many, many thousands of cell towers that are out there, which is what people think of because that's what they can see. What they don't see is that there are dozens and dozens of data centers where all those towers connect and all that traffic is, is traveling on.

That is the core. In fact, they call it the packet core. It's the core network upon which everything rides. So having that available when we need it. 100 percent uptime is crucial to our business. That is the foundation of our, of our business. Without that, there is no business. So having what we need, when we need, [00:01:00] uh, where we need it, um, as our network grows is, is that's fundamental to the business.

So the question is, we have billions of dollars in investment in this network. How do we make sure we're making the right investments in the right places? Right. And that those investments are performing to meet the mission of the business. Well, that's what our instrumentation, that's what our telemetry, that's what our automation, all of these tools that we use, they are critical to meeting that fundamental mission.

James Dice: Hey friends, if you like the Nexus podcast, the best way to continue the learning is to join our community. There are three ways to do that. First, you can join the Nexus Pro Membership. It's our global community of smart billing professionals. We have monthly events, paywall deep dive content, and a private chat room, and it's just 35 a month.

Second, you can upgrade from the ProMembership to our Courses offering. It's headlined by our flagship course, the Smart Building Strategist. [00:02:00] And we're building a catalog of courses taught by world leading experts on each topic under the smart buildings umbrella. Third and finally, our marketplace is how we connect leading vendors with buyers looking for their solutions.

The links are below in the show notes, and now let's go on to the podcast.

Welcome to the Nexus Podcast. I'm your host, James Dice. This is the latest episode in our series, diving into case studies of real life, large scale deployments of smart building technologies. Um, we're here to basically share lessons learned from leaders so that other people can put these things into use in their smart buildings programs.

And today we have a story from T Mobile, which has been deploying metering and analytics technology to help manage their data centers. Um, I have John Koster here, uh, senior manager, innovation, planning, and strategy. At T Mobile, John, welcome to the show. Can you introduce yourself, please?

John Coster: Yeah, thanks for having me.

Yeah, I have a great job here at T Mobile. I've been here for about seven years and my job is [00:03:00] to, um, figure out where we put, how much we put, uh, and when we put, um, our data centers to handle all the traffic that supports our network. And I also get to have a cool sandbox where we develop and integrate new emerging tools, try them out, uh, so we can optimize our capital investments.

James Dice: Awesome. Sounds like a perfect person to have on the show. Um, I, I was looking at your, your long, uh, career and, and, and really, it seems like you've been in data centers for a long time and I'm wondering, I remember when I did my first energy audit of a data center, this was probably in 2010. And I had imagined like from then to now, data centers have changed a whole lot.

But even the time from when you started in 2010, they'd probably changed a lot then. It seems like there's a lot of different iterations of what a data center means.

John Coster: Yeah.

James Dice: How you've sort of adapted yourself that whole time.

John Coster: Yeah. I'm, I'm sort of an accidental data center [00:04:00] expert. Um, yeah. I, I started, I designed my very first data center.

We didn't call them data centers back then. We called them computer rooms and, uh, because that's what they were. They were rooms that had computers. And the first one I designed was in, uh, 1988 and it was Microsoft's first data center. I had to copy an IBM data center cause nobody knew what they were. And that's what we did.

And we, we basically cut our teeth on that. And who knew? Because before then, you know, prior to the 2000s, before the dot com boom that blew everything up, um, data centers were handled by the government, they were handled by institutions and a few large corporations. They did not exist. As a thing. And so it's evolved.

And I just happened to be there at the right time when that took off and, uh, and began to grow. And so I was there during the dot com boom. I was there after the dot com bust. And what's interesting is how so much of the old data center ideas, or ideas of data centers, they were, they were just buildings that people built to throw computers in.[00:05:00]

And the computers were, they, they needed to stay up, but they weren't, they weren't terribly dense energy wise. And And they, um, they were important to keep up, but nobody spent a lot of money, except for the banks, didn't spend a lot of money on making sure that uptime was key, that reliability was key. And now that's all changed.

And they continue to evolve now, especially now that you just don't have applications sitting on a server, sitting in a room. Now you have clusters of applications that sit on clusters of servers that sit in the cloud, and they can move that work, that workload around between Sites, actually. So all of a sudden, the idea of keeping a building up 100 percent of the time is not always exactly the same mission for every, every application.

So things, things are, they're taking a huge change right now, especially with, with AI. So, yeah, it's a, it's a never ending learning experience.

James Dice: Awesome. Well, I'm super excited to have you here. Thanks for coming on. We also have Mark Chung here from Vertigris. Mark, can you introduce yourself, [00:06:00] please?

Mark Chung: Uh,

thanks

James.

Um, I'm Mark. I'm the CEO and co founder of Vertigris. Uh, I am a electrical engineer by training and started my career in microprocessor design, eventually working in network design, and found my way to packet inspection, architectures, and algorithms. Several years ago, I, uh, watched Uh, by Al Gore on climate change and started a different journey, which was to take technology into the space of electricity and try to figure out how we could use technology to bend the curve of electricity, always around trying to find the right business application, the right value stack that we can deliver with that technology.

Deep, insightful, uh, data set. And I'm, I'm thrilled to be here and working on this, these projects with some of the leading companies in data center space, including my good colleague here, John.

James Dice: Awesome. Um, so we're going to talk about, do a little, um, sort of a deep dive [00:07:00] into what you guys have been working on together.

Um, can we just do sort of a quick rapid fire sort of overview of what you guys have been doing together? So can you talk about, um, just give us numbers like, uh, we're talking about meters, how many panels, circuits, um, buildings, whatever you use to sort of quantify how big this is just to give people an idea of the scale.

John Coster: Yeah, sure. So we have about, these days we have about a hundred data centers, and those data centers have tens of thousands of nodes and devices that are in them that we try to monitor. And a lot of the, a lot of the IT workload that we have is DC. But for every, I don't know if people know this, for every watt that goes into a data center or goes into a computer, it is 100 percent inefficient.

It gets rejected exactly that same amount of wattage and heat. So, it's a very big deal. How do we reject all that heat? And how do we do an energy mass balance calculation that says, here's how much energy is being used for our workloads, and here's how much energy is being used for the corresponding [00:08:00]mechanical equipment?

And how do we know? What those, what those ratios are. So we have installed, uh, Vertigris on all of our AC alternating current and air conditioning. This means both, I guess, in this case, um, for all of our sites. And we've got, according to the last count, I guess, over 40, 000 sensors out there that feed into the cloud that let us know just what's happening out there and, um, I can give you some.

examples of what we've discovered along the way, but it has been a journey of discovery and I think is probably most people who deal with big data have learned. You don't know what you don't know until you start collecting data and then you go, oh look what we can know. We didn't know that. I wonder if we could know this too.

So it's always this sort of building blocks of Getting new insights because you've, you've mined data that you couldn't have justified necessarily, but it's, it's a pretty cool journey because you, sometimes it's three levels in, you didn't realize that's actually what you needed. So it's a, it [00:09:00] is a little bit of a throw it out and see what happens and go, Oh, and then see where the, see what the value is.

James Dice: Got it. Okay. So you mentioned meters and sensors. What type of sensors is it just monitoring the different temperatures across the server racks or what? How's that work?

John Coster: Well, we have a, we have a constellation of applications that we've woven into a product we call NEON, and it's a network engineering ontology tool.

And one of the nodes, one of the, the, the data feeds that comes into NEON is, is Verticris. And what we have is, we actually have over 100, 000 devices plus untold other systems that pull in computational fluid dynamics models. We have. Building management systems. We have a lot of different things that feed into this platform.

But Vertigris has a very special place because the other systems are more simple telemetry. This is what we have. It's time series historized data that comes in and that's what it is. What Vertigris provides for us, which we kind of knew but didn't really realize how valuable it would be, is the ability to analyze the data that they [00:10:00] provide us on the AC side and actually tie data from other systems that correlate The behavior and the performance of those other systems.

So it ends up being sort of a brain that we didn't intend to use for other things, but we discovered that it was very useful, um, for that as well. So we have this partnership where we have an AI tool that we ingest data. They have their tool that ingest data. And we're finding that, um, not only do we have.

Current transformers on every circuit of every piece of air conditioning equipment and mechanical equipment that we have, but we can also take other data sets from other systems and tie that in and to find, find relational insights into how those systems are working.

James Dice: So how long have you guys been working together?

I know, John, you've been at T Mobile for seven years. Just give us, give us a sense for how long this has been going on. I

John Coster: think at least five.

Mark Chung: I think so. I think probably that first, very first deployment was in a lab at T Mobile. And it was Oh, that

John Coster: was

Mark Chung: seven

John Coster: years ago. [00:11:00]

Mark Chung: That might have been, actually, yeah, that might have been seven years ago.

And we did some work there, and then, um, probably about two years after that, I think that's when you said, hey, we have an issue, we need to collect a lot of information for this 5G rollout thing.

John Coster: Right, right. So, James, one of the nerdy things that we did to try it out was, in the lab, we wanted to know How efficient our computers were running the CPUs.

And so what we did is we took 2 racks full of equipment and we, we clamped on every single piece of equipment. And then we ran synthetic load across the CPUs to watch what the energy efficiency was. And so we, with the time series data, we could capture that. And we discovered some really important things like how much of our compute load was just spinning, doing nothing, but chewing up energy.

You know, and we spend hundreds of millions of dollars in energy. And so learning how to shave even a little bit off, even how to be a little bit more efficient, getting some exposure to that was going to [00:12:00] be super critical. So we saw the insights that we could gain from that level of telemetry. It's like putting a Dranitz meter or a Fluke meter on there, except it was all tied to the cloud and we could pull it back and take a look at it.

So that's, that's where we said, wow, this is really powerful. What if we deploy this across our entire portfolio? And that's what we ended up doing.

James Dice: That's awesome. And so, can you talk about a little bit more, and we're kind of starting with the story of the, the project now. That was the very beginning, you know, this, this first question in a lab.

Can you kind of take us through, um, Maybe a little bit more around the, like, why were you trying to understand the efficiency of the CPUs? But really talk about like, what sort of started this, this journey a little bit more.

John Coster: Well, I think it's, um, it's waste. We try to eliminate waste, right? Um, trying to, trying to save money, right?

We have these very expensive networks that consume enormous amounts of energy. And I think every, every company. Part of their mission is to figure out [00:13:00] how to improve their bottom line. And if energy is a big component of that, and especially if you have, like, you know, even though we monetize spectrum, we're very much of a consumer based company, you know, we've got 100 plus million customers out there and they care about the carbon footprint of our business.

They care about how green we are and how sustainable we are. And so being able to take a look and benchmark that and then say, look, you know, if we take these. Efforts. We can lower costs. We can lower our carbon footprint. We can improve our profile. Uh, and be better stewards, you know, be seen as, as good, you know, corporate citizens and global citizens if we do that.

So you can't make, you can't make changes unless you've got good data. Right? And so we didn't have any good baseline data. So we started with benchmarking and then we can take a look and say, all right, where are the areas where we have waste? Where's this stranded assets are sitting there, not doing anything, wasting money, wasting energy, wasting carbon, you know, emissions.

And how can we, how can we pull that in? So it's, you know, it's, it's just good management.

James Dice: Totally, and [00:14:00] before this, you might have had, and I'm guessing, I want you to fill me in on where I'm wrong, you might have had monthly utility bills and maybe a sort of an interval meter at the building level, but what you're talking about is getting into, I mean, you said the word CPU, so you're getting into a lot of granularity around how power's being used after it goes through the building meter.

Seems like.

John Coster: Yeah, yeah, yeah. You know, when, uh, one thing about the, the, the wireless, the telephone business, it is, um, I've worked in some pretty complex industries, uh, before I came to T Mobile. I worked in aviation, I worked in port facilities, logistics, healthcare. I've never seen such a complex machine as this cellular network.

The number of pieces are just unfathomable. I don't think anybody knows, actually knows how it works, um, because it's so complex. And it's got so many different versions in it. I mean, there's 3G, 4G, 5G. www. T Mobile. com And you've got different platforms with different levels of that on there. It's [00:15:00] just, it's, it's enormous.

So to be able to grab a hold of any of that and start to measure it was, was the mission, right? You, you just can't, you can't manage this unless you can get data and then see what's happening. And so that was the mission, at least at the facility level, at least at the input of the network device level.

What are we actually using? And, And, and what does it look like? So that's, that was really the mission, just to get some baseline data and then to see what we could know once we got it.

James Dice: And how has that, when you started there just to get some baseline data, how has that evolved into, um, maybe tell the story of like, we started there.

to this thing in a lab. How has that expanded into more use cases for the data? Um, you started to talk a little bit about data going into NEON and then using Vertigris software platform. Can you start to talk about, okay, you also use the words data driven decisions, right? So how does this, how is this getting into base, from baselining into [00:16:00] these are the things we're doing with, with the data once it gets into the software layer?

John Coster: Yeah, before I get to there, I want to maybe talk about some of the challenges, which you can edit this around if it doesn't make sense, but totally. Um, I think one of the things to understand is that there's a, there's a, there are a million different companies out there that sell telemetry. Really good stuff.

All the major manufacturers that you could name off the top of your head make good systems that grab data and will give it to you in very useful intervals that you can use. The biggest challenge, especially when you get to large amounts of data points, like I said, hundreds of thousands or tens of thousands and then millions of data points, is data trustworthiness.

That is probably the biggest challenge. Um, Do you really trust it? And it's easy to say, well, I can go through tens of thousands of devices out there. How do I know they're real? How do I know they're accurate? And that is, that is the tough one. Because if you don't have good lineage, if you don't have good internal auditing, if you don't [00:17:00] have good analytics that allow you to see Where the gaps are, you will just make bad decisions because you're, you, you have this high fidelity data, but then it's not accurate, but you don't know it's not accurate and how would you know?

And so the first thing we had to do is it wasn't the deployment. It wasn't the use cases. It was getting the data trustworthy. And so we built some tools that allowed us to see. We, turns out our data was like 60 percent accurate. Well, you can't make good decisions when it's 60%. And I told my team I want it to be 95%.

Actually, I was happy with 90, but 95%. And, and then, then I'll jump into, you know, one of the use cases. That, I think, is illustrative of the kinds of things you find. So, like I said, every watt that goes into a data center gets rejected in heat. And the amount of energy it takes to offset that heat is, is called a power utilization effectiveness.

So, what's the overhead? So, for example, for every watt that goes in, if you had to use another watt to [00:18:00] cool it, that would be a PUE of 2. If it took a half a watt to cool, it'd be 1. 5. The goal is to get that PUE, to get that overhead shrunk down. Of course, with Vertigris, we have that on all of the AC sides, so we can actually see the PUE.

Okay, we actually see what's going on by site, by room, by sector. We can actually see what's the overhead that it takes to cool this. If I'm spending hundreds of millions of dollars on energy, right. And I can shave off 5 percent or 10 percent on my PUE. I mean, that's huge. That's right. Bottom line EBIT savings, right.

It's EBITDA savings for us. So that's a really big deal. So, so what we found was that. We have all these devices out there, and, well We were assuming, for example, that our PUE was a certain number and that PUE would then tell us here's how much available capacity you have to add more computers into this.

Okay, that's a big deal. So I'm watching my [00:19:00] capacity. I need to know when to pull the trigger to spend more capital because I'm running out of capacity because my network's growing. And I'm using the number from a mechanical telemetry to give me the PUE to determine what my, uh, What I need to build. How much IT headroom I have and when do I need to pull the trigger to, to add more.

And I think that Anybody who's in the industry will tell you that somewhere between 10 to 20 million dollars a megawatt is what it costs to, to build out, and especially in an existing site, you need to expand. So if we just take, for example, worst case scenario, 20 million a megawatt, if I need to add a megawatt of capacity, I gotta spend 20 million bucks, just like that, right?

And it's not small, it's a big chunk of change. Well, one of the first things we found was that Because we were able to determine that there was a data error, it turns out that one of our meters that we were using Was not working. In fact, it hadn't [00:20:00] been working for a long time. And our algorithm uncovered that meter and found out that that one meter misreporting represented $11 million in stranded capital we didn't know we had.

So we brought that meter up online, we could defer spending another $11 million. And that's like real money, right? Because I was gonna have to go in and ask for that and we were gonna have to spend that to build that and we didn't need to spend it. So you talk about not only financial savings. But the embedded carbon that goes with that, I don't have to go and spend market resources that, you know, generate embedded carbon, right?

I mean, there's, there's so many elements to this, if I can just get my data right. So having trustworthy data is the very first thing, and being able to identify when there's errors in that data. Pinpoint those errors quickly, get them corrected. That's kind of, that's the holy grail right there. Okay, for us anyway.

Yeah,

James Dice: absolutely. Yeah, that's a great, such a great use case. [00:21:00]Can you guys talk about, Mark, maybe you start and talk about what you guys can do on the hardware and software side to make the data more accurate. And then it sounds like, John, maybe you could add on what you guys are doing in NEON separately from that, but I'd love to talk about like the full tech stack and how you're making the, that, uh, data as accurate as possible.

Mark Chung: Yeah, I think there's, um, it's not an easy problem to solve to get data accuracy, um, and definitely a lot of learning as we were doing this project with T Mobile, um, I think there's foundationally maybe like three or four things that are really important to getting the data accurate. One of them was this cloud connected architecture.

So we needed to have a persistent hypervisor that was overseeing all of the sensors as it was coming in. So really just having a connected architecture, not relying on a standalone system standing by itself, working perfectly in isolation, having something that is [00:22:00] perfectly working in the cloud that's connected to this to see and monitor those things consistently.

That's kind of one architectural component of what enables us to get accurate is like, we can see if systems are down, we can see all these things are there. Another component that I think was essential to the accuracy was installation error correction. So a lot of the problems that happen when data comes in Is people slap in CTs, they collect, connect it into maybe some kind of two wire thing.

And then some system configuration happens later. It goes into BACnet. These, all of these are like multi point errors of, uh, problems for errors and, and telemetry issues. So, um, Vertigris made a lot of innovations in this space. So we've, we started with making the CTs small, digital at bust so that you can't You know, code them incorrectly or insert them incorrectly.

They can only, they're keyed so they're polarized in one direction. You can only plug it in one direction. It's also [00:23:00] digital. So it's on a digital bus. And that enables us to streamline the installation, make it very, very easy to install, and a little bit less error prone to making the connections, um, incorrect.

Um, but on top of that, we did a lot to innovate the CT itself. So the CT is actually technically not a CT. It's a Hall effect sensor with an air gap that has no saturation, but allows us to sample at a very high frequency. And what that high frequency enables us to do, I hope this is not getting too technically detailed, but what it enables us to do is then also map that frequency to the phase of voltage.

So oftentimes you could have incorrect polarity, you could have incorrect phase. There's many ways in which the CT, the current and the voltage are not matching correctly to give you the right, uh, power factor, uh, variance, like active reactive power. So we. We have a way to detect these by sampling at a super high frequency and any errors that happen in the installation or the [00:24:00] commissioning then get, through software, corrected in post.

So we're both collecting the raw data, but then also we can correct all of it with a configuration file that sits on top of that data. So that allows us to correct for miscellaneous errors that happen in field and, um, and make sure that everything can be. accurate. As soon as someone buttons up the system, it's, it's accurate.

And I would say the last big major chunk of technology we put on here was making a very, very easy to use mobile app that pairs with the device now that runs through a sequence of checks such that when an installation is performed, we know with a hundred percent certainty that we can correct everything else that needs to happen after that, that person walks from that, walks away from that installation.

So that's, those are the. A lot of years of technology building to get to that level, but that's what we, that's what we do now to enable that, that high quality data.

James Dice: And, and John, does that encompass everything you were talking about? Or are you guys [00:25:00] doing more stuff on your end to make sure that the data is accurate as well?

John Coster: Yeah. I mean, we do, we're building a lot of that into our platform. You know, when you're in our world as an owner, you always have that build versus buy mentality. Why build something if somebody else is already doing it, right? So, We have a lot of different systems of different ages that are out there that we have to talk to, you know, through all the standard hooks, hooks that you can imagine.

But we really lean heavily on Vertigris because they do have such strong analytics. I mean, one example is in the area of sustainability, probably the biggest gap, and this is when I was in my previous jobs, is that nobody really knows what their as built drawings are in a building, any kind of building.

Right? Because they're made by different people over many years, and so there's just a, there's no good point of truth, and, um, we went and actually hired an engineering firm on one of our data centers to go through and tone out every [00:26:00] single circuit and make absolute as builts, and it took like six months, and it was crazy cost, and, and that's really important, but that's just not feasible.

Right? So we said to ourselves, I wonder, scratch, scratch, if we could use the analytics within the Vertigris platform to give us a digital twin, simply by the way the energy is being used. And it turns out, yes, and it turns out that it's about 98 percent accurate. So by putting Vertigris on all of the different circuits, we can just go, go tell me who the parent child relationship is.

Give me the hierarchy of all those. And it did. Without us doing any field work. Do you know what that does? See, I mean, think about what the value of a 98 percent accurate digital twin is, and it's dynamic. If it changes, I'll know it. I don't have to send somebody else to audit it. All of a sudden my ability.

On any kind of a building, no matter how old it is, to be able to manage those circuits, understand, do fault [00:27:00] detection, diagnostics, do energy tracking and utilization. I mean, the, the, the opportunities for a, for a clever engineer, even not such, such a clever engineer are just endless. They're just enormous.

And so we're pretty thrilled with that. And so we are now starting to build those hierarchies for everything. The cool thing is that even if it's not their system, like they're pulling data points off of other DC systems and other things, and they're adding those. So now I can see all my nodes. I can see the whole thing.

And it's like, all right, now I have accurate as built. Now I can manage to optimization. So that's, you know, it's one of those cool things you learn along the way. Like, Hey, I wonder if we could do this. Wow. Look at that. You know,

James Dice: there

John Coster: it can.

James Dice: Yeah. A long time listeners will have heard me talk about this before, but I'll, I'll tell it anyway.

I worked on this very large sky rise in Manhattan a few years back and there were 220 ish meters brand, completely brand new building out of the ground and no one [00:28:00] knew. What these meters were me at measuring.

That's not even surprising. You talked about the

James Dice: As Built problem. I think the As Built problem applies whether it's brand new or not, right?

So, I mean, I spent, you know, weeks and weeks and weeks as an external consultant coming in here and trying to, route through the as builts and try to make sense of where the data was going. And it's so refreshing to hear the ability for technology to start to do that based off of what the meters are telling us.

Um, Mark, I'd love for you to talk a little bit more about how you guys do that, um, that piece. Sure,

Mark Chung: sure. Um, well, it's, uh, A couple of things. I think the foundational component of it is a, is the high frequency sampling rate and the cloud piece. So there's two things that are coming into our, our cloud. One is we get very, very high frequency, clear data sets that are timestamped, time domained, and at a very, very high degree of, of fluctuation.

So every single power fluctuation that we capture [00:29:00] is being sampled around eight kilohertz. Um, that gives us a. Kind of historical record of data that's very detailed. When we take things like a DC plant load that might be coming in and then being aggregated, these minor perturbations that might happen that we can only see, like, intermittently, as long as the timestamps are sequenced correctly, we can realign them through a machine learning algorithm that comes in and tries to detect the relationship between those different events happening.

When we detect that those events are happening, we create these correlations. Once the correlation happens, then we say, okay, what's the probability that this is happening? Because an underlying event here is happening that triggers this thing here. And then it just tries to sequence them, um, to create the most probable outcome.

So, so when John said it's kind of 98%, it's, it's like a 98 percent probability that the unique sequence has occurred over a long enough period of time that we know definitively that these things behind this ACDC rectifier are related to [00:30:00] this upstream system. And then, um, you know, from there, it's about, uh, Schematically relating or through a schema, relating the nodes and then displaying it, which is, you know, kind of what we're doing.

James Dice: Let's talk about the, so John, before we hit record, you said something funny to me because of, you know, we had NexusCon, our first conference a couple of months ago, and one of the topics that really got people going, um, Was pilots and you, and you said something funny before the recording, we said, you said, we don't do pilots and I'd love for you to talk about just the, the buying process and finding Vertigris and then the decision to, um, pick Vertigris over other metering companies and then scale up across the portfolio with all the different nodes that you've, that you've, um, deployed.

Can you just talk about that, that process and pretend you're talking to another. Person like you that hasn't gone on this journey out there that's that's managing similar facilities. [00:31:00]

John Coster: Yeah, it's like Sometimes it is a leap off a cliff. In the, in the corporate world, for those who haven't lived in the financial side of the corporate world, that there are, there are rhythms, uh, to the business.

There are days when, or seasons, not days, seasons where capital is abundant and there are seasons when capital is scarce. And you have to sort of, if you're a sailor and you know what telltales are, you got to look at the telltales and decide when, when you can take advantage of the opportunities. And we had, we had a perfect storm.

It wasn't a trifecta, it was, uh, two things. Number one, we had, we were growing the 5G network, and, um, and we were spending billions of dollars to build new 5G antennas, and so the money was being spent. And at the same time, I was new here, and I saw that telemetry was non existent. Or barely existent. And what I said was I have an opportunity to scrape a little bit of that billion dollars, billions of dollars of putting out for the 5G and get telemetry in, [00:32:00] but I got to do it quick.

I got to do it very quickly because these seasons, they come, they go, and then when they're gone, there's long dry periods. So I needed to get the telemetry in. And so I had to quickly make a decision. Who am I going to bet the farm with? Am I going to go with some big company who Will sell me a Toyota Corolla, which is good, by the way, Toyota Corollas are great.

They do a good job. Or am I going to go with a smaller company who's really agile and can work with me? And as long as they can meet the demand, um, of getting product in, and we can get it deployed, Then I can work with them and figure out new cool ways to leverage that. I knew because my experience with them, I knew that I knew what they had in terms of innovation capability and what their cloud capabilities were and their eight kilohertz sampling rate and the sine wave, you know, I could pull THD stuff.

It wasn't, it wasn't like I was just buying another dumb old meter. So I knew that they had advanced stuff, but I could quickly make that decision to deploy them and so we did. And, you know, it wasn't [00:33:00] without some pain points, right? We had. Network connectivity issues. We use hardware Ethernet when we connect them, which even though they have Wi Fi and they have SIM, you know, cards, we wanted to go with that.

So just doing all the IP port assignments and everything to manage all this stuff, it, it's been, it's been a challenge, but that's what we do, you know, because of we're inside. If I was in a different kind of business, I would have taken advantage of the wireless side of it, and maybe we will down the road.

But so that was, that was kind of our pain points. But the decision to go with them was pretty straightforward. They're already proven themselves in the lab. The only question is, could they scale? And I didn't have the luxury given, given the boundaries that I had for available capital to go out and do a full blown, you know, TRL assessment.

I kind of knew and then we just had the freedom. Let's just go do it. And so we deployed it. It was, it was, it wasn't a complete Hail Mary. I mean, we knew that they were small and they were innovative, but I was willing to go with them because, um, I felt that the [00:34:00] long term value of having a more sophisticated platform, kind of the Ferrari versus the Toyota, I could, I could work with that.

And then they, they would scale with us. As the business cases got more complex, they were the ones who were positioned to help me. Meet those, those new challenges. Yeah. Rather than sounds like

James Dice: over time, over time, they've adapted the platform based on what you guys have needed as well.

John Coster: Right. Right. So it's been, as an early adopter, we've helped them also create, you know, some of the features that make their thing more valuable in the marketplace, which is good, you know, we want them to be healthy and we want them to grow.

James Dice: So absolutely. Yeah. Um, before we talk about those challenges that you just talked about, That you just mentioned. I want to, I want to dive into those in just a second. Um, we've talked about like business outcomes and results from the work that you guys have done together. And one of them was, um, the ability to avoid new capital expenses, right?

So the ability to say, based on this data, it [00:35:00] shows we have more cooling capacity. Um, let's not build this thing because we still have some capacity. What are the other sort of results that you as a business have gained from the meter data and the analytics?

John Coster: Well, you know, I talked about the digital twin side, which is good.

That's, that's pretty viable. I think, um, one of the things that, that we had, uh, we don't have very many outages, we do, everybody does, right. But when you do have the diagnostics and we had a particular one where, where it was pretty catastrophic and we lost the power system completely, and that means that the BMS system was down, the medium was down, everything was down.

And so it was, it was a rush. We didn't lose any customers on that one there, but it was very, very close. You know, we're doing the, the, uh, the reviews on that with all the executives. Everyone's pointing fingers about who was responsible for it. And the, the, the, the power of the utility company and the generator manufacturers were pointing fingers.

At us saying that our loads were, had corrupted sine [00:36:00] waves, that our THD was out of whack, that our power quality, that our chillers were a problem. And that's the reason why the utility was unstable and went down. And the generator guy said, that's the reason it went down. This is one of those things where when you've lost all your stuff and all your telemetry is down, what do you do?

How do you, how do you create the historical record? Well, it turns out, We were surprised, but now we're very happy to know that the Vertigris platform captured the last, the last minute of that. It was in, it was in memory. So we were able to recreate the sine wave and said, Oh, look at the quality of the data.

The sine wave was just absolutely perfect. What are you guys talking about? You can't point to our loads as having been the cause for your, your instability. And so that was a huge deal, more than you know. Now it took some, took some extra effort on Vertigris side to recreate those records. But the point was that it was in there.

Memory, it was in their cache, it was locally saved, and so we were able to capture just before the outage happened to say, or even actually during the event, because [00:37:00] things kind of start up and drop it off, we're able to look at it and go, no, our, our loads were, were completely fine. And so, when you're in a high pressure, highly critical system, and if you're trying to do diagnostics and say, why did something fail?

And everyone's pointing to, you know, conjecture to have data that you can look to and go, no, actually, here's the sine wave capture right here. Um, it was fine. That was, uh, even Mark to this day doesn't know how much that saved us, um, in terms of trying to defend our position, or explain, or diagnose, or spend lots of money to try and say, well, was it our systems?

Did they really fail? What do we need to do? You know, in terms of forensics? Um, it was, it was a clean deal. So anyway, that was, that's just one example of huge value of having capabilities that you don't always need until you need them, you

James Dice: This story is fascinating because it's just where we are as an industry in smart building technology right now.

There's a, there's a huge trend, which is everyone trying to sort of quantify the [00:38:00] business results, the ROI of new technology. And when I hear that story, I'm, I'm kind of like, feels a little bit priceless. It feels a little bit like, like how could you possibly begin to quantify that? You know what I'm saying?

Yeah. Fascinating. Let's talk about, um, lessons learned. So, um, I know you guys mentioned, well, there's, there's one that's sort of baked in here, which is as you guys have come up with more and more needs, Vertigris has built more and more features. Like you said, you guys are an early adopter. And so I think that's one challenge I heard.

Another is around, um, the networking challenge, right? You guys decided to go, you know, PoE, that kind of thing. Where do you guys want to take this? Like, again, think about talking to someone that's like just behind you here, and they're about to do this huge rollout, what would you tell them the biggest challenges are?

And how, how did, how did you guys resolve those or how would you tell them to resolve those?

John Coster: Yeah, I'll just say, I think that, um, you know, you don't like to say this in [00:39:00] public pocket. I think that we are our own worst enemy, you know, I think that, um, that when we deploy these We decided to not deal with any of the difficulties of spectrum inside a really complex building.

Like, is the Wi Fi really working? Or are we going to use internal private 5G? Whatever. We decided we're just going to go with, it wasn't PoE, but it was, it was Ethernet. So it's hard. And so we have. You know, we have port assignments and IP address and whatnot. I think the complexity of managing all of those IP addresses and port assignments, um, in a normal network deployment, to do it all over again, I would have worked a lot harder to make either the Wi Fi or the, or the 5G work.

Um, I would have put more repeaters, antennas, whatever it was to make it, make the wireless work. I think that's probably the biggest. I wouldn't say regret, that was the lessons learned, and I would encourage everyone to use wireless as much as possible, and, um, and a lot of it had to do with, um, [00:40:00] uh, new security protocols, you know, cyber security is like huge right now, right?

And so now there's all these hoops you have to jump through on port assignments and how it's treated, and, and I think that going with, um, fewer Fewer IP addresses, fewer, fewer port assignments is the way to go, especially if you have high, high volume. So that's, that would be my recommendation. Uh, that's a huge lessons learned.

Mark Chung: Um, I, maybe one of the lessons that we learned on our end was, you know, initially I think we had developed this like confidence that installation It's pretty hard to screw up and you can't really, like, there's how many ways could you possibly install this thing? Um, and, um, when we started to roll out across a larger set of T Mobile buildings, like the, that first phase was, can they have, Buildings everywhere all the way to like from Puerto Rico to, um, Hawaii and everywhere, right?

So they have tons and tons of these different, uh, [00:41:00] buildings, but that meant that lots of different people would be touching the systems and installing them. And in the beginning, we're sort of like, okay, well, let's just write a very simple set of instructions. And. Let's see how different could it be, but I think one of the things that we learned over time is it actually can be very, very different.

And, um, there was a lot of, uh, as John mentioned, trying to get to the data quality that we needed to get to. It was hard in the beginning to say, where is, why are we seeing this thing happen here and not happen in these areas? And some of that actually came to just the fact that it was just totally installed wrong or that, you know, Whoever installed it didn't install it correctly, and we had no way to confirm that in the beginning.

So there were several things that I think we, we, we fixed on the feature side, but ultimately one of the things that we did on a services side was we actually worked with, um, some folks that John put us in touch with. To consolidate our installations through a single professional service team that we then built a certified [00:42:00] installation program around, enabled them to go through it.

And so now actually, now we generally recommend that, um, when people do something that's as large scale as that, They go through a, like, a more rigorous certification process of installation and installation consistency and answering all the little questions about, you know, what happens if you have a conduit run that looks like this?

And, you know, should the faceplate be on or off? And are you going to mount everything in this kind of enclosure? That kind of thing. So we, um, Now have like a much more rigorous checklist. I think that ultimately create a lot more consistency and funneled all the problems through a single, single consistent platform of professional services.

James Dice: It's funny. While you were talking there, I was thinking about the checklist manifesto and then you, it sounds like that's what it ended up being like. Um, and, and did you guys have, um, it sounds like, I just want to clarify for people listening. It sounds like you had contractors, like basically bidding on all these different installations.

across all these different regions and you're basically going to [00:43:00] and and sort of centralize those.

John Coster: Well, we have. We have contractors all over the country, right? Like you said, from Puerto Rico to Hawaii and everywhere in between, and our contractors hire electrical subcontractors, and, you know, my background is, is, you know, I started off as an electrician as a journeyman, you know, like, and I looked at the thing, I said, how hard could this be?

You bolt it on the side of a panel, you run the cable through, you snap it on the CTs. It's like, it seemed like it was a very straightforward plug and play kind of thing. And, uh, Not that I was involved in any of this physical stuff, but I was like, we get all this wrong data. And we're like, how could that be?

And then it took us a while. And we finally figured out that, that no, the simplest things can be really messed up, you know, um, by Billy Joe Bob, you know, down in somewhere putting this thing on and, and he's a qualified, or she's a qualified electrician, but they just didn't do it the right way. And it's like, okay, well, let's, let's, Let's make sure.

And so we finally realized what we had to [00:44:00] have was have, you know, Vertigris had to have certified people that knew how to do it the certain way because it wasn't a commodity. It really wasn't like any other, like, stick in a squared e breaker or something. It's not like that. You actually have to have specific skills and do it the right way.

So, um, You know, if you think that your goal is trustworthy data and the trustworthy data requires not only good software, good hardware, but the right installation, well, you just better get all those right. Yeah.

James Dice: Um, any other challenges that come to mind that you guys overcame and would recommend people think about before beforehand?

Mark Chung: Um, well, maybe the last one, this was maybe a small challenge, but I think one of the things that, you know, we. I have now since addressed, but initially, uh, John's NEON team had a desire to kind of like suck all the data in as, as much as possible. And, um, we weren't really anticipating the level of consistent [00:45:00] pinging that would happen to our servers on our, on the API side.

So there was a few different times when there'd be like a data drop or a data loss, and there wasn't a, a replay mechanism and we would lose, uh, A bunch of data on their side, they would look, it would look like they lost a bunch of data. And it was really just, we were not prepared for the amount of traffic that would be coming into the cloud based on how much, how frequently they were pinging, so we've, we've.

Definitely fixed that, but that was one of the early challenges of just adjusting the data and being basically adjusting to what is the expectation on if you're developing a very sophisticated internal business intelligence tool, how frequently are you going to get this data and how frequently you're going to update and just working with the internal development team, figuring out that the right API interface needs to exist with the right level of service level expectations of how those SLAs will work together.

I think that Noting what kind of application if, I mean, obviously there was a lot of exploration on, on both [00:46:00] sides, so I wouldn't fault anyone on that. But if you knew ahead of time, what is the intended end use application, you could probably better set up what is the expectation of the SLA performance from the API, so just, um,

John Coster: and I think, yeah, going along that line, I think from, from an owner standpoint, um, trying to figure out the materiality of, of data sampling rates, so.

As an engineer, I just want to see everything I can, um, but there's a price. It's not free, right? There's, there's network and storage and all kinds of things. And so, you know, I say to my team, Hey, I want this once a second. And they go, why do you want it once a second? I'm like, cause who wouldn't want it once a second, right?

I want, who knows what we could find, right? And it turns out that sometimes you have to sit back and say, okay, well, the price of that is this or the constraints of this or that. And so, well, what about once a minute? What about once every 15 minutes? And it turns out it actually. You know, even though Vertigris can give you this very high 8 kHz sampling rate, the truth [00:47:00] is that that hierarchical tool can build that hierarchy based on 15 minute data, right?

So it's the smarts at the back end that what you do with that data, even at 15 minute increments, that over time, over a few weeks, all of a sudden, you can build a digital twin that's pretty accurate. So I think that one of the things we've learned is, recognize, What you're asking for, and it's sort of like when you're doing product management, you say, what's my minimum viable product?

Well, what's the minimum viable sampling rate that I can get based on the capabilities of my systems to get what I need to know? And then if I think that maybe there's more that I could know and I want to increase that, well, that's fine. But sometimes you may just want to do that in a small sample set.

You don't want to run that across your whole team. Portfolio because you're just going to overburden your, your systems and for no reason apart from the fact that you're just, you know, you're geeking out on having high sample rates. So just materiality I think is important.

James Dice: Yeah, I love that. It's a new acronym for me, minimum viable sampling rate.

That's great. [00:48:00] Um, All right, as we sort of close this conversation, I think this has been a fascinating, uh, case study conversation, so I really appreciate both of you, uh, for bringing your different perspectives here. Anything to close off with that you'd, you'd leave people with that we haven't covered yet?

Mark Chung: I mean, for people who are just starting on the journey of trying to get metering data, I think You know, a lot of what we described, a lot of, I think, what T Mobile is, is doing, you might say, well, that doesn't necessarily apply, um, you know, this is like, uh, uh, uh, uh, like John mentioned, this is kind of like a Ferrari, but I think what we've tried to do, at least what Vertigris has tried to do, is Provide things in a democratized fashion where the level of sophistication and capability that we're providing are not things that don't exist in other domains.

In fact, that's where we came from, different domains where technology is really, really inexpensive. And so I don't think that. It has to cost you an arm and a leg to get the kind of capabilities that T Mobile now has with our platform. And so if you're, you're starting out thinking, Oh, I'll, I [00:49:00] just need something super inexpensive.

It might be more affordable than you think to start the journey of, of building really sophisticated energy intelligence

James Dice: applications. Totally, especially with all this automation and analytics that happens once the data gets to the cloud.

John Coster: Yeah, I think that's, um, I don't think we paid a premium. You know, when I compare what we pay for Vertigris, um, for the sensors on a cost per sensor or per data point, it's, it's well below or well within market rates for everything.

It's not like I'm paying for a Ferrari. Um, I use that term because of the sophistication of their backend and its capabilities. I think that. Probably the biggest takeaway, and I have a, I have the benefit of not just doing capital rationalization, but to do real cause analysis, and, you know, I tie economics and engineering together, so I do have that big picture, including the cost of risk, the cost of failure, the economic impact, the ecological impact.

Like, I, I get to see the big picture. I think one of the challenges for many people who are looking for telemetry [00:50:00] is that they have, the, the buyer has one thing that they're trying to solve, okay? And, What I would say is, if you're someone who wants to get some telemetry and you're really, you think this is cool, find out the other stakeholders in the business who would benefit from the kind of data that you can have, and then get them to co sponsor with you, because while I do capital rationalization, my operations brethren, to do pre fault detection diagnostics or other kinds of things, or having accurate as builts, are more Sure.

Beneficial to them than to me necessarily. And so finding other stakeholders who can get together and say, let's explore how we can take this data and make it useful to more stakeholders. In a broader context, that's where I think the success, that's, that's where the value is. If you're just trying to figure out what a thing does and how to make it run better, there's lots of ways you can do that.

But if you want to have a system, an ecosystem that, that looks for enhanced value across a wide [00:51:00] range of stakeholders and uses, then, then look for that in your, um, you know, whether it's reliability centered maintenance, condition based maintenance, pre fault detection, operational performance, or capital rationalization like we do.

Bye. Bye. If you can get those people in the same room, then you can make better decisions about the platforms to buy and to put in place.

James Dice: I love that. Oh, that's such a great point. And let's end there. Um, thank you so much to you both for coming on the show. Um, I will challenge you and see if we can do a case study on this at NexusCon next year, uh, we'll send you the request for abstracts when we get there, but this seems like, it seems like a case study that we could dig into more and I look forward to being able to do that.

Rosy Khalife: Okay friends, thank you for listening to this episode. As we continue to grow our global community of changemakers, we need your help. For the next couple of months, we're challenging our listeners to share a link to their favorite Nexus episode on LinkedIn with a short post about why you listen. It would [00:52:00] really, really help us out.

Make sure to tag us in the post so we can see it. Have a good one.

‍

Sign Up for Access or Log In to Continue Viewing

Get Access