Two Think Minimum Podcast Transcript
Episode 024: “BigID CEO Dimitri Sirota Brings Fresh Ideas to Privacy Debate” 
Recorded on: August 20th, 2019
Sarah Oh: Hi and welcome back to TPI’s podcast, Two Think Minimum. It’s Tuesday, August 20, 2019 and we’re here in Aspen, Colorado at the 2019 TPI Aspen Forum. I’m Sarah Oh, Senior Fellow at the Technology Policy Institute. Today we’re excited to talk with Dimitri Sirota, CEO and cofounder of BigID. Dimitri is the CEO of one of the first enterprise privacy management platforms, called BigID, and a privacy and identity expert. He is an established serial entrepreneur, investor, mentor and strategist and previously founded two enterprise software companies focused on security and API management, including Layer 7 Technologies, which was sold to CA Technologies in 2013. Welcome, Dimitri. Thank you for speaking at the Aspen Forum.
Dimitri Sirota: Thank you for having me.
Sarah Oh: First, could you tell us a little bit more about BigID’s products, and your journey with this company?
Dimitri Sirota: Sure. So the journey basically started with a PowerPoint about three years ago, maybe a little bit over three years ago, and the origin story of the company was really to help organizations better safeguard the data they collect and process on individuals, on customers, on employees, on clients. And clearly there was a sense prior to starting the company that whatever was being invented in Silicon Valley and Israel in terms of security wasn’t sufficient because you still read in the headlines about massive breaches, misuses and so forth. And it struck me that part of the challenge was that organizations had very, very poor visibility into what data they actually have on their customers and employees. They have different teams building applications to collect as much information as they can. That information gets spread throughout the corporate environment across various kinds of databases and clouds and so forth. And not knowing what data they have makes it very, very hard to secure that data. In starting BigID, myself and my co-founder, who’s based out of Tel Aviv, we realized that there was an opportunity to provide an accounting-like framework to help organizations better guard and better govern the data they collect and process. And in so doing address emerging privacy regulations like the California Consumer Privacy Act as well as just become better stewards of personal information.
Sarah Oh: That’s great. So you mentioned accounting methods. What were companies doing before your solution?
Dimitri Sirota: They weren’t doing much. I think, from a security standpoint, historically companies really focused on trying to lock down the edges. So maybe it’s the access to a server, maybe it’s the access to a network by a firewall, maybe it’s ensuring that somebody provides more than just a username/password, like some kind of biometric to get into something. It was very much kind of a doorway type of strategy. And clearly that was insufficient. People were still finding ways, whether it was through the windows, whether it was through the help of getting inside. One way to secure what’s inside, the contents and the data that you store, whether it’s in AWS, in Azure, in a NetApp server or an EMC server or whatever, is really to just know what you have there. And if you know what you have, you have an opportunity to make certain decisions around it, whether it’s to minimize it or delete it. Why keep stuff you don’t need? You could make decisions around duplication. You can make decisions around encryption. You can make decisions around tokenization. There are a lot of things you could do, but knowing what the problem is and knowing where the data is, is the first step.
Sarah Oh: I can imagine that this platform, you would collect lots of different types of categories of personal information or privacy levels. So can you tell us a little bit more about how do you tag a piece of data for its level of privacy?
Dimitri Sirota: So we don’t tag explicitly, we actually took a very different approach. We used almost a tautology in defining what is personal data. So if you look at the privacy regulations that are getting introduced, be it the European like GDPR or US like CCPA, they revolve around this notion that companies are responsible for all personal data they collect and process without necessarily defining what that personal data is. So if you look at a lot of the breach regulations passed over a decade, a lot of them are a lot more explicit about the kind of data, credit card and social security. The privacy regs are much more broad. They care about all the data that a person produces or that is about that person. And so as a consequence, when we were starting BigID, we took a very different approach. We said that the key thing was understanding how connected certain data is back to a person to define if that data is personal. If you could basically map some piece of information back to a person, there’s some personal connection or correlation to it. And so as a consequence, one of the things we did is we said, look, it’s not enough to just tell you what type of data, you actually have to measure how connected and related that data is to a person to figure out if it’s personal. And that’s a unique thing that we have. And part of the reason we’ve had success over the past three years is that, as those companies are contending with some of these requirements around individual data rights and the need for a company to be able to find all the data that belongs to a person, you need a novel approach to understanding what data is personal.
Sarah Oh: Interesting. So at TPI we do research on privacy and we try and measure the harms from data breach. And so there are some statistics showing that the price of a credit card number on the black market is very low compared to, I don’t know, a more personal piece of property, like a fingerprint. Do you internally have a sense of how valuable pieces of information are?
Dimitri Sirota: There’s kind of two senses of value. There’s kind of the dollar sense, in terms of what somebody, a bad actor, could sell your information for, and they usually sell it in bulk. But there’s also the value to me. If I’m in a car accident, the car accident may not necessarily be large in the sense that the car was totaled and so forth, but it could be a nuisance. For me, that may mean that I have to get a new credit card and have to change all my subscriptions. It may mean that I have to get a particular identity protection service. The value to me of my information goes above and beyond just the economic value of what a bad actor could sell it for. There’s all the other kind of mechanics of it. And I think increasingly, as our lives shift online and we’ve seen truly the decimation of the retail industry, the old brick and mortar landscape, as everybody basically now interacts online, whether it’s through social networks, whether it’s through ecommerce networks, whether it’s with their car and automotive, increasingly people are fearing about how their data gets used, how it gets processed, how it gets shared, and they want some degree of transparency and accountability around that. And so I think it goes far and above just a dollar value. Clearly the dollar value matters. And people have ascribed $200 in the cost of a corporation to be able to respond and defend a person’s identity. But beyond that, there’s a responsibility that companies have in terms of how they interact with their customers online. And conversely, customers expect more of those companies. If they’re going to give them something personal, their data, whether it’s a thumbprint, their credit card, their location information, they expect a degree of responsibility and accountability from that corporation.
Sarah Oh: Interesting. So your product is helping firms be more trustworthy. So yesterday there was a panel about how much do we trust big tech, how much do we trust big companies? And we hear about data breaches. That kind of lowers trust levels. How do you see the industry adopting your technology solution? Are they open to implementing BigID?
Dimitri Sirota: Yeah. So obviously the fact that we’ve grown from two people three years ago, I think we’re approaching 150 now, and we’ve raised in excess of $100 million, that’s a reflection of the fact that there has been some adoption. Now is every corporation using our product? Not yet, maybe in another five, six years. But certainly I think we’ve experienced above-average growth. I think that’s just a reflection that with more of these regulations emerging, CCPA and GDPR being the most notable, but certainly there are draft bills across I think 14 or 15 US states. There’s now 135 countries that have some form of legislation or bill around privacy. It’s becoming increasingly critical that companies think about privacy as a first order problem. And I’m going to give you some analogies of where you’ve seen this kind of occurrence for what we do in data in prior, kind of, environments.
Dimitri Sirota: Even a hundred years ago, prior to the advent of any computer in the 1920s, there were issues around transparency and accountability in financial transactions. And so a whole firmament of regulations and standards were established in terms of how companies need to be able to record and report on what money comes in, what money goes out. This is kinda from 1920 on. And in a similar vein, if you think about data as the new currency to date, there really hasn’t been any form of accountability. You don’t really know what data goes in, you don’t really know what data goes out, and how the data gets used in between. You’re seeing an increasing pressure for that. Now, again, we also witnessed some of this with ecommerce. It took time for people to trust their websites. But to get there, we had to implement things like SSL, we had to implement certain standards and certifications around websites, we had to introduce things like PCI to give people assurances that credit cards are going to be protected. And now again, as more of our daily experience shifts online, I think you’re seeing that same amount of emphasis placed on your data more broadly.
Sarah Oh: I think it’s really interesting. I haven’t heard this take of an accounting framework. We have GAAP accounting, and that’s kind of a standardized way that companies can record on the books what assets they have. It sounds like that’s how you’re thinking about data?
Dimitri Sirota: It is. Cause I think data is what fuels value to a company. The most important asset a company has today is not the physical store where- Barneys, which has just, I think, declared bankruptcy again. It’s not the physical NetApp server, which sounds kind of ridiculous. It’s not your mobile phone, it’s the data, it’s the ephemeral information about your customers, your clients, your intellectual property. Those are the things that are of value. And yet we have asset management and asset tracking for mobile phones, for laptops, for physical servers that are bolted down to the data center, and yet not for data. And if data is truly the kind of unit and currency of business today, of modern online digital business, it would make sense that we have some means of recording and accounting for that business. And yet it hadn’t really existed. I think the regulations like CCPA and GDPR are attempting to think about data as that new kind of unit of currency for the digital economy and introducing types of accounting controls. I’m going to give you an example that’s very relevant to data and I think it’s an important one and it kind of speaks to the run on the banks that you had before GAAP and so forth. At the end of the day, if you go to a bank and deposit a check, you have confidence that that bank is going to be an effective steward, they’re not going to take that money and do whatever they please and have no accounting system to know that it belongs to you, and so forth. And that anytime you want to be able to take it out from an ATM or through some other payment means that it’s available to you. And yet for data that doesn’t exist. You deposit information to companies all the time in the background, your apps are running in the background, you get into a modern car, it’s connected to the cloud, you walk into your house, Siri says hello, maybe Google Assistant says hello, maybe Echo says hello.
You’re surrounded by information that you’re giving over and yet there hasn’t really been any form of accounting in the corporation. And I think that’s led to situations where there’s been bad outcomes, like in healthcare in terms of breaches. I think that that is one of the things- that’s the big idea, if you will, of BigID, is that if you institute proper controls and accounting, you’re going to be able to be more accountable to your customer and provide greater transparency, greater value, and ultimately greater security.
Sarah Oh: This is actually really fascinating to me because yesterday on the productivity panel we were talking about how our economy’s moving towards intangibles. One reason why economic data is not showing productivity growth, some say, is because the value of our assets is derived from intangibles that are not well measured in GAAP or on the books. I didn’t know, but I guess your approach is to treat data as different types of assets. So maybe this is beyond your product platform or why companies use your product for privacy compliance, but do you see emerging ways that companies think about their data in terms of like depreciation? Is old data less valuable to them, or are bundles of data more valuable? Is that something that your platform helps companies look at?
Dimitri Sirota: Yeah, there’s two sides to that question. I think one is risk and one is value. And I’m going to give you an example. We help companies meet certain obligations that they have under CCPA and GDPR. I’m going to give you one example that’s common to both privacy rights and really, what I believe is kind of the essence of the emerging privacy use cases, which is that every individual has a legal right to their data. In Europe they regard it as a human right, I would argue that in the US they regard it as a property right. But what they share in common is that individuals have a right to access, in some instances, a right to erasure or deletion, a right to portability. But this notion that just because I gave you my data, I’m still allowed to get it back. Now that requires a complete rethink from an organizational standpoint because companies historically just collect all this stuff. They may put it in their big data warehouse and then run analytics on it, but they never really try and keep track of it. They never really try and figure out which data belongs to Dimitri, which data belongs to whomever else. And so now they have to all of a sudden account for it and not just by type of data, but by the association or attribution to a person. So that’s a complete rethink. So that’s kind of one of the things that BigID helps with. But to flip it around. So if you’re able to all of a sudden account for what data you’ve collected on Dimitri across your AWS, across your Azure, across your Google Compute, across your SAP, across your Salesforce, across your Workday, across your NetApp file shares, across your Teradata, Snowflake, et cetera. All these places you could kind of a bolt hole or rattle data nowadays. If you’re able to do that, what else are you able to do? All of a sudden you have a picture of that person. You know how you’ve interacted with that person across your lines of business, maybe across your geographies. That’s incredibly valuable.
You also may know that I have data that I haven’t touched or accessed in ages. I’ve collected it maybe eight years ago and it’s on healthcare data, it’s on financial data. Maybe I don’t need to keep that data. Maybe there’s an opportunity for me to back it up to the cloud. Maybe there’s an opportunity for me to delete it. And so in the knowing, in being able to address this checkbox around this personal data right, or individual data right as the Californians refer to it, I’m able to also answer to other things: how risky is my data based on what it is, where it is, whose it is, but then also get a new picture of that information, a new picture of how I interact with that customer, which again increases the value of that customer to me as a corporation.
Sarah Oh: Yeah, that’s fascinating because in privacy regulation around the FTC, they need to have some sort of value attached to privacy. And that’s such a hard question for privacy scholars, economists, that need to inform the regulators about costs of their privacy regulation. As an entrepreneur who’s working with companies, is all the noise from Washington and Brussels, is it noise or does it affect your business directly?
Dimitri Sirota: Well, for us, obviously to some degree we’re an answer to the question. We’re a response to these new requirements that are being introduced. I think these requirements are obviously just the proverbial tip of the iceberg. They, again, force companies into this new posture where they now need to be able to account for the data they collect, process, and share to a much finer degree than they have in the past. And they need to be able to shine a light on it. For us, it’s important. I think, obviously, a lot of our clients would wish that federally in the US there’d be some agreement, some shape, around a uniform privacy regulation. My suspicion is that it’s not going to get done in this administration or this Congress, maybe in the next. It may take the introduction of one, two, three, four, five, six state regimes, all varying a little bit, to provide enough impetus for the Democrats and the Republicans to work across the House and Senate and get some of this done.
Sarah Oh: So maybe one last question to wrap up and it’s technical, but you can try and explain to me, how do you track the provenance of data across all those clouds? So are the computer programmers- is your code attached to what they’re coding or do they have to enter in what they’re doing with the data?
Dimitri Sirota: No, they don’t. Essentially what we do- the analog I’m going to share without going into too much detail that will bore your audience- is a little bit about how Google many years ago started building a way to index the Internet. Google didn’t go out and copy the Internet. They found a way to score the relevancy of one website to another website based on the number of links, maybe based on other parameters. And I think there’s now about 200 parameters that they use in what they call PageRank to be able to score the relevancy of one website to another. In a similar way, what we’re actually doing behind the scenes is building a graph or a relationship map showing how this little piece of data, maybe in this file, is connected to this piece of data in a server, which is connected to this data in a data lake connected to… And so that graph resides kind of in the product, but it’s just a graph. It’s just a relationship map. It’s not actually a copy of the data. The data stays wherever your developers and engineers decided they wanted it, wherever it’s performing the function that it needs to perform. What we are basically doing is building kind of this map on top of it. And it really is a map, and just like a map, it gives you an ability to navigate the data. It gives you an ability to understand the data, understand the risk and so forth. But the map is kind of separate from the geographic locations. And so in so doing, we’re able to build organizations an inventory of this asset, this critical asset, whether it’s in an unstructured file, in a relational database, in a Hadoop cluster, whatever that is. And again, help them navigate them and help them answer not only these critical privacy questions, like what data am I sharing with a third party and do I have the appropriate consent, or what data do I have on this individual and how do I report that back to that individual?
But we’re also able to give them insight intelligence, what we actually call data intelligence, around this asset, which we would argue it could be either the most toxic thing that they collect or the most valuable thing. And hopefully, with some of the innovations we’ve introduced, we can make this asset far less toxic and far more valuable.
Sarah Oh: Interesting. So I guess one final follow up. It sounds like large enterprise companies can afford to have a privacy group and compliance to maybe collect that map. What about small startups that are just spinning up a database, do they need to update a map or does your software automatically collect that data?
Dimitri Sirota: So they need to be a customer of BigID. But yeah, the software itself, it’s not manual. We’re trying to replace manual efforts. And I’m going to use another example from kind of the archive: Sarbanes Oxley. When Sarbanes was introduced in 2006 to provide greater visibility into what controls people had over who touches and accesses data and systems, it was all done manually. You’d hire one of the Big Four firms. They’d come in, they’d interview people, they’d provide kind of an audit report, and then kind of a spreadsheet as a work product. Fast forward maybe 12 years, you now have an industry called the identity and access industry, which spans a lot of branching, small branches, just like you had in a kind of a hominid, kind of a human history map, things like authentication or single sign on. And really what those technologies were introduced to do is to automate this problem of being able to report and control who has access to what data or what application. In a similar vein, if you look at companies like BigID, we’re really trying to automate what today is largely a manual process. Today, the way it works is you would hire a Big Four or one of the other kinds of outsourcers, they’d come in, they’d interview people, they’d ask them, where did you put your data, how are you using your data? And they basically generate a work product, which would be a Visio with some directionality for each data process and kind of a spreadsheet with an inventory.
Clearly that’s not very efficient in terms of time and effort. Clearly it’s not very detailed. People don’t have recollections of where they put Dimitri’s data. The other thing is at the end of the day, relying on recollections over records is not accurate. You’re not going to get good outcomes if it’s not truly reflective of what you’re trying to protect and safeguard. So what we’re doing is we’re adding automation, we’re providing a way for companies to do this in the background. We believe that, you know, again, in 10 years, 12 years time, this’ll be as natural as having accounting software. When accounting software was first introduced, or accounting methodologies, it may have been a burden. People going, oh, it’s a new process that we have to follow and it’s going to require new people. But today I think you’d be hard pressed to find any company that views their accounting software as a cost burden. It’s really a way for them to manage their business. And in a similar way, I think tools like BigID will not only be able to answer these immediate problems in the privacy regs, but ultimately help people run their businesses. If you truly believe that the most important currency and asset that they have is their data, tools like BigID are going to be essential.
Sarah Oh: Great. Well with that, thank you so much, Dimitri, for an introduction to BigID and it sounds like you have a lot of work ahead of you.
Dimitri Sirota: Thank you very much.
Sarah Oh Lam is a Senior Fellow at the Technology Policy Institute. Oh completed her PhD in Economics from George Mason University, and holds a JD from GMU and a BS in Management Science and Engineering from Stanford University. She was previously the Operations and Research Director for the Information Economy Project at George Mason School of Law. She has also presented research at the 39th Telecommunications Policy Research Conference and has co-authored work published in the Northwestern Journal of Technology & Intellectual Property among other research projects. Her research interests include law and economics, regulatory analysis, and technology policy.


