Episode 162:

Accelerating Enterprise AI Transformation With Open Source LLMs Featuring Mark Huang of Gradient

November 1, 2023

This week on The Data Stack Show, Eric chats with Mark Huang, Co-Founder and Chief Architect at Gradient, a platform that helps companies build custom AI applications by making it easy to fine-tune foundational models and deploy them into production. During the episode, Mark discusses Gradient’s role in simplifying the usage and adoption of LLMs by packaging open-source models into a service. Mark highlights the importance of collaboration and data sharing among enterprises for model development, the impact of LLMs on enterprises, the challenges of operationalizing LLMs, the future of AI and its potential, and more.

Notes:

Highlights from this week’s conversation include:

The potential of AI-driven applications (1:34)
The need for hardware infrastructure in AI experimentation (2:40)
Oligopoly on the closed side (11:50)
Advantages of private side vs. open source (13:18)
Leveraging valuable data within enterprises (16:00)
The urgency of adopting LLMs in the enterprise (24:02)
Expansion of LLMs into new business verticals (25:06)
The challenges of operationalizing LLMs (29:32)
Seamless experience with OpenAI (37:29)
Operationalizing with Gradient (38:36)
The early genesis of Gradient (48:53)
The democratization of AI through endpoints (51:44)
What is the future of language models? (54:07)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcription:

Eric Dodds 00:05
Welcome to The Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You’ll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at RudderStack.com. Kostas, this week’s show is with Mark Huang of Gradient AI. And I’m really excited about this conversation because I think it’s a great example of the type of thing that will further accelerate the usage and adoption of LLMs. So Gradient essentially takes open source LLMs. Let’s say, like, Llama 2, right? You want to go operationalize Llama 2 for some use case. They actually package Llama 2 into a service and essentially give you access to it via an API endpoint. So you get an API endpoint, and now you can literally send and receive data from Llama 2, and they take care of literally all of the infrastructure, which is pretty fascinating, I think. One of the big topics here is what this means for MLOps. But I think there are also implications for the data roles that we traditionally see, you know, the data engineering and data science workflows that manage this data lifecycle. And you’re almost jumping over a lot of that, which is fascinating.

Kostas Pardalis 01:35
Yeah, 100%. And unfortunately, I didn’t make it into the recording, but I had the luxury of listening to it already. So I have to say that’s a very fascinating conversation that you had with Mark there. I agree with you, and I think there are a couple of different things here. The first one is access to the technology itself, right? I mean, just building a REST API there literally lowers the bar for accessing such complicated technology so much that pretty much everyone can go out there and build anything. Just by being a front-end developer, you can now go and build an AI-driven application, right? Which is amazing in terms of the potential innovation that can be created. But there’s, I think, another factor that many people might not think about, and that has to do with infrastructure, primarily the hardware infrastructure. It is extremely hard today for someone who wants to go and experiment and build around these systems: you just cannot get access to the hardware that you need to do that. So using a service that removes, let’s say, all the logistics around doing that is, I think, an amazing opportunity. And it is one of the reasons that we see so much growth happening around AI right now. It’s not just the technology but also how fast the industry managed to react and deliver products, so that pretty much everyone out there can go and start accessing these technologies, which is amazing. So let’s go and listen to what Mark has to say; he’s the expert here. This is a very fascinating area, and we will have more guests around it because things just change too fast, right? Nothing is settled yet, and maybe things will change soon. We need to keep an eye on that and learn as much as we can. So let’s go in here. 
What more Kastri. Let’s do it.

Eric Dodds 03:52
Mark, welcome to The Data Stack Show.

Mark Huang 03:55
Hi, yeah, it’s great to be here.

Eric Dodds 03:59
All right. Well, we’ll start where we always do: give us your background, which is fascinating. I can’t wait to dig into that. And then tell us what you’re doing today with Gradient.

Mark Huang 04:08
Yes. So I’m the co-founder and chief architect at Gradient. We are an LLM developer platform. I’ve spent years in the data space. Most recently, I was at Splunk working on streaming, distributed analytics systems and machine learning there. Prior to that I was a data scientist, working with business stakeholders, shipping machine learning and data science. And I actually spent half of my career as a quantitative trader at algorithmic hedge funds. I actually started my own in Hong Kong for about a year.

Eric Dodds
Wow. That’s wild. I definitely want to ask about that. How did you get into quantitative trading in the first place? Did you just start doing that out of school?

Mark Huang
Yeah, I guess I always had a bent towards statistics and data, and always knew that it would be some intersection of the application: how do you leverage it, how do you get the most value, and how do you solve problems? So going to the University of Pennsylvania, you know, half the class always goes out to Wall Street, and I ended up going into it. But I got way more fascinated by the technology and the methodology aspects of all that.

Eric Dodds 05:37
Yeah, totally. When we were chatting before the show, you mentioned, I think you said, that Citadel won because they had sort of the best end-to-end stack, which is really interesting. Can you explain that, and the algorithmic trading world? What does the stack look like? Is it similar to the stack that you would think about with, you know, a modern SaaS company or something?

Mark Huang 06:02
There are really interesting parallels. I think we’re in an exciting time in AI, so it kind of reminds me of how hedge funds work, as much as I can talk about that. It’s not really one thing. It’s not one strategy, it’s not one system, it’s not one set of tooling and insight that leads someone to win out in their market. It’s actually being able to have the most frictionless environment for any researcher or any new trader to plug in, leverage their strategy the best, ship it, and tweak it over time. Citadel just really has one of the best systems in the world for that, and they’ve shown that over the last 10 years or so. That’s sort of my belief in the AI space too: we’re kind of in an arms race where there is a need for, you know, the picks and shovels to democratize that and actually help the enterprises that need to leverage it the most.

Eric Dodds 07:03
Yeah. Were the friction points similar in the hedge fund environment? I mean, you think about friction points in algorithmic work, right? You have data inputs, and that can certainly be a friction point. Then there’s actually running analyses and the time that it takes to do that. Are those the same types of friction points that you saw in the hedge fund space?

Mark Huang 07:31
Yeah, I mean, coming from that space and going into, you know, the SaaS software space felt really natural for me, because it’s almost like doing the same type of things: taking a ton of data, figuring out how to explore it, figuring out how to clean it up, and then working out how to scale out the pipeline that needs to go into production. Those are entirely the same on both sides. It’s just, what’s the product? The product is different, right? At a hedge fund, the product is just the returns that you can give your investors. In a SaaS business, it’s how you sell your software so other people can get their problems solved.

Eric Dodds 08:15
Yeah, yeah, totally. And what motivated you? I mean, starting your own hedge fund is pretty wild. Do you remember the moment where you said to yourself, I think I’m going to start my own hedge fund?

Mark Huang 08:30
I think it was a brash decision, to be honest. It was feeling, you know, a little bit like the industry was ripe for a little bit of disruption, particularly when I went over to Hong Kong. On the emerging market side, there is a lot more opportunity. And being a young, you know, 25-year-old wanting to just start a new venture, it wasn’t clear to me the exact problem I was going to solve, but I knew there was enough interest there to allow me the opportunity to explore it. And, you know, I reflect on it and I have this list of things I would have done differently. Coming into Gradient, it felt a lot different. It felt more like: the timing is now, and these are the ways that I’ll approach it. You have your initial plan, which obviously changes, but you have your initial plan, at least.

Eric Dodds 09:25
Yeah, totally. When you started the hedge fund, and going back to the most sophisticated end-to-end stack that Citadel has, for example, did they have economies of scale on the tech side that you didn’t have as, you know, a new startup hedge fund?

Mark Huang 09:46
I think that’s absolutely true. And in effect, you actually see that, particularly today, where there’s sort of this gravity effect towards some of the largest fund managers. You could actually have the same exact set of strategies and the same model predictions coming out, but because you plug into their system, they have everything more optimized: the transaction cost analysis, the way that they’re able to execute the trades, and the way that you’re able to get feedback from all that. It’s something that you just can’t get when you have to roll out your entire stack, right? It’s like developing the software itself. We all know that’s a business in itself.

Eric Dodds 10:31
Yeah, yeah, for sure. Infrastructure as an advantage, you know, especially end to end. That’s super interesting. Okay, I want to switch over to talking about Gradient and LLMs. What’s interesting about Gradient is that you focus on open source models and private models and enabling those things. But before we dig into that, could you give us a 101 on the model landscape? I know a lot of our listeners are familiar with AI, and I know we have a lot of listeners who are probably working on AI stuff. But we also probably have a lot of data engineers, analysts, or people around the space who have heard of these things, have maybe read about them, but maybe don’t have a wide view of the horizon when it comes to the different options that are available. So of course, you know, OpenAI and ChatGPT: most people have heard about that by now and have prompted ChatGPT for something. But when we go a layer deeper and think about LLMs as an entity, what are the options out there? Give us a sense of the landscape.

Mark Huang 11:50
Yeah, absolutely. So, you know, on the closed side, there are a few major players: OpenAI, Anthropic, Google has their own, and Cohere has their models as well. There’s kind of that oligopoly on that side of things. But then on the open source side, it’s actually really interesting, because you just have these open source models, right? These are bases that you can train more data on top of, and Meta has been constantly pushing into open source for that. So everybody has heard of Llama and Llama 2. And it kind of felt like the moment Llama 2 came out, there was an explosion of builders, these AI builders in open source who are releasing their own models off of the base and all the hard work that Meta had done. So amongst those choices, you get models like Llama 2, you have Code Llama, and you get a few other ones from Hugging Face like BLOOM. And everybody is taking all of these democratized foundational models, which is what they’re called, to build on top of.

Eric Dodds 13:04
Yep, yeah, that makes total sense. What creates the oligopoly on the private side? What is it like, why is there such a huge concentration there? Is it just access to additional resources?

Mark Huang 13:17
Yeah, I mean, I think if you can raise a billion dollars early on, then you can basically get that, right? That is the gravity that is pushing the oligopoly. On the closed side, it’s effectively very concentrated groups being able to raise a lot of money and fund the research and the compute that’s necessary to create the state-of-the-art models. On the open source side, it’s a bunch of people collaborating together to democratize access. And I think, you know, that’s what I’m very excited about.

Eric Dodds 13:54
Yeah. So on the surface it sounds a little bit like a David and Goliath type story. That’s obviously an imperfect analogy, but can you explain, from your perspective, the advantages of the private side? You can sort of do infinite training, you have access to sort of infinite hardware resources, all that sort of stuff. What does that get you? And then what does the open source, collaborative approach get you? What are the advantages of each approach?

Mark Huang 14:34
I think, when it comes down to it, a lot of AI has the principal aspects of machine learning and data science. It’s still the same world, basically: it’s about the data. So, you know, on the closed, private model side, with OpenAI and all those folks, they’ve taken a lot of care in doing the data cleaning and in making sure that they can train these extremely large models. Sure, they have all the compute. But if you look at all the model architectures, if you look at all that PyTorch code and all the different architectures of the new models coming out, there’s actually not a lot that’s different. They’re just trained on better data or more data. And on the open source side, my belief is in the collaboration benefits of being able to share data with each other, and even more so in the fact that enterprises may have some of the most valuable data out there inside of themselves, but they’re sensitive about it. So being able to leverage it is probably going to take us into the next era of what I see in model development.

Eric Dodds 15:49
Super interesting. Let’s dig into that a little bit. When you say enterprises have some of the most valuable data within themselves, give us an example of that.

Mark Huang 16:00
Yeah, I mean, Stripe, for example. They have all this transactional data that they could never release; they need to be incredibly careful about it. And it probably greatly outstrips all the transactional data on the internet, because that’s just so sensitive, right? So from the standpoint of some of the products they create, I believe it’s called Radar, which is an anomaly detection service that they have internally, and all the modeling that can be done with that data, they’re basically sitting on a corpus that the entire world can’t see. Other companies have the same exact aspect as well, and particularly governments and healthcare, right? Those are some of the hardest places for AI services to penetrate. But you see this pressure for adoption, and a lot of them are starting to become a little bit more open to adopting AI within their enterprises.

Eric Dodds 17:06
Yeah, that makes total sense. Let’s talk about the data and the comparison. You said one of OpenAI’s advantages is just massive amounts of data, carefully curated and prepared so that the model can produce outputs that are highly tuned. If you think about an enterprise, or think about Stripe, it’s a much smaller data set, relatively, that they would be training a model on than, say, what goes into OpenAI’s models. But the data sets are also very different. It’s a homogeneous data set that generally represents a data structure that’s very similar across all of the data points within it, whereas OpenAI has a huge variety of different types of data. Can you talk through that tension? Because you said a model is only as good as the data you put into it. One aspect of that is scale, and another is quality. When we think about an enterprise leveraging an LLM to get more value out of the data that they have, are there thresholds or economies of scale that they need in order to actually operationalize an open source LLM?

Mark Huang 18:34
Yeah, I think at Gradient what we believe is that it’s not one model that’s going to be sitting in an enterprise. It’s going to be like 1,000 models. What does the world look like when these large enterprises have 1,000 models inside their enterprise that are helping them to improve productivity, operationalize their work, or even become user facing? What they need is to have custom models that are really good at what they need them to be, right? From a business standpoint, a lot of times you kind of know what you want: you want this model to do X, Y, and Z, and everything else, you know, is fine. If it’s interesting, that’s really good. But what you need is just the model that will do really well on that specific task. So if you can take that subset of data, you actually don’t need so much of it, and you can, as we call it, fine-tune your base Llama model to do that particular task better.

Eric Dodds 19:45
Yep. I want to get into the fine-tuning and the specifics of Gradient in a minute, but I’m interested to know, from your perspective and from what you’re hearing from people using Gradient, especially in the enterprise: when you think about an enterprise adopting technology, there can be different rationales for the type of technology that you adopt. Going back to the old adage that no one ever got fired for buying IBM, you go with a large, well-established company with tons of resources. So maybe that’s one of the members of the oligopoly: okay, this is stable, supported, all that. But then there’s also the portability, flexibility, and lack of vendor lock-in that comes with open source. What are you seeing in the enterprise with these companies who have this really valuable private data? I mean, both of those are rationales that people have used to adopt new technology in the enterprise.

Mark Huang 20:52
Yeah, I think that our observation is that everybody will have tried OpenAI first. It is a fool’s journey to think that you’ll get in as the vendor before them. But what we’ve noticed is that almost everybody says either, we need a complementary solution to them, like we want to iterate ourselves and learn a little bit more about how to develop these models for our custom use cases, or, we know we need to move off of them. Which is kind of interesting, as the technology is so new and still developing, and people are already planning for the future. So the first thing they ask us, before I even open my mouth to say that we have access to the open source models, is: do you have Llama 2? Do you have Code Llama? That’s the first thing they ask me. And then, you know, I’m just like, yeah, we have that. That’s the point of us: to give you guys access much more easily.

Eric Dodds 21:55
Super interesting. Okay, another question about the enterprise. This is just fascinating, because things are changing really quickly, so the insight is really, really helpful. You’re inside of enterprise organizations. I want to talk about Gradient’s API, but let’s maybe go back to when you were working in the hedge fund world, right? You had dedicated infrastructure for algorithms, bespoke development of the models that were driving things, and just massive investment in the teams and infrastructure that it takes to actually do this. And now we’re seeing that become democratized, of course, through the oligopoly and through open source. But enterprises aren’t necessarily the quickest to change. Is this actually accelerating organizational change inside the enterprise?

Mark Huang 22:58
I’ve actually been shocked to see how quickly enterprises are trying to adopt an AI strategy, and also how quickly they are willing to talk to someone like us, right? Gradient is, you know, a startup, and why should we get in the door for a conversation when they have been trusted partners with many other vendors? But the fact of the matter is, they see our ability to deliver the solutions that they need, and they have the pressure to adopt AI. It’s kind of the first time I’ve seen something where they, in a sense, put the cart before the horse: they really want someone to help them adopt it. They want to know, hey, can I automate this? Can I do all these things? And what do you offer to help me do that? Rather than asking where the other vendors sit, they just know that they need something really badly.

Eric Dodds 24:01
Yeah. A lot of people believe, and of course I do as well, that this is such a monumental step. I think there’s this sea change in all sorts of things that LLMs are going to drive, even just in terms of baseline productivity every day, but also customer experiences. It’s going to impact so many things. But is that urgency in the enterprise almost like a FOMO? That’s probably too informal of a term. I mean, is it, we have to figure out LLMs or we’re going to fall behind? Or are you seeing a lot of companies be really strategic and know exactly how they want to deploy it or leverage it?

Mark Huang 24:53
So you kind of see both. I’m not going to lie, right? Talking about it openly, you could almost ask for a show of hands: with AI, there’s a little bit of a FOMO effect across all enterprises, being afraid that you’ll fall behind. But then, with some of the enterprise customers we work with, particularly on the automation side, what’s interesting is that people are looking to expand themselves into other business lines by getting LLMs in there. On the automation side, you always saw the UiPaths of the world do process automation, and that was sort of the first iteration. But what I see happening next, these days, is that they take that process and generalize it to other business verticals, where the flexible LLM is going to open up new doors for them. So they’re viewing it as a revenue generator, more than anything.

Eric Dodds 25:52
Fascinating. And we mentioned vendors. How many vendors are there out there? I know that sounds like a dumb question. Of course, we talked about the oligopoly, and you have a handful there of the big, well-funded private ones. But how many vendors are there? It seems like they’re cropping up every day. Is it a pretty fragmented landscape already?

Mark Huang 26:17
You know, if you take the thousand-foot view, you can’t miss the oligopoly of the super large research institution companies such as OpenAI or Anthropic. Then on the smaller side you have the open source side, where Hugging Face kind of stands as a monolith. But then you have all the other smaller vendors around. I think the space is definitely more fragmented than you would think. But it’s interesting to see that the problems are the same, which is why we set out to work here. It’s just not easy for a single developer to get started, to get access to these models, and to build and customize them. And it’s even harder for enterprises to get started, given a lack of knowledge or an inability to scale beyond these vendored solutions for the workloads that an enterprise actually expects.

Eric Dodds 27:17
Okay, one more question before we dive into Gradient specifics. Sorry, I keep thinking of interesting questions. How is this impacting people who have worked in ML in the enterprise for some time? Because to some extent, you almost see a leapfrogging of, let’s call it, the traditional ML process or workflow, right? I mean, it’s an API now, which I want to dig into. But that’s pretty significant for traditional ML teams who are running full end-to-end, and especially even on-prem, MLOps infrastructure. Is that shifting a lot in the enterprise?

Mark Huang 28:04
I think that’s probably the main observation we made, and it honestly defined the way that we released our product. We made sure to learn from the OpenAI release, which was the easiest way and the easiest interface for anyone to get started. So for us, it was web APIs to call on models to run their fine-tuning and to run their completions and inference on top. That was the way that we wanted people to experience it, and to basically remove all the developer friction.
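
[Editor’s sketch] To make "web APIs for fine-tuning and completions" concrete, here is roughly what two such calls could look like from a developer’s side. This is a minimal illustration under stated assumptions, not Gradient’s actual API: the base URL, paths, model name, and payload fields are all hypothetical, and the requests are only built, never sent.

```python
import json
import urllib.request

# Hypothetical values for illustration only; not a real service's API.
BASE_URL = "https://api.example.com/v1"
API_KEY = "YOUR_API_KEY"


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to the hypothetical service."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# One call to fine-tune a hosted model on your own examples...
fine_tune_req = build_request(
    "/models/my-llama2-adapter/fine-tune",
    {"samples": [{"inputs": "Q: What is our refund window?\nA: 30 days."}]},
)

# ...and one call to ask the resulting model for a completion.
complete_req = build_request(
    "/models/my-llama2-adapter/complete",
    {"query": "Q: What is our refund window?\nA:", "max_tokens": 32},
)

# Against a real service, sending would be: urllib.request.urlopen(complete_req)
```

The point of the shape is that the caller never touches GPUs, model weights, or serving infrastructure; everything is an authenticated HTTP request.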

Eric Dodds 28:45
Okay, let’s talk about developer friction. This is a great way to dive into the specifics of Gradient. So let’s talk about Llama 2 without Gradient. Say I’m at Company X, and I want to go use Llama 2 to operationalize whatever it is we’re trying to do with LLMs: a recommendation, or information retrieval for our users, whatever app we’re building. Can you walk me through what I need to do to operationalize that on my own, sort of hand-rolling it, and then talk us through what that experience is like with Gradient?

Mark Huang 29:32
Yeah, absolutely. So, well, the first thing you’ve got to do is call your AWS or GCP salesperson and tell them that you need a reservation on a bunch of one hundred or more GPUs. He’s probably going to make you pay three years up front in order to run your development tests, because you just can’t get them.

Eric Dodds 29:54
Is the demand really that high?

Mark Huang 29:58
Yeah, I would say I’ve never had so much trouble just trying to get hardware.

Eric Dodds 30:05
And we’re like, maybe we should just go sell it. Maybe we should go be in hardware sales.

Mark Huang 30:11
I mean, I think that’s kind of the funny FOMO effect at a few companies these days. We decided to build software, and there is a bit of FOMO, like, hey, hardware could have been a pretty nice business too.

Eric Dodds
Right. Okay, sorry to interrupt there; that just struck me. Okay, so I’m calling up AWS, and I’m going to pay three years up front for the hardware that I need. Okay.

Mark Huang
All right. So, from there, depending on how large the model is (on our public interface we support the 13 billion parameter model, and in enterprise we support the biggest one, the 70 billion parameter model), you’re going to have to learn a little bit of distributed computing: understanding how to distribute this model across multiple GPUs and doing a lot of load testing. Any time that you send a piece of data into it, you will come across in your tests the dreaded out-of-memory error. It’s called OOM, right? And on the GPU, unfortunately, it’s kind of a one-way door: it OOMs and you have to restart the entire system. So you go through that type of pain in operationalizing it, and you have to build out that system. And then probably one of the hardest aspects, which we’ve worked a lot on trying to facilitate, is having a system that can run at high concurrency, with a lot of users coming in. And even beyond that, how do you ensure that every user can do training and customization of their model without blocking everybody else? Because as it is, you’re effectively pushing these GPUs to their memory limits. Having 10,000 users come in one day to throw their data at this model and train it is going to cause system outages, slowdowns, and latency. So building out that entire infrastructure to be highly available, concurrent, and low latency, and having the ability to handle the micro-batches: those are all the things that you’re going to have to think of from a system standpoint.

Eric Dodds 32:35
So, and that’s, like, DevOps and SRE. I mean, you haven’t even really gotten into working on the model itself.
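
[Editor’s sketch] The memory pressure Mark describes can be made concrete with back-of-the-envelope arithmetic. The sketch below estimates only the memory needed to hold model weights, assuming 16-bit weights (2 bytes per parameter) and an 80 GB accelerator; both figures are assumptions for illustration and were not stated in the conversation. Real deployments also need room for activations, caches, and, for training, optimizer state, so actual requirements are higher.

```python
import math

BYTES_PER_PARAM_FP16 = 2   # assumption: 16-bit weights
GPU_MEMORY_GB = 80         # assumption: memory available on one accelerator


def weight_memory_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM_FP16) -> float:
    """Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def min_gpus_for_weights(num_params: float, gpu_memory_gb: float = GPU_MEMORY_GB) -> int:
    """Lower bound on GPU count: weights only, nothing else counted."""
    return math.ceil(weight_memory_gb(num_params) / gpu_memory_gb)


for name, params in [("13B", 13e9), ("70B", 70e9)]:
    print(f"{name}: {weight_memory_gb(params):.0f} GB of weights, "
          f"at least {min_gpus_for_weights(params)} GPU(s)")
```

Under these assumptions the 13 billion parameter model needs about 26 GB for weights alone, while the 70 billion parameter model needs about 140 GB, which already exceeds a single 80 GB device. That is why distributing the model across multiple GPUs comes up before any work on the model itself.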

Mark Huang 32:46
Yeah, that’s just to be able to run a single experiment, and it’s the friction to be able to do what I like to call, you know, playing around with these models. You have to go through all that in order to get the harness that you need to run your experiments. And then to actually start operationalizing different experiments on top, you have to grab data, you have to make sure that you can put the data in the correct format to send to these models, and you also have to ensure that what you’re doing is going to be available to everybody else to perform those tests too.
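
[Editor’s sketch] As one illustration of "putting the data in the correct format," the snippet below turns raw question/answer rows into JSON Lines records built from a prompt template. The template, the field names (`question`, `answer`, `inputs`), and the sample rows are hypothetical; each model or service defines its own expected format.

```python
import json

# Hypothetical prompt template; real services document their own expected format.
TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n{response}"


def to_training_record(row: dict) -> dict:
    """Turn one raw row into a single training-text record, trimming whitespace."""
    return {
        "inputs": TEMPLATE.format(
            instruction=row["question"].strip(),
            response=row["answer"].strip(),
        )
    }


def to_jsonl(rows: list[dict]) -> str:
    """Serialize rows as JSON Lines, skipping rows with a missing question or answer."""
    records = [
        to_training_record(r)
        for r in rows
        if r.get("question", "").strip() and r.get("answer", "").strip()
    ]
    return "\n".join(json.dumps(rec) for rec in records)


raw_rows = [
    {"question": " What is our refund window? ", "answer": "30 days."},
    {"question": "Who approves expenses?", "answer": ""},  # dropped: no answer
]
print(to_jsonl(raw_rows))
```

Filtering incomplete rows and normalizing whitespace before training is a small step, but it is the kind of cleanup Mark refers to when he says the work is still mostly about the data.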

Eric Dodds 33:28
And then when we talk about outputs, and operationalizing those, how much further of a push is it to actually deliver those outputs, you know, in some sort of experience downstream? Because even if you get your experiments up and running, and let’s say you have this thing running in a way that it’s actually operational, well, then you actually have to package the outputs and deliver them as part of whatever downstream experience you’re building, right? So, essentially delivering the output as a user experience. I mean, that’s one way to operationalize an LLM. What does that process look like? Let’s say the last mile, right, if you’re trying to build a product for your end users that leverages the output of the LLM?

Mark Huang 34:21
Yeah, I think, from that aspect, let’s say you have this system and it’s highly available. You containerize it; I don’t know, some people like Docker, you can use whatever type of virtualization or containerization you want. And part of having the model actually run in production is being able to send data, you know, questions. They call them prompts, right? These are language models: you send your query and it gives back a response. And hooking up all of the pipelines in order to have, effectively, an event-driven service to handle that, that last mile for us was actually the hardest part of making it available to users. Like, being able to run the inference calls correctly on the custom models that people have trained, and then making it so that, you know, they can spin up new models and try those out and maybe even ensemble the models and deliver it into a product through, like, one endpoint.

Eric Dodds 35:35
Interesting. Can you dig into the inference piece of that? You said that was the hardest problem. Why was that the hardest problem? Is it because you actually had to build, essentially, an event-driven system that could process those things? Of course, you know, you probably need to manage ordering and all of those different things. Like, can you dig into what that problem is, and then how you solved it?

Mark Huang 36:04
Yeah, it’s a lot of the same things I alluded to earlier. On the CPU, you can process data and let the different batching and queueing systems handle it, so even if the hardware goes into some weird state, you’re able to recover from it. On the GPU, the trouble starts the moment you get data that’s too large. For instance, you always talk about context limits in these models. So suppose you’re still under the model context limit, but your system context limit is much smaller, and you have a user that brings in a really long set of text. Having that not take down the entire server for the other 10,000 users, like, that’s a problem in micro batching. And then there’s being able to deliver and serve that request in a reasonable amount of time: being able to distribute that workload across multiple GPU chips, and handle the fact that, at the same time, someone might even be training on their data on top of the same model. And having that all available to all the users at the same time, we’ve really only, you know, seen that with OpenAI. Like, they’re one of the vendors that has delivered that experience. Right. And we all kind of know, you know, how popular they are due to that seamlessness.
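
The micro-batching problem described above can be sketched in a few lines of Python. This is only an illustration, not Gradient’s implementation: the `Request` type, the `plan_micro_batches` function, and both token limits are hypothetical. The idea is to reject an oversized request before it reaches the GPU, and to pack the remaining requests into batches that stay under a system-level token budget.

```python
from dataclasses import dataclass


@dataclass
class Request:
    user_id: str
    token_count: int


def plan_micro_batches(requests, max_request_tokens, max_batch_tokens):
    """Group requests into batches under a per-batch token budget.

    Requests longer than max_request_tokens are rejected up front, so one
    very long prompt cannot exhaust memory for everyone sharing the server.
    """
    if max_request_tokens > max_batch_tokens:
        raise ValueError("a single request must fit inside one batch")

    batches, rejected = [], []
    current, current_tokens = [], 0
    for req in requests:
        if req.token_count > max_request_tokens:
            rejected.append(req)
            continue
        # Start a new batch when this request would overflow the budget.
        if current and current_tokens + req.token_count > max_batch_tokens:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(req)
        current_tokens += req.token_count
    if current:
        batches.append(current)
    return batches, rejected
```

A real serving system would also need timeouts, fairness between users, and scheduling around concurrent training jobs, none of which this sketch attempts.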

Eric Dodds 37:42
Yeah. Yeah, it is, actually. It’s so crazy to think, like, oh, I’m getting this timeout on, you know, the GPT-4 API, and then thinking about what’s going on under the hood is insane, right? I mean, the amount of requests that they’re handling is pretty wild. Okay. So this is interesting. So I’m thinking about being in an enterprise. I’ve tried one of the big players, you know, the big private vendors. Okay, it gives me an idea of what I need to do, but I need something more bespoke that I have more control over. And then now I’m at a point where I realize, okay, well, actually, rolling this myself is way more of a DevOps and, you know, high-transaction, super-low-latency software engineering challenge than I thought, because I just want to use an LLM to deliver this product. And so I come to Gradient. Okay, so walk me through operationalizing it with Gradient.

Mark Huang 38:57
Well, all you gotta do is go to our website, Gradient.ai, and click sign up. And from there, we have $10 of free credits, so anybody can try it right at this very moment. You just need to create an account and download either our SDK or CLI for a better experience. Or, honestly, you can just use Python or curl requests. You just hit our endpoints and run your completions on it. You can try, you know, Llama right now and ask it whatever questions you want, and also send data in to train that model, get your own custom model out of it, and see what the change was. Maybe the funniest use case that we have internally: someone’s trying to train a model on Rick and Morty scripts, like making a Rick bot. So yeah, we’re just playing around with it. It’s pretty hilarious, but it’s that type of frictionless experience. Like, you don’t have to think about the infrastructure. To you, it’s just a product; it’s a service that should give you answers or completions whenever you hit the endpoints.

Eric Dodds 40:15
Super interesting. Okay. So can you talk about your SDK a little bit? What is that? I mean, obviously, you can hit it with curl, or just write some Python to ask it a question. But if we’re talking about, you know, putting this thing into production, what’s the SDK experience like?

Mark Huang 40:36
It’s a lot like OpenAI, to be honest. So basically, with your token, you will have access to our endpoint, and all you have to do is point it to the name of the model that you want. We’re going to be supporting many more models, but to start, we mostly just have the Llama 2 flavors of the models. Yep. Then you have a text string. So you import the Python library, Gradient AI, give it the model name, and then type complete, because it’s a completion, technically, and then, you know, some sort of question. Let’s say you’ve already trained that model on the Rick and Morty scripts. So you can be like, hey, Rick, why are you so mean? And then it’ll give you back a tongue-in-cheek response. So that’s the, you know, kind of experience there. And with the SDK, you can do a lot more interesting things. I think we have some notebook examples out there where you can actually build entire systems on top. Like, you know, retrieval-augmented generation is a topic that a lot of people are talking about, to be able to build, effectively, a knowledge base that you can ask questions on and prevent, you know, hallucination of the model, like the model telling you things that it’s just making up.
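
As a rough illustration of the token, model name, and question flow described above, here is a sketch in plain Python. It deliberately does not use Gradient’s actual SDK: the base URL, the path, the JSON field names, and the `build_completion_request` helper are all hypothetical placeholders, and the request is only constructed, never sent.

```python
import json
import urllib.request

# Hypothetical base URL; a real service would document its own.
API_BASE = "https://api.example.com"


def build_completion_request(token, model, query, max_tokens=100):
    """Build (but do not send) a completion request for a hosted model.

    The path and the JSON field names are assumptions for illustration.
    """
    body = json.dumps({"query": query, "max_tokens": max_tokens}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/models/{model}/complete",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_completion_request(
    token="YOUR_TOKEN",
    model="rick-bot",  # hypothetical name for a fine-tuned model
    query="Hey, Rick, why are you so mean?",
)
print(request.get_method(), request.full_url)
```

Sending it would be a matter of passing the request to `urllib.request.urlopen` and parsing the JSON reply, whose shape would depend on the actual service.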

Eric Dodds 42:03
Yep. And okay, so that’s the SDK side. When we talk about training, like, you know, feeding the model data to train it, what is that experience like?

Mark Huang 42:14
For us, currently, what we support is just a list of JSON. So you just send it a JSON that is just a string: you have question and response pairs for the model. You just send it into our endpoint and the model trains straight off of that. And, you know, it’s sitting in that server with governed access for you, and you can run your completions on top of it. So you don’t really have to save anything, you don’t have to think about it. We give you the model ID and the version. So conceivably, you can build a ton of models and actually have, like, a system of ensembled models in whatever product or system you want to build.
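
One way to picture the "list of JSON" training input described above is the small Python sketch below. The prompt template and the `inputs` field name are assumptions made for illustration; the episode does not spell out the exact schema.

```python
import json


def build_training_samples(pairs):
    """Turn (question, response) pairs into a JSON-serializable list.

    Each sample is a single string holding both the question and the
    desired response. The template and the "inputs" key are hypothetical.
    """
    samples = []
    for question, response in pairs:
        text = (
            f"### Question:\n{question.strip()}\n\n"
            f"### Response:\n{response.strip()}"
        )
        samples.append({"inputs": text})
    return samples


pairs = [
    ("Hey, Rick, why are you so mean?", "Because science doesn't care about feelings."),
    ("What is a portal gun?", "A device for traveling between dimensions."),
]
payload = json.dumps(build_training_samples(pairs), indent=2)
print(payload)
```

The resulting JSON string is the kind of body that would be sent to a training endpoint, which would then return a model ID and version to use for later completions.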

Eric Dodds 43:10
Interesting. And so, that’s interesting on the versioning, because I would guess that’s really useful for companies who want to sort of fine-tune, right? So do I access that metadata, like, via the SDK or the CLI?

Mark Huang 43:27
Yeah, that’s exactly how you would do it. The CLI experience is very slick, it’s very easy to do. And then the SDK, you know, gives you a little bit more power and is slightly harder, but, you know, it’s like one extra line. Yeah.

Eric Dodds 43:42
Okay. And then, do you keep a history of the sort of versions and the, like, the versions of the model that have been run? So like, is there a history there? Or is it just sort of one to one, like, I’m getting the metadata back, and I would need to store that.

Mark Huang 44:00
So, currently, the experience is: so long as the user doesn’t do a delete on that model ID, it, you know, exists. The interesting thing is, they are able to just grab it. So, you know, we had a user the other day who wants to build a chatbot service. And the interesting use case that our technology was uniquely suited for is, like, he wanted every user to have their personalized chatbot experience. So he wanted to launch 1,000 models on our service, and asked us if we could handle the load. And I couldn’t actually tell him at that exact moment if we could, but apparently we can: he was able to just launch all these models in a for loop, get them all in, and try to create that experience for all the users.
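
The "launch 1,000 models in a for loop" pattern above can be sketched with an in-memory stand-in for the hosted service. Everything here is hypothetical (the `FakeModelService` class, its method names, and the ID format); it only illustrates keeping one model ID per user, where a model exists until it is explicitly deleted.

```python
import itertools


class FakeModelService:
    """In-memory stand-in for a hosted fine-tuning service (hypothetical)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.models = {}

    def create_model(self, name):
        model_id = f"model-{next(self._ids)}"
        self.models[model_id] = name
        return model_id

    def delete_model(self, model_id):
        # A model stays available until it is explicitly deleted.
        del self.models[model_id]


def launch_per_user_models(service, user_ids):
    """Create one personalized model per user and return user -> model ID."""
    model_ids = {}
    for user_id in user_ids:
        model_ids[user_id] = service.create_model(f"chatbot-{user_id}")
    return model_ids


service = FakeModelService()
mapping = launch_per_user_models(service, [f"user{i}" for i in range(1000)])
print(len(mapping), "models launched")
```

With a real service, the loop body would be a network call, so rate limits and retries would matter; the mapping from user to model ID is the piece the application has to keep.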

Eric Dodds 44:54
Wow, that’s super cool. Yeah, I’m just thinking about, you know, situations where you may be bringing new people in to work on models, and having access to that historical metadata is really interesting, right? Because you may be looking at output, you know, that was produced previously. So that’s super interesting. So you said, well, you know, you have a lot of the Llama 2 flavors. What other models are you excited about bringing into the Gradient platform?

Mark Huang 45:23
Yeah, I think we focused on it. That was intentional. Like, thus far, we have all the text modalities, so everything is sort of natural language. But I’m particularly excited to start supporting the multimodal models, like the text-image or image synthesis models, and taking in images, and kind of the one-to-many modality models like text to audio, text to image, all those types of things that I just think are getting unlocked at such a rapid pace, right? It’s actually hard to keep up. And being able to have those up, I think, will bring on even more experiences and bring in a lot of different types of people that want to work on these. So I’m really excited to start doing that, like, in the next quarter. Yeah.

Eric Dodds 46:23
Okay, let’s talk about fine tuning. So, you know, we think about, okay, with a big private vendor you get what you get, and I want to actually do something myself. So I started using Gradient, but I want to start fine tuning this model. What does that process look like? And then also, maybe I want to use my own private model, not necessarily one of the Llama flavors, because that’s possible in Gradient, right, operationalizing your own private model?

Mark Huang 46:56
Yeah. So on the enterprise side, we do support people bringing their own models and putting them on top of our platform, depending on how compatible the model architecture itself is with our platform. But you can bring in your own model and try it out. And also, if you want to just take the open source model and train it on your own data, it’s yours. Like, we don’t keep any of the model weights, unlike the closed-source vendors. And then, particularly for the closed-source vendors, you’ll see in the licensing that you don’t technically own the model itself, but you own the completions of the model. So I don’t know the implications of that. But all I know is that we just allow you to have it, right? Like, it’s yours if you want to download it later. In the enterprise, we just allow them to have governed access to the model itself.

Eric Dodds 47:59
Yeah, that is super interesting. That’s like an interesting way to achieve lock-in. Yeah, you do all this work, basically putting all these parts, you know, into the machine to make it run the way you want, but then you can only run completions. Yeah. Super interesting. Okay. I want to rewind just a little bit. I didn’t actually ask you about this specifically earlier, but when did you decide to start Gradient? Like, what was the moment where you said, okay, I want to do this? I mean, thinking about, you know, companies who want access to open source, or even just infrastructure that allows me to have an endpoint to operationalize my own model, it’s super compelling. Was that the original idea? Like, can you tell us about the very early genesis of Gradient?

Mark Huang 48:53
Yeah, I think the early genesis for us, so, I come from a machine learning background. My co-founder, Chris, who I’ve known since freshman year of college, who I trust a lot, he worked at Netflix, so you can just imagine the experiences that they were driving there, and he was head of sort of the studio intelligence there and Syria innovation. So, on the product side. And what he saw, like, for both of us, was the friction for the enterprise, and for anyone in general. Like, users of these platforms just had so much developer friction to be able to use AI and machine learning. And then the other part was, it just never felt quite right in terms of, hey, I want to go from A to B, right? Something that you talked about was, it almost feels like today you’ve hopped over needing to have the expertise. Like, it’s not necessarily a machine learning team member who’s using and adopting the AI. It’s almost like my mom or my dad could say, tomorrow, I have this business, and I think I could do better because I have all this data, and I want to improve the product. Right? Like, that’s almost the thing that was always missing. So through, you know, a few iterations of what we wanted to develop, we finally landed on, you know, with what you’re describing, this unstoppable wave of the release of ChatGPT, that this was the right experience. Like, this was exactly what we needed. And luckily, to get it out there and bring it to people, the "explain it like I’m five," right? Now we didn’t have to explain it to everybody like they’re five; they already understood it. So yeah, you know, that was the decision point. And we got the company going with the same vision that we thought it would kind of get to. But, you know, did we expect it to be purely large language model based? Not necessarily. But it was always going to be kind of an endpoint experience for everybody to be able to adopt AI.

Eric Dodds 51:17
Yeah, I mean, it’s so funny. But yeah, I think, you know, obviously, ChatGPT, at least in my opinion, will go down as one of the most high-impact interface decisions in tech history, for sure. And what’s interesting about it is hearing myself say, like, okay, you just have an endpoint, you know, to operationalize your LLM, right? It’s that easy. You know, five years ago, that would have sounded like a massive enterprise project, et cetera. But ChatGPT makes that make sense, right? Because an endpoint is essentially just a way to exchange information via a very simple interface. Right. And so it is interesting that ChatGPT just created a context, you know, for that, which is fascinating.

Mark Huang 52:11
Yeah, I think it was sort of that shift from the standpoint of, we found our ICP was actually the app developers, right? Like, those were the people who were like, oh, wow. Yeah, I have a lot of empathy for, you know, the deep learning scientists, because that’s sort of what I worked on for many years, in machine learning and understanding all the interesting aspects of that. But then you kind of get towards the people that want to productize it, and you realize that, like, that’s what our product was made for. It was made for those people to be able to easily embed it into what they needed, and not actually need to know how it works. Like, yeah, when you build a distributed system, everybody sort of says, you know, you wish you would have never done distributed systems at all, cuz it’s just,

Eric Dodds 53:03
yeah, yeah, no. I think that’s a really good thing, right? Is that the democratization of this giving people an endpoint that allows them to, I think about it as unlocking creativity really, right. So you know, we kind of talked about like, jumping over the ML, you know, the entire ML team, and literally giving an app developer an endpoint and saying, Look, you can create experiences that weren’t possible for you to create before. And all you have to do is sort of leverage this endpoint, right, and it’s easy to actually put it into your app. Just super cool. I think it’s gonna unlock and already has just huge amounts of creativity, which is super exciting. Okay, we’re close to the buzzer here. One more question. What excites you the most about the future of LLMs? And what scares you the most about it?

Mark Huang 54:04
I think what excites me is seeing what’s going to happen with people running, like, their 1,000 models, right? Like, kept inside of the enterprise. And, like, what is that going to look like, to combine the sets of data that exist out there? So you have all the databases and all the database companies in the world, and people were used to structured data, and then LLMs are mostly, like, unstructured data. And what is it going to look like when you have entire stacks built out with the gravity of all the data coming into these language models, or, you know, the embeddings models for vector search? And how does that transform what the enterprise stack looks like? I don’t have an answer to that. And it’s something where, every time we talk to another client, it kind of looks a little bit different, which is really cool. On the side of what scares me, you know, there’s a few things. I think people do talk about AI safety. Yeah, I think more so than, like, the end-of-the-world scenarios that people talk about, it’s more so the education and understanding of, like, how do you actually govern access? And how do you make the models do what you think is safe? So, related to all that, alignment is a huge topic in the industry. And additionally, maybe part of why we started the company is sort of feeling like, now you have these LLMs, billions of dollars being put into AI and these models, and the infrastructure is still built on a house of cards. It just feels so brittle. And, you know, it terrifies me that you have all this investment and the frictions still need to be made better. But I mean, there are plenty of companies out there trying to work on that, ourselves included. I think that’ll eventually mature. So those are all interesting things. I think it’s a super exciting time. Yeah.

Eric Dodds 56:19
Well, Mark, this has been absolutely incredible. The time flew by. I feel like we started talking five minutes ago, but I guess it’s closing in on an hour. But congrats on Gradient, such a cool product. I can’t wait to actually sign up and use my free $10 to send a curl request. So yeah, congrats again. And thanks for coming on the show. Amazing convo. Yeah, appreciate it. We hope you enjoyed this episode of The Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We’d also love your feedback. You can email me, Eric Dodds, at eric@datastackshow.com. That’s E-R-I-C at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at RudderStack.com.
