2024-07-24, 2:21 PM — Weaviate (read-only)

One thing that I find really cool about vector databases is how they came into prominence: people started to use them because of the vector embeddings coming out of machine learning models. And I've been working on this stuff for a long time already — next year is going to be my 10th anniversary working with vector embeddings. I started all the way back with something called GloVe, which gave you tiny word embeddings, right? So if you had a word like "Seoul", it would tell you that's close to "Korea". Tiny language models, if you will. And then all of a sudden that grew into what it is today.

But there's something important here, because this is not just any database — something new is happening. If you quickly look at the history: we had the old systems, the way we built databases back in the day, Oracle and IBM and those kinds of databases. Then, when the internet started to emerge, people started to use SQL, and open source started to play a role — nobody thought that would happen. And then we got mobile, and we got amazing companies, Redis and other very important companies in that space. And on top of that, of course, the cloud, with the famous players in the cloud.

And the question always was, with machine learning: is that just a little bit of cloud plus a little bit of mobile, or is it something new? Is this AI thing something new? We believe, of course, that it's new, and I'm going to explain to you why. In a nutshell: the big difference between adding machine learning or AI capabilities to what you're already doing, and building something that you see on top here, called AI-native, is that in an AI-native application, if you take the model out of the application, the application is gone.
So what you see happening there on the left-hand side — or for you, on the right-hand side — we cannot build those kinds of applications without AI. So that's the term, and we fit in as an AI-native vector database, or AI-native data infrastructure, if you will.

And what we're doing is creating this ecosystem. Let me go to the screen and show you. We have the LLM frameworks — you all know LangChain, that kind of stuff, right? We have the model providers, ranging from OpenAI-style models to open source models — I mean, today we saw a lot of interesting Korean models, all that kind of stuff that's happening. We have the container infrastructure, the data pipelines, and of course the hyperscalers, meaning the big cloud providers. And we sit in the middle, right? That is what we focus on, and that's the ecosystem we're building around the database.

Basically it's very simple. We take any type of multimodal data — and what we mean by multimodal is not only text. You create an embedding, and the vector embedding is stored in the vector database. The reason that's new is that we can now find things by more than just keyword search: we can search images, audio, the meaning of text, those kinds of things, through these vector embeddings. And these vector embeddings are everywhere — if you're knowledgeable about models and you dig into how they work, vector embeddings are everywhere, right?

So what do we do with Weaviate? The core of Weaviate is open source, and that's going to stay open source — that's the database itself. Out of the database come a bunch of APIs, so everybody can use them, doesn't matter where you are; you can interact with the APIs from Python, TypeScript, what have you. But we also started to build an ecosystem around it. One of the things we're doing — and I'm going to give you an example with a video today — is what we call the educational part.
So we believe — well, how many people in here are developers, who write code? Who's writing code? Not right now — raise your hands. Nobody? Yeah, a couple of yeses, thank you, great. So if you write code, we help you understand how to do this, because this stuff is new. That knowledge is something we build and bring to our users and our customers, so we help to educate and engage, and we give examples.

And then you have deployment options, right? We have serverless, where you just sign up and you're good to go. You have the enterprise cloud, which is a dedicated enterprise deployment. And you can bring your own cloud, which means that if you want to run inside your own VPC, or your customer's, you can do that too. The integrations — that's what I just mentioned about the embedding models. And then we have something we call the workbench. The workbench focuses on helping developers with tooling: the clients, the graphical user interfaces, the apps that sit on the database. And operations means there are certain services for operating the database. All of this together is the ecosystem we're creating, and everybody can use it, based on the open source database. So that's what we do and what we focus on.

This is the trajectory we're seeing. Everything started with vector search. We got these vector embeddings from the machine learning models — doesn't matter what kind of model, all of them produce vector embeddings — and we could do better search and better recommendations. But the thing with vector search was that, for us as a database company, it was amazing and a lot of people loved it, but it was better, not new. It was just better search. And then came RAG. RAG stands for retrieval-augmented generation.
What it basically means is: if you want to build your own agents, your own search integrations, chatbot applications, generative applications, those kinds of things, you need a combination of the machine learning model — the AI model — and the vector database. That's basically what RAG is.

And now something super exciting: you see, RAG is just a one-way street. A query goes to the vector database, the results go into the model, and the answer is presented to the user. But what we can also do is store the information back into the database. In a moment I can give you a demo, so you can see how that works. So this is what you see here: starting with search, we got the chatbots, Q&A systems, copilots — but the new thing is generative feedback, even for things like data cleansing and those kinds of things. Let's say you have a data set, and you give the instruction to the database that everything stored in the database should be in Korean. Then you can store something in English and it just translates it for you — you don't have to do that anymore. That's a completely different paradigm: you're not doing that stuff yourself anymore, there's now a model sitting in between you — the user, the developer — and the database.

Well, this is a recap of what we talked about. I think the most important thing — maybe the third point as well, but certainly the first one — is that we focus on making it as easy as possible for developers to build AI applications. Because it's not easy, right? You need to learn how the models work, how to operate them, how the database works, how everything works together. So we try to help with that, and I'll show you an example in a bit. And the other thing is, we are really agnostic in how we deploy these kinds of things. Why?
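The round trip just described — query to the vector database, results to the model, and optionally the model's output written back — can be sketched in a few lines of Python. Everything here is a hypothetical stand-in for illustration: `embed` and `generate` are stubs for a real embedding model and LLM, and a plain list stands in for the database; none of this is Weaviate's actual API.

```python
def embed(text: str) -> list[float]:
    # Stand-in embedding: a character-frequency vector.
    # A real model returns a dense learned vector instead.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha() and ch.isascii():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def similarity(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

database = []  # list of (text, vector) pairs — the "vector database"

def store(text):
    database.append((text, embed(text)))

def retrieve(query, k=2):
    ranked = sorted(database, key=lambda d: similarity(embed(query), d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def generate(query, context):
    # Stand-in for an LLM call: a real system would prompt a model here.
    return f"Answer to {query!r} based on: {context}"

# One-way RAG: query -> retrieve -> generate -> user
store("Weaviate is an open source vector database")
store("Seoul is the capital of Korea")
answer = generate("what is Weaviate?", retrieve("what is Weaviate?", k=1))

# Generative feedback loop: write the model's output back into the database,
# so the generated answer is itself searchable next time.
store(answer)
print(len(database))  # → 3
```

The last `store(answer)` call is the whole difference between plain RAG and a generative feedback loop: the arrow points both ways.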
Because everybody wants to use the database in a different way, and the database being open source makes that easier.

So let me get to the demo — I'll skip over this. One of the things I'm going to show you is this kind of stuff. We were working with the community, and one of the things that came out of that — we have a community here in Korea, and we got a question where they said: hey, it's all great that we can now do vector search, it's great that we can have multiple indices in the database, but the models and the database don't work very well with the Korean language, right? So how do we do localization? So we worked together with the community and with the users here in Korea to actually build that.

Here's an example. This is JP — JP works in our developer relations team, and this is the kind of stuff our team makes. I just want to show you this example; I hope everybody can hear it. There we go.

Hi, I'm JP Hwang. Let me tell you about some of the things that you can do with Weaviate and Korean data. Weaviate includes powerful integrations that help you to build amazing AI apps with Korean data. It includes integrations with multilingual models, like those from Cohere — it might be for Korean embeddings, or it might be for large language model integration as well. And now we've added a Korean tokenizer to use with Weaviate. Tokenization is about breaking up sentences into smaller parts. This is relatively easy with Western languages like English, French, and so on. But for Korean, as you know, it can be more complicated: you can't just rely on spaces, so breaking up a sentence can be quite complicated.
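To see why that's hard, here is a toy sketch — my own illustration, not Weaviate's actual tokenizer, which uses a proper morphological analyzer. Naive greedy longest-match segmentation of the classic sentence 아버지가방에들어가신다 ("father enters the room") picks 가방 ("bag") instead of 가 + 방 ("room" plus the subject particle), which is exactly the kind of mistake a real Korean tokenizer has to avoid. The tiny dictionary here is made up for the example:

```python
# Toy illustration of why Korean keyword search needs a real tokenizer.
# DICTIONARY and the greedy strategy are simplifications for this sketch.
DICTIONARY = {"아버지", "가방", "방", "에", "들어가신다", "가"}

def longest_match_tokenize(text):
    """Greedy longest-match segmentation against a word dictionary."""
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest candidate first
            if text[i:j] in DICTIONARY:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # unknown character: emit as-is
            i += 1
    return tokens

# "Father enters the room", written as one run of characters:
sentence = "아버지가방에들어가신다"
print(longest_match_tokenize(sentence))
# Greedy matching picks 가방 ("bag") — the wrong split; a morphological
# analyzer resolves it to 아버지 + 가 + 방 + 에 + 들어가신다.
```

The failure is the point: character runs alone don't determine word boundaries in Korean, so the tokenizer needs linguistic knowledge, not just string matching.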
Now, in Weaviate 1.25.7, we've added a Korean tokenizer that can split Korean sentences into words, so that we can perform searches using words like 아버지 (abeoji) or 방 (bang) and so on — not wrong words like 가방 (kabang). I've created a short demo of how to use the Korean tokenizer with Weaviate. You can simply follow it end to end and see the Korean tokenizer in action, from connecting to Weaviate all the way to ingesting data, and see how it works. By the time you're done with this example, you'll have performed keyword searches, and you'll even perform retrieval-augmented generation as well, and we briefly discuss generative feedback loops. When you're done, you can try out the Korean tokenizer and other Weaviate tools through our documentation. Thanks very much.

So this is an example of how we work together with the community on building extensions to the database for a very specific use case — the Korean tokenizer. That's it. And everything we create goes into our open ecosystem and is distributed to everybody else.

A couple more things, for the people in the audience who are real database nerds and excited about what's happening in the database right now. One that I really want to double-click on — let's see if there's more — the first one, and that is tenant offloading. One of the things we see is that when these use cases become bigger and you keep everything in memory, that's very expensive. So what we now have is that you can offload the data to your disk, or you can offload it to storage buckets. For example, if you're on Amazon, on AWS, you can offload it to S3. And it's all just a simple API call.
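The idea can be sketched as a tiered store: hot tenants live in memory, cold tenants get offloaded to cheaper storage and come back with one call. The class and method names below are mine for illustration — this is not Weaviate's actual API — and a local file stands in for an S3 bucket:

```python
# A toy sketch of tenant offloading. All names here are illustrative.
import json
import os
import tempfile

class TieredVectorStore:
    def __init__(self, offload_dir):
        self.hot = {}                # tenant -> {object_id: vector}, in memory
        self.offload_dir = offload_dir

    def put(self, tenant, object_id, vector):
        self.hot.setdefault(tenant, {})[object_id] = vector

    def offload(self, tenant):
        """Write a tenant's vectors to 'cold' storage and free the memory."""
        path = os.path.join(self.offload_dir, f"{tenant}.json")
        with open(path, "w") as f:
            json.dump(self.hot.pop(tenant), f)

    def activate(self, tenant):
        """Load an offloaded tenant back into memory."""
        path = os.path.join(self.offload_dir, f"{tenant}.json")
        with open(path) as f:
            self.hot[tenant] = json.load(f)
        os.remove(path)

with tempfile.TemporaryDirectory() as d:
    store = TieredVectorStore(d)
    store.put("acme", "doc1", [0.1, 0.2, 0.3])
    store.offload("acme")          # memory freed, data on cold storage
    assert "acme" not in store.hot
    store.activate("acme")         # one call brings it back
    print(store.hot["acme"]["doc1"])  # → [0.1, 0.2, 0.3]
```

The trade-off is exactly the one described in the talk: an offloaded tenant costs object-storage prices instead of RAM prices, at the cost of a reload before it can serve queries again.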
So you can build huge use cases, and just with an API call you can offload tenants and bring them back. And for the rest, everything else you see is us just continuing to update the database and make everything better, more optimized and, more importantly, easier to use. Because again, it's a big database.

Looking at the time, I'm going to leave it at this. You can find everything you've seen about us on the website. If you want to play around with this data set, everything is open source. We're going to do a couple more releases around Korean data sets, but the first one, the one you saw from JP, is already live — if you just Google for Weaviate Korean tokenizer, you'll find it. And we would love to hear from you: how are you using it, how is it working, is it helping you build your applications? So again, thank you so much for having me and for listening, and I hope to see you next time. Thank you.

We have a little bit of time left, so we'll take a few more questions. Thank you for your presentation. We'll now take about three questions, so if you have any questions, please raise your hands. Does anyone have a question? — Can you hear me? Is it OK? OK, but then I need help with translating the other questions. Thank you. Go ahead, please. — All right. We are trying to understand the exact meaning of AI-native. When I heard that, I thought it would be something beyond saving vector data and then querying it. I expect there would be more features, other than just storing vector data or retrieving relevant vector data, because there are a lot of solutions that do that.
Because you said AI-native storage, so I expect there will be something more — especially for people who are engaged in AI modeling, or serving AI models, stuff like that. So, do you intend Weaviate to cover some of the features of a feature store? Or, if there is a difference between a feature store and Weaviate — or between an ordinary vector store and Weaviate — what would it be? Thank you.

No, but I have to say — that's actually really good English, so thank you. This is an excellent question, so let me answer it, also for the people in the room, with a little bit of history. What you do with vector embeddings is really just distances: you compare distances to each other. So let's say you have 10 documents, each with a vector embedding — so 10 vector embeddings — and you have a query, which is also a vector embedding. Then you compare the query to the first, the query to the second, the query to the third, et cetera, et cetera. Doing that is in itself simple — it's just a comparison. That's what we call a brute-force comparison, and that's how it started. But the problem that started to emerge was: if you have 10 documents to compare against and then you go to 20, it's twice as slow. So if you have a billion, or many billions, it takes hours to get to a result. The first step the industry made — and mostly that came out of Spotify, Facebook, et cetera — was something called approximate nearest neighbors, where people said: we have new algorithms that, by giving an approximation, are still super fast, even for billions. So now we can do that. But the problem was that with the first of these algorithms, you needed to build an index.
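That brute-force comparison is easy to write down — a minimal sketch with random vectors, where the cost is exactly one similarity computation per stored document, which is why doubling the documents doubles the work:

```python
# Brute-force vector search: compare the query's embedding to every stored
# embedding and rank by similarity. No index — cost is linear in len(docs).
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def brute_force_search(query, documents, k=3):
    """Rank all documents by cosine similarity to the query."""
    def cosine(a, b):
        return dot(a, b) / ((dot(a, a) ** 0.5) * (dot(b, b) ** 0.5))
    scored = [(cosine(query, doc_vec), doc_id) for doc_id, doc_vec in documents]
    return sorted(scored, reverse=True)[:k]

random.seed(0)
dim = 8
docs = [(f"doc{i}", [random.random() for _ in range(dim)]) for i in range(10)]
query = [random.random() for _ in range(dim)]

top = brute_force_search(query, docs, k=3)
print(top[0][1])  # id of the closest document
```

Every query here costs exactly `len(docs)` comparisons; approximate-nearest-neighbor indexes exist precisely to avoid this linear scan at billion scale, trading a little accuracy for a lot of speed.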
And then once you had the index, you couldn't update it, you couldn't delete anything, those kinds of things. So the first step that came from the vector databases was: we've got to make sure we're good at storing these embeddings at scale. Now a lot of databases support some form of ANN index — that's what you were referring to, right? So the first differentiator between the existing databases and the new databases is simply scale and ease of use. If you use the search engines that are very well known, it's sometimes harder to do vector search at large scale — getting to a billion is almost impossible. So that's the first point: scale, and then, when you look at the search cases, the ease of actually doing vector search.

What then happened was that we learned that, especially for text, pure vector search is often not enough. So you do something called hybrid search, where you mix different indices — something like a BM25 keyword index with the vector index. That's the reason Weaviate has a built-in Korean tokenizer: if we did not have hybrid search, you would not have needed it. So now the database has this additional feature where we say: hey, you can do hybrid search.

And out of that came the ask from a lot of users, who said: this is all great, but if I store all these vector embeddings in memory, that is very expensive. Like, very expensive. So can you build something where you can offload this to S3 buckets and organize it better? And we said: yes, we can. So what started to happen was that we had the filters, the data object storage, the ease of use in the clients, those kinds of things — because by the time you build a production use case with vector embeddings, you need all these features. And there's just a small set of databases right now that offer all of them.
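The mixing can be sketched as a weighted blend of the two scores. This is a toy version — the two scoring functions are crude stand-ins for BM25 and embedding similarity, and real hybrid search uses proper fusion algorithms — but it shows the idea of one query hitting two indices and blending the results with a single weight:

```python
# Toy hybrid search: blend a keyword score with a vector score via alpha.
# Both scoring functions are simplified stand-ins for this sketch.

def keyword_score(query, doc):
    # Fraction of query terms found in the document (stand-in for BM25).
    terms = query.lower().split()
    return sum(t in doc.lower() for t in terms) / len(terms)

def vector_score(query, doc):
    # Character-set Jaccard similarity (stand-in for embedding similarity).
    a, b = set(query.lower()), set(doc.lower())
    return len(a & b) / len(a | b)

def hybrid_search(query, docs, alpha=0.5):
    """alpha=1.0 is pure vector search, alpha=0.0 is pure keyword search."""
    scored = [
        (alpha * vector_score(query, d) + (1 - alpha) * keyword_score(query, d), d)
        for d in docs
    ]
    return sorted(scored, reverse=True)

docs = [
    "Weaviate is a vector database",
    "BM25 is a keyword ranking function",
    "Databases store data",
]
for score, doc in hybrid_search("vector database", docs, alpha=0.5):
    print(f"{score:.2f}  {doc}")
```

The `alpha` knob is the design point: exact keyword matches keep their weight, while the vector side still catches documents that say the same thing in different words.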
And I believe it's fair to say that with the offloading, with the tenants and those kinds of things, Weaviate is even unique. The last thing you asked about, feature stores: feature stores have a slightly different use case. A feature store is more for labeling — if you want to label a lot of data, you store the features in the database, which is the feature store. That's not the same as a vector database, because the vector database really focuses on doing high-quality similarity search on the vector index. So the feature store is a different use case. But, long story short, you can build your AI applications end to end on top of something like Weaviate; you can't do that with the others.

Very last thing — you asked about AI-native, and that's what I heard in your previous question. So let me, one more time, make sure we align on the definition. What we mean by AI-native is an application in which the machine learning model is crucial: if we take out the machine learning model, the application dies. That's AI-native. If you have an application where you just add a little bit of machine learning, that's AI-enabled, not AI-native. The first case, where the model plays a crucial role, is what we aim to deliver with Weaviate, as easily as possible, with the large feature set I just mentioned for production use cases. I believe that right now we're, if not the only one, one of very few that have all these features available — hybrid search, tenant offloading, model integration, the data storage, data filters, and so on — for that use case. So I hope that answers the question. Thank you.

Hi, my name is Kevin. I have a question about your future plans for hardware acceleration. Because I heard recently that companies like Zilliz launched a collaboration with NVIDIA on GPU acceleration for vector operations.
I know LanceDB is doing something on CPUs using SIMD. Do you have any future plans, or anything on your roadmap? If so, maybe you can share some details; if not, is that something you're looking at in terms of opportunities?

That's a really excellent question, and a very detailed one. For doing vector search you can use two types of chips: you can use CPUs, or you can use GPUs. Let me first start with the CPUs. The SIMD enablement — that's all in Weaviate too. The SIMD optimizations are written all the way down in assembly in Weaviate, so that's all there; if you Google for Weaviate SIMD, you'll find it. We also did a lot of work together with Intel, and you can see that with the premium commodity CPUs you find on AWS or GCP — for example from Intel — sometimes your vector search is 10 to 40 times faster.

Now, GPU-based vector search is way faster — the throughput is way higher. But then an interesting question emerges: what should you use? In Weaviate, for example, on an Intel CPU you can resolve one vector search query on a data set of a million — it's not a lot, but it's a benchmark, right — in about three milliseconds, whether you use compression or not. And that costs you about $11 a month. So now the question is: if you say, well, I'm happy with the three milliseconds — because getting to two is going to be really hard, let alone one — then the question is about throughput. One CPU can do one operation, in this case in three milliseconds. So that means that in a second, you can get a lot of vector search results out of one CPU, right?
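The back-of-the-envelope arithmetic here, using the talk's illustrative figures ($11 per CPU per month, 3 ms per query — not a benchmark), works out like this:

```python
# The talk's throughput math: 3 ms per query per CPU, $11 per CPU per month.
# Doubling target throughput doubles CPUs and doubles cost.

MS_PER_QUERY = 3
COST_PER_CPU_MONTH = 11  # dollars

def throughput_per_cpu():
    """Queries per second one CPU can serve at 3 ms per query."""
    return 1000 // MS_PER_QUERY

def monthly_cost(target_qps):
    """CPUs (and dollars per month) needed for a target queries-per-second."""
    per_cpu = throughput_per_cpu()
    cpus = -(-target_qps // per_cpu)  # ceiling division
    return cpus, cpus * COST_PER_CPU_MONTH

print(throughput_per_cpu())   # → 333
print(monthly_cost(333))      # → (1, 11)
print(monthly_cost(666))      # → (2, 22)
```

That linear cost curve is why the CPU-versus-GPU question below turns into a throughput-per-dollar question rather than a raw-latency one.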
Because basically you have 1,000 milliseconds in a second, so you can just divide by 3 — that's a lot of queries. Now if you say that's not enough, then you add a second CPU, so now it's not $11 but $22. The thing with GPU-enabled vector search is that the throughput becomes higher, but the GPU, as you know, is very expensive. So the question becomes: what is the trade-off between doing it with a bit of memory and CPUs, versus a GPU?

We are working with NVIDIA as well; we're also talking to them about the integration, because with the open source integration we did, you can add as many as you want. But what people often ask us is: we don't want to do it faster, we want to do it cheaper, because vector embeddings in memory get expensive quickly. So the fact of the matter is — and if I'm missing something, I would love to learn — one of the things we're learning is that customers are actually not asking for faster; they're asking for cheaper. And with GPUs, it gets expensive very quickly. So I hope that answers your question.

Right now we have an in-memory index, we have a mix of a memory and a disk index, and we have a disk index. That's important, because fully in memory is the fastest, a bit of memory plus disk is in between, and disk is slower. However, if you do it fully on disk, you don't have to pay for the memory, so it's way cheaper. My point is that what we see now, with people going to production, is that the use cases are very different, and so are the needs people have to make it reasonable. I mean, you can have a huge data set and make it super fast, all in memory — and in some cases that will cost $2 million a month to run. A lot of our customers go: yeah, that's great, but that's too expensive for us. Which is understandable, because that's $24 million a year, just for vector searches.
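Putting the talk's illustrative cost figures side by side:

```python
# The cost trade-off in the talk's illustrative figures: the same use case
# fully in memory versus with the index offloaded to disk.

in_memory_monthly = 2_000_000   # dollars per month, everything in RAM
on_disk_monthly = 150_000       # dollars per month, index on disk

print(in_memory_monthly * 12)               # → 24000000: $24M a year in memory
print(in_memory_monthly / on_disk_monthly)  # ≈ 13.3: over ten times cheaper on disk
```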
So a use case that costs you $2 million a month in memory is, on disk, more than ten times cheaper — now all of a sudden it costs you $150,000. So people say — that's an example — we are OK with it being a little bit slower if it's more than ten times cheaper. And they're not asking us: we're not going to pay you two, but we'll pay you four million dollars if you do it on GPU to make it even faster. The companies that want that kind of speed — just a few companies need that, like Visa or something, those kinds of companies. For the rest, that's a very, very niche group. And lastly, to add to this: we also need to think about what the user actually needs. With GPU acceleration there's also some hype around it — oh wow, we can go even faster, which is great — but then the next question is: do people actually need that? Thank you so much.