Recording, 2025-11-04, 12:02 PM

...produces product names without any typos. That tells me the challenge comes from the agents and the supervisor, so I wanted to ask: what do the agents do today? It seems to me agents should be engaged when you need to interact with tools, APIs, and databases. Why do you need agents in this architecture?

We are developing a chatbot for sales agents in the healthcare domain, so the agents have to be connected to various data sources, such as the sales data, internal medical information, and current news about new drugs. We divided the agents by role: for instance, the first agent analyzes the transaction data and the second agent explains product information. Each agent consists of tools, such as writing SQL queries to retrieve data, retrieving from vector databases, or searching the web, and of course each agent has an LLM inside. That's why we need the agent structure. We found that the retrieval itself has no problem, because there is no chance for a name to change at that stage, but when the retrieved data goes into the LLM, the issue happens. We don't know why.

Well, here's how I would manage this differently. The first thing I would do is focus the purpose of the agents only on retrieving data from databases or other sources. I take the sales system, I take the transactional system, I take another system, and I delegate very specific functions to the agents without having them generate additional tokens. So I would say: agent, the only thing you do is take the data from my sales database and show me the year-to-date or month-to-date for this particular individual as a simple number. The agent retrieves this data, and you use it as a placeholder for the supervisor, so you don't use the agent to generate additional tokens. Make it very simple: the agents do a specific role, and let the direct interaction between the supervisor and the LLM retrieve exactly what you need. In this way you diminish the chances of mismatching the names. That's one approach: I don't want any agent to add any other tokens; I want the agent to come back and give me the graph, the table, or the result. Then you use that result and pass it through the supervisor to say, here are the year-to-date sales for Tylenol. The result will always be "Tylenol", because the agent never generates additional tokens around it. That's what I would do.
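To make the retrieval-only pattern concrete, here is a minimal sketch under stated assumptions: the table, column, and function names are all hypothetical, and the point is only that the agent returns structured data that the supervisor templates verbatim, so no LLM ever re-generates the product name.

```python
# Hypothetical sketch: an agent constrained to pure retrieval. It runs a
# fixed SQL query and returns plain data, no LLM generation, so the product
# name in the final answer is always the retrieved string verbatim.
import sqlite3

def sales_agent_ytd(product: str, db_path: str = "sales.db") -> dict:
    """Year-to-date sales for one product, as structured data only."""
    conn = sqlite3.connect(db_path)
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales "
        "WHERE product = ? AND year = 2025",
        (product,),
    ).fetchone()
    conn.close()
    return {"product": product, "ytd_sales": row[0]}

# The supervisor fills a template instead of asking an LLM to restate it.
result = sales_agent_ytd("Tylenol")
answer = f"Year-to-date sales for {result['product']}: {result['ytd_sales']:,}"
```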
Yeah, I get it. This approach would help us reduce the processing time of generating answers from the agents, because they don't generate anymore. But we found that even when the agent returns the proper answer or the proper reference, the issue also happens in the supervisor when the result goes through its LLM. So I think this is not a fundamental fix, because we still carry the risk that the supervisor's LLM corrupts the name.

Yeah, so here is how I would solve it on the supervisor side: I would create an evaluation mechanism. The supervisor returns its results, and then I run an evaluation against a rubric that asks: what percentage of the time do you not use "Tylenol" but a variant? If that number is greater than zero, you retry: you rephrase the prompt in order to get what you want. But I would like to know, what percentage of the time does the supervisor use the wrong terminology? Do you know that?

Maybe about 20% to 30%, I'm not sure.

Yes, so let's put this between the supervisor and the user; this would be post-generation. The supervisor engages with the agents and gets the data, the agents don't do any generation of tokens, the retrieval comes back, and then you do an evaluation. If the evaluation is the one you expect, you return the result to the user. If not, you say: okay, let's try again, let me rephrase this; you do it again, return the result, and evaluate again. You do this a bounded number of times, and now you have what you're looking for 99% of the time. This is what I would do.

When you say evaluation, do you mean putting another LLM in to check whether the answer of the supervisor contains the corrupted product name?

Yeah. What you do is create a prompt; even on Microsoft Azure they have the ability to create an evaluation prompt. You say: evaluate the result returned by the supervisor; tell me what percentage of the words don't match the lexicon; and if that number is greater than zero, say, I'm sorry, could you try again, but now make sure you focus on the words that exist in the lexicon. Then you rephrase and the supervisor retries.

Yeah, that would be a safe approach to ensure the final output, but there would also be a trade-off between time and accuracy. Okay, I get it, we'll consider it. I also have a question: we've researched a bit and found that there are fine-tuning methods, and also changing the probability of each word, the logit bias, which is provided by the Azure OpenAI API. What do you think about that? Is it a common approach in practice? Do you recommend it or not? It would help us a lot if you shared any ideas about that.

Yeah, so the evaluation is the first approach I recommend, because it's easy to try: look at the evaluation service in Microsoft Azure and you'll see the templates you can use to ensure the supervisor returns what you're looking for. The second approach would be to use a specifically trained embedding model. Right now, what embedding model do you use, a generic one?

We are using the API services provided by Microsoft Azure, so we have very limited control over the model.

Okay, so you cannot create your own embedding, basically.

Yeah, but Azure OpenAI provides some fine-tuning services, and also changing the probability of each word, so we are considering those. Do you have any suggestions about that?
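Two minimal sketches of the ideas discussed above. The first is the post-generation evaluate-and-retry loop: check the supervisor's draft against the lexicon and retry with a rephrased prompt a bounded number of times. `call_supervisor`, the lexicon entries, and the retry prompt wording are all hypothetical stand-ins.

```python
# Sketch of the evaluate-and-retry loop between supervisor and user.
LEXICON = {"Tylenol", "Advil", "Zyrtec"}  # illustrative product names

def call_supervisor(prompt: str) -> str:
    """Stand-in for the real supervisor LLM call (hypothetical)."""
    raise NotImplementedError("wire this to your Azure OpenAI deployment")

def missing_names(draft: str, expected: set[str]) -> set[str]:
    """Which expected product names never appear verbatim in the draft?"""
    return {name for name in expected if name not in draft}

def answer_with_retries(query: str, expected: set[str], tries: int = 3) -> str:
    prompt, draft = query, ""
    for _ in range(tries):
        draft = call_supervisor(prompt)
        if not missing_names(draft, expected):
            return draft  # evaluation passed
        prompt = (query + "\n\nYour previous answer misspelled a product "
                  f"name. Use only these exact names: {sorted(expected)}.")
    return draft  # best effort after the retry budget
```

The second sketch addresses the logit-bias question. The OpenAI and Azure OpenAI chat APIs do accept a `logit_bias` map of token id to a bias in [-100, 100]; biasing each product name's tokens upward nudges the model toward emitting those exact sequences. The model name and bias value here are assumptions.

```python
# Sketch: nudging the model toward exact lexicon tokens with logit_bias.
import tiktoken
from openai import OpenAI

enc = tiktoken.encoding_for_model("gpt-4o-mini")   # assumed model name
bias = {str(tok): 5                                # mild positive bias
        for name in ("Tylenol", "Advil")
        for tok in enc.encode(name)}

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    temperature=0,
    logit_bias=bias,
    messages=[{"role": "user", "content": "Summarize YTD sales for Tylenol."}],
)
print(resp.choices[0].message.content)
```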
Yes, creating your own embedding model is the second approach I would take. For the words you have in the lexicon, you need examples for each one of them, and then you create an embedding model: you use slightly bigger chunks, and you have a space that's much narrower than the general-purpose embedding. So the second thing I'd do is say, let's create my own embedding model and then use it; it's going to give me a much higher hit rate for the words I'm fishing for.

The third one is fine-tuning, and you mentioned fine-tuning. You say: I have a dataset with, let's say, one thousand records where I know how the output should look, and I use that to run a training job to fine-tune; it should take no more than an hour. When you do the fine-tuning you can specify parameters; if you look at the OpenAI documentation you will see parameters related to the chunk size and to the embedding space for your domain, which in your case would be healthcare and medical. There are already pre-trained models for that, so you should take advantage of them. Do a fine-tuning job and you're going to have much higher accuracy, in my view.

And then there are other approaches I would also consider. There is an algorithm called direct preference optimization; look up DPO. With DPO you fine-tune a model based on a collection of preference data.

Is that also possible with the API services, or only when training your own models?

No, you take whichever model you use and you do a DPO on it. What this does is show preferences for particular words against others, and instead of reinforcement learning with human feedback you have a DPO dataset. Look it up, it's very easy to do and, in my opinion, very, very effective. So instead of using the generic model with RAG, you say: let me train a fine-tuned model. You have PEFT, parameter-efficient fine-tuning, you have DPO, and you have RLHF, reinforcement learning from human feedback, but that one is expensive, I wouldn't worry about it. I would do my DPO and say: I've now narrowed my vector space and sharpened my weights, and accuracy is going to increase. In my experience, DPO makes a significant difference.

In your case, you don't have your own trained embedding model, so you get high divergence and you lose value: when a new word appears, the tokenization puts the tokens together in an order that makes no sense for you. This is what's happening now. You have tokens that begin with "Ty" and other tokens like "lenol", but the weight between them is not reinforced. That is the symptom, and this is why the agents are returning unexpected token sequences.
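To make the DPO suggestion above concrete, here is a hypothetical sketch of building the preference dataset: each row pairs a "chosen" completion that uses the exact product name with a "rejected" one containing the kind of corrupted variant seen today. Trainers such as TRL's DPOTrainer consume this prompt/chosen/rejected JSONL format; the example rows are invented.

```python
# Sketch: a DPO preference dataset favoring exact lexicon names.
import json

pairs = [
    {
        "prompt": "Summarize last quarter's performance for Tylenol.",
        "chosen": "Tylenol grew 12% quarter over quarter in hospital sales.",
        "rejected": "Tylerbol grew 12% quarter over quarter in hospital sales.",
    },
    # ...one pair per lexicon word and per observed corruption...
]

with open("dpo_prefs.jsonl", "w", encoding="utf-8") as f:
    for row in pairs:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")
```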
But what else have you tried? It seems to me there are many, many options: you train the model, you do prompt engineering, you do evaluation during generation, you use JSON output, which I saw you tried, and you do hard substitution.

What we've already tried is putting all of the names of the medical products and the hospitals into the prompt, and with that approach the prompt gets too long and we hit a lost-in-the-middle problem; that's why it isn't working. So we'll try your suggestions from now on, and any further suggestions would help us a lot. I also have a question about fine-tuning. I've read the documentation for fine-tuning provided by Azure OpenAI, and the dataset looks like pairs of questions and answers. Is it also suitable for just training a list of words, such as product names or hospital names? I asked ChatGPT too, and it told me fine-tuning is not suitable for those cases, and it suggested changing the probability of each word instead, like the logit-bias method you mentioned.

Yeah, so here's what happens. I would use ChatGPT to create my dataset. Let's say you have two thousand words; I don't know how many words you have in the lexicon, but you can keep adding. You use ChatGPT to say: generate one thousand sentences with these words. ChatGPT generates them, you look through them, this one is good, this one is good, this one throw away, and then you have a dataset; I call it a golden dataset. Then you use that to train your model. You can also include negative examples: when you see Tylenol, never return "Tylerbol" or whatever the corruption is. You teach the model what not to do as well, not just what to do. Because right now my hunch is that the RAG is constraining your GPT to return the right values, but when the agents make the call, they lose that connection. The connection is lost because the agents themselves have to reassemble the tokens on the way back to the user, and in the reassembly the weights are lost. Based on what I've read, I think that's one of the challenges you're facing.

So have desired input-output pairs, and also pairs to avoid. For instance, the sales rep asks: could you tell me what I should do in this situation with this particular drug? The desired output is: this drug has had success in this particular situation. That's an acceptable return; what is not acceptable is changing the name of the drug. So you want to teach your model to distinguish preferences from what not to do. Because the biggest issue with the RAG that I see, and I looked at the examples you gave with Tylenol, is that the chunking and the tokenization happen in unnatural ways. "Tylenol" is broken down into "Ty" and "lenol", but maybe it should be a single token. That is something you can reinforce: this is a single word, do not split it. And this is why, if you build your own embedding, you can actually dictate that.
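Here is a hypothetical sketch of the golden-dataset step: writing the curated examples in the chat-format JSONL that Azure OpenAI fine-tuning expects. The sentences are invented placeholders; in practice they would be ChatGPT-generated and hand-culled as described above.

```python
# Sketch: serializing a curated "golden dataset" for chat fine-tuning.
import json

curated = [
    ("Which leaflet should I give a clinic asking about Tylenol dosing?",
     "Use the current Tylenol dosing leaflet; 'Tylenol' is the exact brand name."),
]

with open("golden.jsonl", "w", encoding="utf-8") as f:
    for question, answer in curated:
        record = {"messages": [
            {"role": "system",
             "content": "Only use exact product names from the approved lexicon."},
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]}
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```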
Yeah, so I think the easiest thing for you to do right now is this: make sure the agents do not add additional words, just the facts; they retrieve the facts. Then with the supervisor, you put an evaluation between the supervisor and the user. If the evaluation passes, return the answer to the user; if it fails, rephrase the user query and evaluate again. You can do this maybe two or three times, and in my experience this decreases the bad returns by a significant margin; maybe one in a hundred will be bad. So the safer way to ensure the final output is putting the evaluation at the end.

Yeah, and the initial approach would be putting another LLM on it, but I'm still concerned about the trade-off between speed and accuracy. Is there any other way to post-process or evaluate the final output without using an LLM, such as pattern matching, similarity search, or dictionary search?

You could also do regular expressions. In this case you say: I found a word that is very close to "Tylenol"; sometimes the model replaces it with a variant, so you substitute it directly with a regular expression. You should try it, because this is really, really fast and you can do it directly in memory, so the user won't even notice. What happens is the supervisor engages with this evaluation step: let me find out if this text has the words I'm looking for. If it doesn't, find the closest match: go back to the user query, extract the words you're looking for, and that's what you look for in the response.

Then the evaluation service itself would need the full list of products and the full list of hospitals. Are there any services on Azure that provide that, or do we have to build it on our own?

Azure has an evaluation service as part of its AI stack. Just look up the Azure evaluation service, and there is a specific prompt I can give you as an example of how to create it. You create a rubric and say: imagine you are a supervisor at a pharmaceutical company; read this text and make sure the output matches the following terms; if it doesn't match, show me how you'd correct it. So there is an evaluation service you can use directly, but if you want something faster, do a regular-expression match: the user asked for Tylenol; if I don't have "Tylenol" in the response, what do I have instead? Then replace that with "Tylenol".

So the problem itself is inherent in the LLM, because the LLM relies on the probability of the next word, and it should be handled by a separate evaluation step. I get it.
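A no-LLM post-processing sketch along the lines just described: scan the draft for capitalized words and snap any near-miss onto the closest lexicon entry. Pure standard library, so it runs in memory on every response; the lexicon and cutoff are illustrative.

```python
# Sketch: fuzzy-match near-miss product names back onto the lexicon.
import difflib
import re

LEXICON = ["Tylenol", "Advil", "Zyrtec"]  # illustrative product list

def snap_names(draft: str, cutoff: float = 0.8) -> str:
    def fix(match: re.Match) -> str:
        word = match.group(0)
        if word in LEXICON:
            return word
        close = difflib.get_close_matches(word, LEXICON, n=1, cutoff=cutoff)
        return close[0] if close else word
    return re.sub(r"[A-Z][a-z]+", fix, draft)  # check capitalized words only

print(snap_names("YTD sales for Tylanol are up 12%."))
# -> YTD sales for Tylenol are up 12%.
```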
I also have a question: based on your experience, do issues like this happen in other industries? Could you share those kinds of stories? I also have to persuade my client that this is a common problem, so it helps to say I interviewed an expert and he told me about such cases.

This is a common issue in the financial industry, with very specific terms like bonds and shorts; it's a common thing in biomedical, not just pharmaceutical; and it's very common in the legal industry. I work with all of those, and one of these techniques will help you constrain your output. Because right now, as I mentioned, you're using an out-of-the-box embedding model, the tokenization is very fine-grained, and your words get jumbled, so they end up misaligned.

The RAG actually works, and you told me that when you interact with the LLM directly you always get the right answers, right? That tells me you can use this to your advantage. For instance, when you get the response from your supervisor, you can pass it back into the LLM and say: can you help me correct this, given what you know about the products we have? So you're not that far off. If the supervisor returns improper names, it's because it's losing the grounding, so you ground it one more time, as it's called. I think this is another option you can do right away: before I return it to the user, I put the response in a prompt and say, can you help me correct this and replace the names that are misspelled with names that are on the list? Because I just realized that when you go directly to the LLM, you don't get any errors, right?

It depends; sometimes 20 to 30 percent. When the reference contains certain proper nouns, the issue happens 20 to 30 percent of the time, but if it's not about those specific proper nouns, there's no problem.

Yeah, so if you're 80% accurate from the beginning, that's a very good number, and over time you can use different approaches to improve on it; we have options here. Let's say I get the proper answer 80% of the time and I want an accuracy of 98 or 99%; typically that's what I'm looking for. To get from 80 to 99 I can manage that. I can do it with re-prompting, which is very straightforward. I can do my own evaluation. And I can enforce a clearer format for my output: few-shot, provide examples of desired input-output pairs where the output strictly adheres to the keyword list. So you make one more call to correct the mistakes; that's what I would do. You're very, very close. The challenge with what you have today is that your agents are misbehaving because they have been delegated too much.
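A sketch of the "ground it one more time" option from this passage: before returning, make one extra LLM call that rewrites the draft using only names from the product list. The SDK usage follows the real OpenAI client; the model name is an assumption.

```python
# Sketch: re-grounding the supervisor's draft against the product list.
from openai import OpenAI

client = OpenAI()

def reground(draft: str, products: list[str]) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model/deployment name
        temperature=0,
        messages=[{
            "role": "user",
            "content": ("Correct any misspelled product names in the text "
                        f"below. Valid names: {', '.join(products)}. "
                        "Change nothing else.\n\n" + draft),
        }],
    )
    return resp.choices[0].message.content
```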
I would constrain them. Your supervisor should put it together: agent A, give me this; agent B, give me this; I take the data from the first agent and describe it, I take the data from the second agent and describe it, and I return it back to the user. Now the question is: will this result have any mistakes, and what do I do about that? If 75 or 80% of the time there are no mistakes, I return it straight away. Sometimes I need to detect that there is a corruption, and then I take it and say: could you correct the mistakes, given the list of products we have?

Yeah, I get it. I also have another question, not directly related to this topic; I just want your suggestion as an expert. Have you ever encountered clients who ask you and your chatbot to generate a specifically formatted answer for each question? A simple example would be: what were the total sales of hospital A last year? When the agent gets that kind of query it generates an answer, but sometimes clients require us and our chatbots to answer in a very specific order and structure, and there are lots of requirements like that: this question should be answered in this structure, that question in that structure. If you take all of it and try to teach the agent to behave like that, the prompt gets infinitely long. How do you handle clients who ask that much about the response format?

Yeah, I know what you mean, because I've worked with clients who ask for that: here is the proper way to answer this question. So what I do is break it down into smaller prompts, aggregate them, and pass each one to an agent; it's much easier to deal with. Let's say your client says: here is the proper sequence in which you have to query these tables and these databases. Then I say: okay, the first question is, let's identify the source of the data; these are the sources of data. Do I have enough in the metadata that I can use? Can you write a query? So the first step is to write a query against the metadata to return what I'm looking for. Then I create multiple steps; I don't do this in a single prompt. I use a coordinator agent, a coordinator mechanism, that says: I'm waiting for the answer to question one; when I get the answer back, I go to question two, and so forth. This way I can troubleshoot what happens much more easily, and I can explain: the first thing I'm going to do is this, and here is the intermediate result. I may show it or not, but at least I know how to fix it.

There is something called context engineering, which dynamically builds the prompts from smaller chunks.

Yeah, exactly, exactly. You have to build this context a little at a time, because you have specific instructions from the users about how to get to the answer, and that's very helpful. The users tell you: you go to this table, then you go to that table, take the results from the first table, use them on the second table, then do an aggregation, a sum, whatever, return that result, and show how you got it, because most of my clients say: show me how you got to that result.
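A sketch of the coordinator idea just described: run a fixed plan step by step, carrying every intermediate result forward, instead of packing everything into one giant prompt. The step functions here are hypothetical placeholders for agent or LLM calls.

```python
# Sketch: a coordinator that builds context one step at a time.
def run_plan(question: str, steps: list) -> dict:
    context = {"question": question}
    for name, step in steps:
        context[name] = step(context)   # each step sees all earlier results
    return context

plan = [
    ("sources", lambda ctx: ["sales_db.transactions"]),      # find data sources
    ("sql",     lambda ctx: "SELECT SUM(amount) FROM ..."),  # write the query
    ("result",  lambda ctx: 1_234_567),                      # execute it
    ("answer",  lambda ctx: f"Total sales: {ctx['result']:,}"),
]
print(run_plan("Total sales for hospital A last year?", plan)["answer"])
```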
They don't want just a number; they hate that. They want: here's what I did first, here's what I did after that, and here are the intermediate results. This way the user gets visibility into how the answer was built.

Yeah, I totally agree with your approach, but that kind of method carries a risk: if the coordinator fails to choose the proper context or the proper tool, the whole system fails. How do you handle that risk?

I love that question. What I do in the coordinator is present the plan to the user first: here's my plan, or something like that; here's my approach, do you agree with it? If they say, no, I don't agree, change this, I change it. So I agree with you that there is a chance of failure, and then I go back and troubleshoot; that's why I like this approach, because I know where to go to figure it out. In many cases, in my examples, the metadata is wrong: the user says this data should come from this table in this database and it means one thing, but in reality it means something completely different. The users know that, but they don't tell me, because the metadata may be mislabeled or incomplete. So I combine what they know with what I should know: I create an additional prompt entry to make sure I account for what is not in the database or the metadata. There is still a chance of failure, but at least I know where it failed, and when I build the response back I can say: I cannot seem to retrieve this, however here is what I did find, and then they clarify.

But sometimes it's predetermined. I worked one time with a client that said: my VP wants to see this email every Monday morning, about last week; here is what the email looks like and here is where the data comes from. There was a saved query in a database that runs, and then the email gets generated; they said, create an email that describes more about the data, not just the query result. In this case I would use a tool to go to the database directly. The query is saved in the database, so I execute it, take the query result, put it into the LLM with the context, and generate an email based on the data. Now everybody's happy, because the tool itself, my agent, engages with the data directly, and I don't have to build a giant prompt. That's another option.

I get it. So the basic philosophy underlying this method is to make the system modular: break the system down and limit the freedom of the LLM in order to control it; divide and conquer the behavior of the whole system.

Modular, yes, because otherwise, and I've worked in the past with the belief that an agent can do everything, but that's not true; you have to be very careful about that. You say: I have an agent with MCP, give me the tooling to interact with the data. The query already exists; I don't have to write it.
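A hypothetical sketch of the stored-query email flow described above: fetch the saved SQL, execute it, and only then ask the LLM to narrate the raw result as an email. The table name and `call_llm` are placeholders.

```python
# Sketch: execute a saved query, then have the LLM describe the data.
import sqlite3

def call_llm(prompt: str) -> str:
    """Stand-in for the real LLM call (hypothetical)."""
    raise NotImplementedError("wire this to your model deployment")

def weekly_email(db_path: str, query_name: str) -> str:
    conn = sqlite3.connect(db_path)
    sql = conn.execute(
        "SELECT sql_text FROM saved_queries WHERE name = ?", (query_name,)
    ).fetchone()[0]
    rows = conn.execute(sql).fetchall()
    conn.close()
    prompt = ("Write a short email that describes this weekly data, "
              f"not just the raw numbers:\n{rows}")
    return call_llm(prompt)
```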
The only problem, at that same company, was that the VP would reply to the email: can you tell me more about that? Then the reply would go to the LLM, and the LLM would have to write back, or they would interact with the chat: hey, I got this email, can you take a look at it, I don't understand where this number is coming from, because it doesn't make sense. So now I need to actually create a query to find out more about that, and that part is more challenging than the first, you know?

But I think you are on the right track; you're very close. 75 to 80% to start with is great. Now you need a mechanism to detect that you have a misnamed entity in your response, and once you detect it, you have multiple approaches: the direct approach, go straight to the LLM and say, rephrase this, taking into account the actual product names, that's one way; and then all the other options we discussed.

One last question, also off topic. We found that some queries have to be handled by multiple agents, especially in a sequential order. For example, if the query is: what is our most competitive product, and what is the strategy to sell it? To handle this kind of query, first an agent retrieves product sales information to find the most competitive product, and then it has to hand that result to another agent, sequentially. So my question is: in these cases, should the order of execution be dynamically generated by the supervisor, and what is the most common approach or best practice for handling this? Currently we just give lots of few-shot examples to the supervisor to determine the order of agent calls. Are we on the right track, or are there other ways to improve?

That is clearly the easiest way to deal with it. The second thing you can do is create a sequencing instruction in the supervisor: you are the supervisor; wait for agent A to return a result before moving on to the next agent. But you have to be careful not to wait forever. This is an async request, so you say: I'm going to wait 25 seconds, or just put a hard limit. You can look at how long the query takes to execute, like the P99: 99% of the time this query executes in 7 seconds, so I'll wait maybe 8 seconds, and if I don't get a reply, I know this is an error, and I display or return an error. But once I get a reply back within those 8 seconds, I take it and say: okay, use this data, show me an analysis of this data, and use this context for it. You may have a CRM, you may have other data sources. So now there are two agents: the supervisor waits for them, in a particular order, for this query, passes the result on to the second agent, and puts everything together.

How do you do this? The way I do it, I create an instruction set in my agent definition; this is the agent-to-agent protocol, A2A. I say: agent one, which returns the data from the database, will pass the data to agent two. And in A2A, at least in Google's implementation, there is an async request, so agent two says: okay, I'm waiting for you to finish.
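A sketch of the sequential hand-off with the hard timeout just described: wait for agent A no longer than a budget derived from its P99 latency, then feed its result to agent B. The agent coroutines are hypothetical placeholders.

```python
# Sketch: sequential agents with a P99-derived timeout.
import asyncio

P99_BUDGET_S = 8.0  # e.g., P99 of agent A's query is ~7s, so wait 8s

async def agent_a(query: str) -> dict:
    await asyncio.sleep(0.1)               # stand-in for a database call
    return {"top_product": "Tylenol"}

async def agent_b(context: dict) -> str:
    await asyncio.sleep(0.1)               # stand-in for the analysis step
    return f"Sales strategy for {context['top_product']}: ..."

async def handle(query: str) -> str:
    try:
        data = await asyncio.wait_for(agent_a(query), timeout=P99_BUDGET_S)
    except asyncio.TimeoutError:
        return "Sorry, I cannot reach the sales data right now; try again later."
    return await agent_b(data)

print(asyncio.run(handle("Most competitive product and strategy?")))
```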
Either I hit the timeout, or I get the result; then I merge the two and pass it to the supervisor, which returns it to the user after additional validation. That's the way I do it with A2A. If you don't have the A2A agent-to-agent protocol, you have to tell the supervisor: for this question, wait for agent one to finish before engaging with agent two, and don't wait more than eight seconds, or whatever value your P99 gives you. Then you say: if agent A has not returned an answer in 8 seconds, return this error message: I'm sorry, I cannot communicate with the data at this time, please try again later. And in that case you have to create some mechanism to check back later, so that even though I timed out, I retry. There are multiple ways to do it.

Is there any graph-based approach to model the relationships between agents based on the kinds of questions?

Yeah, so this is the reasoning agent; this is why reasoning is so important. If somebody asks this question, I have a reasoning agent that tells me: it looks like agent A knows about this, agent B knows about that, and they need to work together. The reasoning agent creates this mapping between agent A and agent B for this question. For any question, I look at what is called an agent card: agent A has a card saying, here's what I can do; agent B has a card saying, here's what I can do; and the more detail you have, the better the reasoning works. Then the reasoning agent creates a sequence that gets passed on to the supervisor. So instead of me hard-coding it, I have a thinking model, a reasoning model, that says: execute this sequence, agent A first, wait N seconds, agent B second, and then put it all together. The sequencing comes from my thinking agent.

I get it.

And I'm not sure whether you use thinking mode, but I agree with you: you should not hard-code the relationships between the agents; let the thinking agent figure them out. The thinking agent really only needs a simple instruction: look at the agent definitions, the agent cards, and establish a sequence between the agents that will give me this result, and the thinking agent will do that. What I typically do, by the way, once I have a sequence that returns what I'm looking for, is create a manifest. Every time a similar question comes, I don't need to re-engage the thinking agent; I say, oh, I know about that, I go to my manifest, and I execute those agents in that order. If anything changes, I create a new manifest. So my thinking agent only needs to be engaged when something new gets asked that I don't know anything about.

Yeah, I see, so it's an approach like caching the history, and running the thinking agent only when something unseen comes in.

Exactly, that's what I do, and it's very, very successful, by the way, because if you go to companies like your clients, they often ask the same questions, so the thinking agent can solve for that. Very good questions, by the way; I love these questions.

Yeah, so are there any other suggestions related to the original issue? Otherwise, I think we've had a really valuable time getting your advice, and thank you for that.
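A sketch of the agent-card and manifest-cache idea: describe each agent's capabilities, let a reasoning model plan a sequence for unseen questions, and cache that plan as a manifest so repeat questions skip the planner. `plan_with_reasoning_model`, the cards, and the question signature are all hypothetical.

```python
# Sketch: agent cards plus a manifest cache for planned agent sequences.
AGENT_CARDS = {
    "agent_a": "Retrieves product sales figures from the sales database.",
    "agent_b": "Analyzes data and drafts sales strategy recommendations.",
}

MANIFESTS: dict[str, list[str]] = {}  # question signature -> agent sequence

def plan_with_reasoning_model(question: str, cards: dict) -> list[str]:
    """Stand-in for the real reasoning-model call (hypothetical)."""
    return ["agent_a", "agent_b"]

def get_sequence(question: str) -> list[str]:
    key = question.strip().lower()  # naive signature; normalize/hash in practice
    if key not in MANIFESTS:
        # Cache miss: engage the thinking agent and record the manifest.
        MANIFESTS[key] = plan_with_reasoning_model(question, AGENT_CARDS)
    return MANIFESTS[key]

print(get_sequence("Most competitive product and strategy?"))
```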
Can you set the temperature of the model to zero?

It's already set to zero.

Then I want you to think about this in three stages. Before generation, what can you do? Prompting, embedding, fine-tuning, and RAG; all of these approaches are before generation. Then you have during generation; we just talked about what you can do there, and you cannot do much except for the prompting. And then post-generation, where you have options. It seems to me that the generation itself is good; what happens is that the agents are introducing randomness that you don't need, and that is what you need to address. That's what I would do.

Yeah, got it.

So enjoy your, I don't know what your time zone is, but anyway, enjoy your night or day. It must be late; you're in Korea, right?

Yeah, it's about noon here.

Then enjoy lunch. I love kimchi fried rice.

I also enjoy it. Thank you so much, thank you so much. Good evening, good day. We gained a lot from this.