AI in Digital for Travel Marketers Webinar Recap
Watch the video then download our Data Privacy & Cookie Deprecation Whitepaper for Travel Marketers.
AI in Digital for Travel Marketers – Webinar Transcript
Freddy Chanut:
Hello, everyone, my name is Freddy, and today we’ll talk about AI for the travel market. More specifically, we’re presenting how to operationalise your generative AI content production efforts in the world of travel.
The objective is that by the end of this webinar, you’ll understand how to move AI from an innovation project to an execution process and framework. You’ll learn how to apply a structured process to operationalise generative AI from a business perspective, and really fast-track your production process through our learnings.
So, a little bit about myself. My name is Frederic Chanut, or Freddy. I am French, but I’m based in Sydney most of the time; today you see me from London. My favourite destination is Samarkand in Uzbekistan. It’s an amazing place in one of those countries most people fly over, but I highly recommend it if you’re a travel fan. Travel brag: I’ve been through about 61 countries and counting, and drove a tiny car through about a third of the world, from London to Mongolia. Over the years I’ve raised about 20,000 for charity through my adventure travels.
So when I’m not talking about travel, I am actually doing the thing I’m selling. I have about 20 years of experience across the travel industry. I worked at American Express Travel in my early years, was part of a company that was acquired by Expedia, and have done digital strategy, machine learning, analytics, and some CX.
I’m heavily involved in SEO, which I’ll be talking about today, and I’ve worked with DMOs and OTAs, obviously with Expedia, but also airlines and tour operators. That’s a bit about the clients I’ve worked with, and a little about my team.
So let’s talk a little bit about who we are and why we’re talking to you about travel. In Marketing We Trust was built for travel brands. Before building In Marketing We Trust, I was, as mentioned, part of a company acquired by Expedia, accommodation.com, and I was fed up with low-quality agency relationships. I thought we could build a better model by working with high-class teams.
We do that through three unique capabilities: leading digital expertise, unparalleled travel and tourism experience, and localised language and cultural capabilities.
We are a data-driven digital agency, and we focus on being a global leader in data-driven digital marketing. We work with some of the largest websites in the world, and that has equipped us with sophisticated tools and expertise. Our proprietary crawler has crawled about 200 million URLs, collecting 20 billion rows of data to help us optimise conversions, which has generated tens of millions in additional revenue for clients.
We optimised over 50,000 URLs for one travel client using a proprietary platform called ScalePath. That platform, and we’re pretty happy about this, was voted one of the most innovative marketing technologies of 2023 by the Australian Financial Review.
And last year, and this is very relevant to this webinar, we produced about 17,000 pieces of accurate, brand-safe and search-engine-safe travel content, using a proprietary AI platform we built on top of large language models in a way that safeguards commercial data.
As mentioned, we live and breathe travel. We’ve worked with 37 different travel brands across four continents: OTAs, tour operators, metasearch, airlines, resorts, destinations, insurance. I think we’ve covered most of the industry.
I mentioned localised language and cultural capability. We have 54 team members based in key countries around the world. They speak 18 languages and understand local culture, and we have travel campaigns active in 30 countries right now.
Before I launch into my talk, I just want to be clear that this is not your usual presentation format. In the usual format, the vendor describes a poor client with a problem: “Oh, my gosh! My content performance is decaying. How can I possibly update all those old content pieces? I am doomed. Please help!”
Then comes the hero with a very complex and comprehensive solution: “With our secret resource we built an amazing machine that produced 10,000 pieces of content. You’ll be swept away by our brilliance. See the results for yourself!” And there’s a graph going up and to the right, showing how clients live their best marketing life ever after.
No. Maybe it’s too early for me to have had coffee, but the reality is we’ve all sat in those presentations, and no one believes them: not the presenter, not the audience.
Today, this is a candid view of what we learned along the way. We’re sharing our experience so you don’t have to go through the same pain. Hopefully you’ll learn a bit more, and it will be a little more enjoyable for you to watch.
In terms of the content of this presentation: the context and current application, a short window into what we’ve done and why we started, the challenges we’ve overcome in rolling things out at scale, and a summary of the key learnings.
Everyone has probably tried prompt engineering and correction when using ChatGPT, Perplexity or others, and generally the results are pretty good. It works; you’re able to upgrade content. This is an example of some of the content our team has worked on: repurposing existing work and turning it into something much better. We used a mix of ChatGPT and Midjourney to create visuals. Overall, gen AI can make a few mistakes, but QA typically catches them. If you look closely at the visual on the bottom right, you’ll see that the poor chap working on that bench is missing a head.
This is easily picked up when you have a human scanning through the output while using generative AI tools in the process. But it’s not automatically picked up when you start rolling this out at scale and don’t necessarily have a human behind every piece of content the machine is producing.
In terms of the use case, what we really wanted to use AI solutions and large language models for was to automate content updates. This is very important: content decays over time, and fresh information is critical for search engines to view your site as authoritative and relevant.
Another use case is repurposing content at scale. This comes up a lot in conversations about the future of search: how do we repurpose old, high-value copy into shorter content for conversion? That could be turning text into videos, or vice versa.
The other important factor in travel is localisation: how do we get content into the relevant language for our source countries? We typically build content in English or a couple of languages. How do I make sure that people in France, in Vietnam, in Thailand, in other countries where we don’t necessarily have a lot of language skills, can also be served high-quality content in their language, with cultural relevance?
And last but not least: can we test what works and what doesn’t? If we can reduce the cost of production, we should be able to experiment, see what works and what doesn’t, without necessarily needing product or engineering team support for everything. So that’s the context of this presentation.
We had a conversation with one of our large travel clients, and they said: hey, we’ve produced a whole raft of content pieces that are quite outdated. As everyone knows, COVID hit and no one really spent time updating content, so a lot of the information became outdated. For those of you living in Sydney, I wish we could still have a coffee for $3.90; that’s not the case anymore. This was a typical piece of content we wanted to update. It’s not necessarily the highest in the order of conversion, but if you keep old content on your site, you’ll start dropping in relevance for Google and other search engines.
We wanted to start with something simple: factual content, which is very important, grounded in something actually useful. We started with just “flights to country”: how can I find cheap flights to a country, when is the best time to book, what are the top cities and the best time to go, top picks to do and see in the country. Again, fairly basic.
We’ve all been to countless sites with automated content that really doesn’t have value. One of the really big things was that we wanted this content to be useful, to avoid being penalised by Google and other search engines for low-quality generative AI content.
We also wanted to answer a few questions with this part of the project. Is there a business impact: is the new content helping ranking and traffic? If it isn’t, that’s not great, because we’re investing in something that doesn’t deliver results. Is it cheaper than humans, yet better than templated content? If we can’t control the quality, or it ends up too expensive, then we’re probably better off with the automated, fill-the-gaps type of content a machine produces.
Does it work in all languages? Can we actually scale without too much human interaction? The other success criteria: we can create new content, not just a rehash of content that already existed, and it doesn’t require engineering resources to implement, at least not for pilot projects.
We wanted to use third-party data to match information. This was important because, like many companies, it’s often easier to pay for and acquire third-party data than to obtain data that is stuck in operational silos within your own organisation. We also wanted consistency of content: matching the brand, following brand guidelines and internal tone of voice without being robotic. What we don’t want is the style of content that looks horrible, sounds horrible, and reads poorly, the automated content you see across most of the OTAs, metas, and inventory-driven sites.
A big thing for us was that the client’s team is small. So how do we limit the QA effort? Someone is going to review this, someone is going to have to check and validate it, and we really wanted to make that effort as small as possible.
The last thing, obviously, is: can we deliver on time? I’ll get to that later. Beforehand it was “yes, of course, I’ll just punch in a few prompts and the machine will produce the content.” Turns out, not so. Our approach was to focus on a subset of pages, “flights to country”, and iterate based on performance data. If it works, we can expand. We managed to get it right, and we have expanded since.
From there we can roll out to more templates, more pages and increasing complexity, and move towards more conversion-focused content.
First of all, before we even jumped into building prompt sets and the like, we looked at what was out there. There is such a rush in the industry that we looked at four options. When we evaluated Jasper and all the others, what we found was either too expensive, missing languages, very messy output, still very manual, or lacking features, and the initial tests were underwhelming. When you want to start automating things, a lot of those tools are basically a thin UI layer on top of OpenAI functions, and they’re general purpose: built to accommodate someone in e-commerce, in fashion, in B2B, in travel, in everything, so often the quality isn’t fantastic.
So we looked at that and went: all right, let’s see if we can build it ourselves. Our initial assumption on where gen AI fits in terms of quality: it would deliver at scale, but at a much faster speed than programmatic content, whether static or fed with dynamic information.
It’s better than machine translation in terms of how it reads. If you’ve ever translated something into Japanese, then taken another translator and translated it back, you end up with quite a funny-sounding answer.
We knew it would not be perfect. The quality would probably be worse, or let’s say a bit more predictable, structured and robotic, than what a localisation agency could produce. But again, we’d be able to scale faster and at a fraction of the cost.
We certainly weren’t expecting to produce prose like a freelance copywriter or journalist. This was more: let’s keep it factual, let’s keep it simple, and it would work for us. High-value pages, the pieces around the travel dreaming and planning phases and really high-conversion items, you really want those written by a human copywriter.
Today you cannot ask any of the AI tools to produce something that will really nail conversion and convey emotion the way a human could write it, someone who has actually been to the place and had the experience.
What we thought could work is what we call the torso: queries that are valuable, but there are too many of them at scale, so we’re not able to spend the resources to have a human write the content for each one.
That basically displaces the programmatic content that used to be there, where programmatic content is just fill-the-gaps, automated content. Programmatic content is fantastic if you have data that needs constant refreshing: the average prices for flights in the last month or week, the best prices for hotels for the next month, those sorts of things you really want to automate rather than use a human or even a large language model.
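That kind of fill-the-gaps programmatic content can be sketched in a few lines. This is an illustrative example only; the template wording, function names, and figures are made up, not the client's actual system:

```python
# Illustrative "fill the gaps" programmatic content: a fixed template refreshed
# from structured data, no LLM involved. Wording and figures are made up.

PRICE_TEMPLATE = (
    "The average price for flights to {destination} over the last "
    "{window_days} days was {currency}{avg_price:.0f}."
)

def render_price_snippet(destination, prices, window_days=30, currency="$"):
    """Render a refreshable price line from raw fare observations."""
    if not prices:
        # No data: skip the snippet rather than publish an empty claim.
        return None
    avg_price = sum(prices) / len(prices)
    return PRICE_TEMPLATE.format(
        destination=destination,
        window_days=window_days,
        avg_price=avg_price,
        currency=currency,
    )

print(render_price_snippet("Tokyo", [820, 790, 845]))
# -> The average price for flights to Tokyo over the last 30 days was $818.
```

Re-running this on fresh fare data refreshes the page with zero human or model involvement, which is exactly why it suits constantly changing facts like prices.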
So we decided to build it ourselves. As Jeremy Clarkson would say: how hard can it be? Let’s just connect to GPT-3 (that was back then; GPT-4 came later), connect that to the environment, feed it the right list of pages and a few prompts in the right language, create the right prompt with some existing examples of tone of voice and language, recruit a copy-proofing team to validate the content, and package it in a format we can easily upload. Job done. Well, nope, it didn’t quite work like that.
So let’s talk a little bit about what we learned along the way. Initially, we used what we call a zero-shot approach for the first version of all our writing flows. Something like OpenAI’s tools and API is built on pretty much the entire web and more, so they should have all the information in the world and be able to produce it.
What’s zero-shot in this context? Apologies for the general look and feel of my machine-learning presentation graphs. Zero-shot, which is what everyone does in ChatGPT, means describing the task in a few words. You don’t give any context; you just go: “Hey, write me the top things to do in Uzbekistan”, or “the best places to visit there”. You use prompt engineering to tweak and tune, and this is where a lot of the hype is at the moment.
This is good, but in terms of quality and task accuracy, it’s probably the lowest of all the approaches. Then you typically go to a few-shot prompt, where you give examples of solving the task: “I want you to be a travel writer, and this is the sort of content I want you to produce”, and you reference content that already exists, whether fetched by web scrapers or directly from your own system.
Then there’s retrieval-augmented prompting, which I’ll talk about a bit later; that’s where we ended up. And the last step is fine-tuning custom models. So we started with zero-shot prompts and went: okay, if this gets us to 80% quality, we might be sweet. In terms of structure, it was very simple.
We input the list of countries in a CSV file into a GPT-3 pipeline via the OpenAI API. We generated the prompts based on topics, writing styles, and tone of voice. Then we looked at the output from the API and played a little with the parameters of the GPT-3 model. Yes, that was back then.
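As a rough illustration of that first pipeline, here is a sketch of the prompt-generation step. The topic names and prompt wording are assumptions for the sketch, not our production prompts, and the model call itself is left as a commented stub:

```python
import csv
import io

# Sketch of the zero-shot step: one prompt per (country, topic) pair, built
# from a CSV list of countries. Wording and topics are illustrative only.

PROMPT_TEMPLATE = (
    "You are a travel writer for an established travel brand. "
    "Write a short section on '{topic}' for {country}. "
    "Keep the tone friendly but professional."
)

def build_prompts(csv_text, topics):
    """Expand a CSV with a 'country' column into one prompt per country/topic."""
    prompts = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for topic in topics:
            prompts.append(PROMPT_TEMPLATE.format(topic=topic, country=row["country"]))
    return prompts

countries_csv = "country\nAustralia\nUzbekistan\n"
prompts = build_prompts(countries_csv, ["cheap flights", "best time to book"])
print(len(prompts))  # 2 countries x 2 topics = 4

# Each prompt would then go to the LLM API, e.g. (untested stub):
# completion = client.chat.completions.create(
#     model="gpt-4", messages=[{"role": "user", "content": prompts[0]}]
# )
```

Note that nothing here grounds the model in real facts about each country, which is exactly where the odd results described next came from.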
So that really helped get get something. But what we ended up with is a little few weird things, so tough things to see and do in Australia. Canada is a vast country whose diverse region I know. Canada and Australia are often sided together in the 5 Eyes, but not necessarily me and the tourism. It’s a bit far away from each other.
We then best city toast in Saudi Arabia, with Toronto making making the list in the top of the list again, not sure how that came about, and then, when it comes to turn a voice in style as much as I’m I love sailing pirates and everything. A hoy adventures, traveler. Well, that wasn’t quite on brand of what we wanted to do. We were still representing established travel business, so it didn’t feel quite right to to talk about to talk that way to to audience.
So all of that really led to a lot of contents. That that was just took an enormous amount of time to review and correct. And so that really was frustrating. And we realised that it wasn’t the right way of producing the content even with when we were doing the fine tuning that was not that was not working so too much variations.
So we then applied what we call retrieval-augmented generation (RAG) to see if it worked better.
The idea is that when we send a prompt or ask a question, the system first hits a knowledge base, fetches the relevant knowledge, and then pushes it into the large language model. With that corpus of information, the model gives really clear, detailed answers. To avoid having Toronto as the top city in Saudi Arabia, Canada, and Australia, the retrieved context helped push the right answers, and overall that worked a bit better.
I’ll get into the tools we used. We took the PDFs and text of the content we wanted to use, both brand guidelines and factual information: the list of top cities per country, the top attractions, weather patterns, travel data, and so on, into a text store, so that we could create the sort of data we need retrieved at the right moment.
That text, along with the query, goes through an embedding generator, which lets us push it into a database where it’s stored. Whenever we run the prompt, the query and its context are used to fetch the relevant documents, which we then push into the model along with the prompt, and that sharpens the answer. In a nutshell, that’s how it works.
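Here is a toy sketch of the retrieval step just described, with a crude bag-of-words similarity standing in for a real embedding model and vector database. All names and documents are illustrative:

```python
import math
from collections import Counter

# Toy illustration of RAG retrieval: "embed" documents (a word-count vector
# stands in for a learned embedding), find the document most similar to the
# query, and prepend it to the prompt as grounding.

def embed(text):
    """Fake embedding: word-count vector. Real systems use a learned model."""
    cleaned = text.lower().replace(",", "").replace(".", "").replace(":", "")
    return Counter(cleaned.split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, documents):
    """Return the single most similar document to the query."""
    q = embed(query)
    return max(documents, key=lambda d: cosine(q, embed(d)))

docs = [
    "Top cities in Saudi Arabia: Riyadh, Jeddah, Mecca.",
    "Top cities in Canada: Toronto, Vancouver, Montreal.",
]
context = retrieve("best cities to visit in Saudi Arabia", docs)
prompt = (
    f"Using only this context:\n{context}\n\n"
    "List the top cities to visit in Saudi Arabia."
)
print(context)  # the Saudi Arabia document, not the Canadian one
```

Because the retrieved document now pins the model to Riyadh, Jeddah, and Mecca, the Toronto-in-Saudi-Arabia class of error becomes much less likely.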
The good news is it more or less worked. In terms of the content produced, again we kept it simple: we wanted to refresh simple content and make it work. On the business-impact questions: did the new content help ranking and traffic when we pushed it live? After about 4 to 6 weeks, we had a nearly 30% uplift in US traffic, so we were pretty happy about that, the US being a competitive market.
It is cheaper than humans, and it’s certainly better than templated content. Does it work in all languages? Well, we still had issues with Korean and Japanese. I’ve found that Korean and Japanese, and then traditional and simplified Chinese, tend to be a bit tricky, and Thai every now and again as well. They’re the known troublemakers of the localisation world, but it can be worked on. And can it scale without too much interaction? That certainly felt like a work in progress.
Let me tell you a bit more about that; it was working, but not entirely well. The process we used ended up creating a whole raft of Jupyter notebooks and content files. This was great when we were thrashing around, testing and trying different things, but it started to create a lot of confusion: which is the right notebook we’re using? Which is the right document we’re working on? That got messy early on. We started from something very basic, and it needed very messy, manual updates and version control.
For those of you who have played around with these scripts: Jupyter notebooks are not a production tool. They’re very much the tool your data analysts and data scientists use for analysis. We were basically using a very lightweight framework to query data on a production machine, and that started to break and cause issues.
One of the challenges is that we quickly struggled to stay on brand. Here is some example feedback we received from clients: headings started to pop up as sentences instead of questions, there were too many exclamation marks, and there were occasional mistakes in what we generated, with variations in quality.
One thing we’ve learnt is that OpenAI has trained their models on travel data that uses the words “stunning”, “nice”, and “vibrant” a lot, and that really started to sound terrible. My favourite: everything started being “the Venice of” somewhere as soon as a city had more than two rivers.
So we ended up with something we didn’t want: the default, easy-to-generate content with no genuine local knowledge, basically unhelpful content.
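This kind of feedback can be partly automated before human QA. Below is a minimal sketch of such style checks; the rule set, thresholds, and blocklist are illustrative, not our actual QA tooling:

```python
# Minimal sketch of automated style checks run before human QA: question-form
# headings, an exclamation-mark limit, and an overused-phrase blocklist.
# Thresholds and words are illustrative only.

BANNED_PHRASES = ["stunning", "vibrant", "venice of"]
MAX_EXCLAMATIONS = 1

def lint_content(heading, body):
    """Return a list of style issues found in one piece of content."""
    issues = []
    if not heading.strip().endswith("?"):
        issues.append("heading is not phrased as a question")
    if body.count("!") > MAX_EXCLAMATIONS:
        issues.append("too many exclamation marks")
    lowered = body.lower()
    for phrase in BANNED_PHRASES:
        if phrase in lowered:
            issues.append(f"overused phrase: {phrase!r}")
    return issues

issues = lint_content(
    "Amazing things to do in Bruges",
    "This stunning city is the Venice of the North! Book now! Don't wait!",
)
print(issues)  # four issues flagged
```

Checks like these don’t replace editors, but they catch the mechanical, repetitive failures cheaply, so human reviewers can focus on substance.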
The other thing, once we started producing at volume, was that the content was a bit too verbose, and we started to see a lot of challenges there. It became just the sort of content you see and read everywhere. Now take that type of content in English, apply it to Italian, or start localising the language, and it becomes absolutely terrible.
My Italian project manager described it as all the passé and boring imagery of Italy starting to come up, and you can imagine his face when he reviewed the content. So what we ended up with were tone of voice, format, style, and information-level issues. That started to become a real problem, especially when we don’t have all the language capabilities in our team. Reviewing thousands of pieces of content can be tedious work, and for those not familiar with it, it can be difficult to master.
So we were relying on external editors recruited to help us, and we ended up with too many back-and-forths on styling guidelines and the like, to the point where we had to build an interface designed to pick up the issues whenever content came through: is it good, is it bad, and any comments and fixes we could apply.
So this is the sort of thing we had. What caused the issues: messy, bad data forced us to refresh the content a lot. Imagine going back through the full cycle: this content is bad, okay, refresh it, and check whether it has actually been updated.
We had quite a lot of tech limitations that stopped us from having clean production runs: API limits kicking in, APIs being unstable, and a few things breaking. That meant a long, costly infrastructure process. Imagine going: yep, we should be on time... oh no, we’re not, because we found another batch of content that needed rework. That was frustrating for us and for the clients.
So we ended up building our own interface for humans to review and approve the content in a far more streamlined manner.
We turned all of those learnings into a process that is fairly simple. Define the templates you want to use. Look at the data you want to collect and make sure that data is clean, which is easier said than done.
Then work on your prompt development: tweak the prompts, ingest the data, and set the right context to produce the content. Once batches of content start being created, work on the quality standards. Make sure you align with your clients and your editorial team on what the right level is. Is this an error? Is it a styling preference? Is it an opinion? Those things matter, because different people will have different interpretations of what good-quality content means, especially given the gap between what a data engineer, a content manager, and a copywriter each think quality is.
In terms of template definition: define the number of sections, what data is available and likely to be used, and what the content samples look like; gather feedback from the client team and get a final approval and validation. Once we have that, we can say: this is the content we’re going to produce, and this is what it sounds like.
An example: we shared the “flights to country” template in that particular format, so it was easy to see what it would do, and we discussed: hey, can we add the visa questions? Is that a good idea or not? Then, for data collection, opt for reliable data for each and every topic, build your database, and run data validation and manual refinement.
I’ll get into some of the learnings around validation: it’s not necessarily that the data is bad; some of it may simply not be the information you want to surface. Here’s a cheat: when we look at top destinations, top hotels, top flights, we always have plenty of information for well-known places. Have a look at lesser-known countries instead, say Turkmenistan or other countries people don’t travel to very often, and see what information you have available, because you might not be able to use those statements, or the system will hallucinate. So have backup, more default, templated content you can fall back on, almost like programmatic content.
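A minimal sketch of that fallback logic, with hypothetical field names and a made-up threshold:

```python
# Sketch of the data-validation fallback: if a destination record lacks enough
# grounded facts for a section, emit a safe templated line instead of letting
# the model hallucinate. Field names and the threshold are hypothetical.

MIN_ATTRACTIONS = 3

def top_attractions_section(country, record):
    attractions = record.get("top_attractions") or []
    if len(attractions) < MIN_ATTRACTIONS:
        # Not enough grounded data: generic programmatic fallback, no LLM call.
        return f"Discover {country} with our curated travel guides."
    listed = ", ".join(attractions[:MIN_ATTRACTIONS])
    return f"Top picks in {country} include {listed}."

print(top_attractions_section(
    "France",
    {"top_attractions": ["the Louvre", "the Eiffel Tower", "Mont Saint-Michel"]},
))
print(top_attractions_section(
    "Turkmenistan",
    {"top_attractions": ["the Darvaza gas crater"]},
))
```

The key design choice is that the sparse-data branch never reaches the model at all: a dull templated line is safer than an invented attraction.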
After that, select and adapt the relevant model, as I mentioned. Set up the run guidelines and data retrieval for consistency: okay, I’m going to fetch this particular information about the weather from this segment of the database. Then define and test. This is the heavy, heads-down work: playing with prompts and working on the different agents to control the quality, revising output, and fine-tuning. Give enough time for that; the better you get this right, the less you have to do in QA.
Then, once you generate the first batch of content, audit it to identify issues and define the list of issues: how much does each affect content quality, and is it fixable? A small change in tone and style can make the system respond very strangely. It’s surprising: for example, going from gendered to non-gendered content in Romance languages (French, Italian, Portuguese, Spanish) meant the content quality really dropped, and that caused a lot of issues in some of those languages.
Those guidelines are critical, and they’re really going to be key to your success, because people will have different views, and misalignment on quality standards among stakeholders matters, including for your upward reporting, so no one expects you to write like Frommer’s or Lonely Planet.
Then go through full content generation based on the guidelines, with continuous feedback to the client: this is what we’re doing, this is what we’re seeing, and tweak accordingly. If you’re working with editors, pace their review efforts. What we found is you can only look at so many pieces of content that are pretty much the same, or about the same topic, for so long; quality standards tend to drop when editors work over an extended period, even if they’re pros at it.
Track their progress, give them feedback, have a second editor spot-check their work, and then obviously give the client a final opportunity to review. This may look like a lightweight process, but I can assure you it has been built off many, many stumbles we’ve learnt from.
On to the key learnings, and I realise I’m probably a little tight on time, but that leaves a bit of time for Q&A. What we’ve learnt is that we went through the five stages of grief. What we thought it would be about: tech feasibility, quality standards, and performance; is this the best model?
What it actually was, for us, is very much alignment on expectations and quality. What we thought good content was for gen AI versus what a copywriter or an editor thought was very different. It’s change management. It’s human.
Bring people on board: some people in your organisation are going to feel threatened by content generated by a machine, especially a machine that is very much a black box. It’s very hard to understand why a machine produced a certain hallucination, so that can be difficult. Also, if you’re working with a lot of content folks, they might see it as a replacement for their jobs and feel their livelihood is on the line. Change management, project management, and communication are really key. More than anything, that’s what it was.
Oh, gosh! The client is asking for the prompt we’re using, and we really don’t want to give it away, because that’s our IP. We thought the prompt was 70–80% of the magic sauce. Turns out, not at all. Today we estimate that the prompt itself is about 20% of the problem.
The real difference between good-quality content that you won’t spend hours reviewing, and the zero-shot content we tried first — which told us Toronto was a top city in Saudi Arabia — is having the right grounding and the right RAG, retrieval-augmented generation. So basically, make sure that your model is using the right data.
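The grounding idea can be sketched very simply: instead of letting the model free-associate zero-shot, you pin the prompt to facts retrieved from your own data. The fact table and function names below are illustrative stand-ins, not the actual pipeline described in the talk.

```python
# Minimal grounding sketch: inject verified facts into the prompt instead
# of letting the model guess ("Toronto, a top city in Saudi Arabia").
# CITY_FACTS is a hypothetical stand-in for a real retrieval step.

CITY_FACTS = {
    "Toronto": {"country": "Canada", "airport": "YYZ"},
    "Riyadh": {"country": "Saudi Arabia", "airport": "RUH"},
}

def build_grounded_prompt(city: str, guidelines: str) -> str:
    """Return a prompt that pins the model to retrieved facts."""
    facts = CITY_FACTS.get(city)
    if facts is None:
        # No grounding data means we refuse to generate rather than hallucinate.
        raise ValueError(f"No grounding data for {city}")
    fact_lines = "\n".join(f"- {key}: {value}" for key, value in facts.items())
    return (
        f"Write a destination summary for {city}.\n"
        f"Use ONLY these verified facts:\n{fact_lines}\n"
        f"Style guidelines: {guidelines}\n"
        "If a detail is not listed above, omit it rather than guessing."
    )
```

The refusal branch matters as much as the happy path: a missing grounding record should block generation, not fall back to the model’s training data.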
Something you can easily forget is that large models are probabilistic. They generate output based on what they observed in the training data — what is the logical next word coming up? They don’t have a concept of “is it true?”, “is it relevant?”, “does it make sense?”. They just go: hey, that word after this word sounds really good, because everyone uses the same phrasing.
So you can end up with some real garbage in the content produced, and no idea why. Generative artificial intelligence is intelligent in the sense that it produces nice-sounding content, but that doesn’t necessarily mean it’s true.
Align on policies and quality. Our expectation was: it’s a bit better than programmatic content, but it’s nowhere near a copywriter. But when you present the output, people will still look at it and match it against the quality standard of a copywriter, journalist or travel writer. So again, build a roadmap around the quality you can work towards, and be sure to distinguish: “this is critical, this content cannot go out because there are factual issues” versus “these are styling preferences — even if it’s not amazing, we’ll get there.”
Sorry if I sound very boring on that, but it’s super important, because that really made the difference, especially when you communicate with folks in senior management who aren’t going to be involved day to day. They’ve all used ChatGPT and thought the content was amazing. There’s a big difference there.
The last one: at the moment you’re probably working in isolation, doing your innovation in your corner. Don’t forget to involve your product team — keep them in the loop. You don’t have to have them advising you on exactly how to run your project, but having them in lunch-and-learn sessions and regular updates on progress and learnings — with the right product managers, I found that to be really valuable. You get them on side very early, and you help them avoid all the stumbling you went through during your innovation phase, your design-thinking or lean-startup phase if you prefer. And document — that’s probably the biggest thing: document the errors. Most of the time the team wants to talk about what works and how to do it.
Make sure you ask your team to document what went wrong and the logic behind it, because people might have to retrace those steps.
Another big learning: our team is distributed around the world. We’ve got Ricardo in Italy on the left, colleagues based in Melbourne, Anna on the right based in Portugal, and you can see me in the background, correcting some text, based in Sydney.
It was very hard to do the work when you’re not co-located. If you can get your team together during the production process — or even for version two — I highly recommend it; it made a world of difference to be sitting at the same table. There were times when we were just production-heavy and all busy with other things, but simply being able to look over someone’s shoulder at the results meant that our data scientists understood what the people in charge of editorial guidelines meant, and they in turn understood how the prompts were generating content, and so on. Really important, if you can do it.
Be prepared to have more problems, more questions and more changes coming up with every question. For instance: if we generate content, do we do it in English and then translate, or do we generate directly in the target language? If we do it in-language, that’s great, but we need a dedicated resource to create the prompt and validate whether the input is correct versus the output being wrong. Whereas if we generate in English, it’s faster, but it adds an extra step in the process, and an extra opportunity for the system to introduce errors. Again, expect a lot of that.
I’ve alluded to data. Clean data is critical. You’ve probably heard “garbage in, garbage out” — it still holds very true. Here’s an example from airport codes: you might have noticed that there is actually a Disneyland Paris airport code that could get used.
Disney wanted their own code when they started the park. So you can end up with data that is technically correct but doesn’t make sense.
The other thing: you can see the one below, Donetsk — probably not one you want to fly to at the moment. Last but not least, sometimes the data is correct but can be badly misinterpreted. For example, in French-language content we picked up flights from Canada to Bulgaria, and the output was suggesting that people take Lufthansa Cargo. The same thing came up with some European source countries and African destinations.
So we looked at it and went: there’s nothing wrong with the data. What happened is that some cargo data found its way into the flight information, but the output could be very badly misunderstood. Imagine the Guardian writing that our travel agency, metasearch or travel business recommends that its African passengers travel by cargo plane.
Again, that could be very badly misinterpreted, so I highly recommend you look at the input and what it means. This one came up because there were very few flights between Canada and Bulgaria, but that’s exactly the sort of thing that can really trip you up.
Depending on the type of data and the ranking, you might have to build specific scripts or variations based on the type of category or destination. Russia, Syria and Ukraine, for instance, are definitely in the no-go zone at the moment, and others as well. It takes a bit of effort to start with, to really classify, but the good news is it builds over time. Once you set a regular review period, the effort to maintain that sort of data — which may already exist somewhere in your environment — will really pay off in terms of QA.
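That classification step can be as simple as a review-maintained exclusion gate that runs before any generation. The lists and names below are hypothetical illustrations, not the actual rules used in production.

```python
# Hypothetical sketch of a destination/route exclusion gate, assuming a
# simple list maintained through a regular review period.

EXCLUDED_DESTINATIONS = {"Russia", "Syria", "Ukraine"}  # reviewed regularly
SENSITIVE_ROUTE_TYPES = {"cargo", "military"}  # never marketed to passengers

def should_generate(destination: str, route_type: str) -> bool:
    """Return True only if this destination/route is safe to publish."""
    if destination in EXCLUDED_DESTINATIONS:
        return False
    if route_type.lower() in SENSITIVE_ROUTE_TYPES:
        return False
    return True
```

Running this upstream would have caught the Lufthansa Cargo case before it ever reached a human reviewer.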
One thing is, we kept having issues testing different models, so really: choose a stable model first. We tried Llama, we tried a couple of others, and they were just a little too young — this was right in the middle of the explosion of all the models that were appearing. So we ended up with OpenAI, because it was the most stable of the bunch.
A reminder for everyone: the OpenAI API — or any API for a large language model — is not ChatGPT, your chat window. ChatGPT has been fine-tuned with human intervention and has a lot of guardrails built into it. The raw API doesn’t have that. So don’t assume that whatever you send to the API will come back correct on the first try. That really has an impact on the performance expectations of the model.
You’ll probably send too many requests and hit limits. Things have gotten a lot better, but consider throttling your requests. The screenshot you’re seeing is the team talking about all the problems — 500 errors from the API, all the time. If you can get the Enterprise version, great, because that’s going to save you a lot of time. If not, just hack your way into it with multiple accounts and multiple API keys.
Sometimes the “right” technology may be the wrong answer. We used a vector database for the retrieval-augmented generation at first. The problem is, it’s quite a new technology, very few people know how to use it, and it can be difficult. So think about that.
Your iterations in producing that content are a long way from your laptop to production. You don’t necessarily need a database built for AI at first — you just need a database.
You really want to weigh real-world requirements and the resources you have against what everyone is hyping up. Make sure that when you pick a technology, the learning curve you roll out to the organisation will pay off year after year — not “we’re trying that cool new shiny toy” versus a team that genuinely learns how to query a database effectively. If you can, build tools and processes to make QA as easy as possible. This was probably our biggest learning: the moment you start involving more people, things get really messy very quickly.
The more work you do upstream to manage the quality of the output, the less you’ll need in QA. Have a simple review and publishing process that accounts for this production method and for the fact that you might need to do reruns.
One thing we’ve put in now is grounding checks done by another machine, to validate factual accuracy before a human sees the draft. We wasted a lot of time at that step, so make sure you budget about triple the human effort you’d expect until you get fully sized, consistent output.
Also, the models do have fine-tuning options you can use to build a feedback loop — really recommended. Take on the feedback from your QA team so the model gets better every time. And the last one: every rerun adds up, which can cause a cost blowout with all the extra calls and token utilisation.
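The idea of a machine QA pass can be sketched as a gate that compares the generated copy against the grounding facts before routing it to human review. A real setup might use a second model as a judge; this simplified fact-containment check is an illustrative stand-in, and the function name is mine.

```python
# Illustrative machine QA gate: verify that a generated draft still
# contains the grounding facts it was supposed to use, before any
# human reviewer spends time on it.

def validate_against_facts(text: str, facts: dict) -> list:
    """Return a list of QA issues; an empty list means the draft passes."""
    issues = []
    lowered = text.lower()
    for name, value in facts.items():
        if str(value).lower() not in lowered:
            issues.append(f"Missing or altered fact '{name}': expected '{value}'")
    return issues
```

Drafts that fail the gate get rerun or flagged, so human editors only see candidates that are at least factually anchored.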
So, in a nutshell. On the organisational and human side: bring everyone aboard early — not everyone’s thrilled, most people are cautious. Get the tech and product teams on your side; simple updates and lunch-and-learns work well. Document everything about the project: the tech, the processes, the rituals, the issues. On the quality side, manage expectations: it’s better than programmatic, but it’s not as good as humans.
The prompt and the models are probably 20% or less of the solution, and a small change — a small interface change — can have a big impact on the output. So be ready to be flexible on that.
And obviously, the more you structure the content for consistency, the more robust it will sound; but the more creative you make it, the more QA time you’ll need to allocate. Then test, test, test, and be ready to fall back on your simpler content and internal voice. On the tech side: create your own custom model, your own custom tokens — it will make a difference in avoiding the very generic blurb and the variable quality that we’ve seen with GPT out of the box via the APIs.
The tech is unstable: plan your production runs for outages, delays and errors, and it’s evolving fast. One recommendation: don’t plug your rules and prompts directly into the model — put an abstraction layer in between, to avoid the risk of being locked in to one technology, which is the issue with a lot of the UI content tools we’re seeing today. In terms of models and tools today, Claude 3 — Claude 3 Opus — is probably one of the best ones; OpenAI is probably the most popular, but there are others coming out that are really strong as well. And watch out: the tech can be unstable even at small scale — from 10–20 pieces, we saw it start to cause issues with reliability. Then there’s data quality, as boring as it is.
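The abstraction layer point can be sketched as a thin interface that your pipeline talks to, instead of any one vendor’s SDK, so swapping providers is a one-file change. The class names here are illustrative stubs, not a real integration.

```python
# Sketch of a vendor abstraction layer: pipeline code depends only on
# the TextGenerator interface, never on a specific provider's SDK.

from abc import ABC, abstractmethod

class TextGenerator(ABC):
    @abstractmethod
    def generate(self, prompt: str) -> str: ...

class OpenAIGenerator(TextGenerator):
    def generate(self, prompt: str) -> str:
        # A real implementation would call the vendor API here.
        raise NotImplementedError

class StubGenerator(TextGenerator):
    """Deterministic stand-in, handy for tests and QA dry runs."""
    def generate(self, prompt: str) -> str:
        return f"[stub output for: {prompt[:40]}]"

def produce_content(generator: TextGenerator, prompt: str) -> str:
    # The pipeline only knows the interface, so vendors are swappable.
    return generator.generate(prompt)
```

Switching from one model provider to another then means writing one new subclass, while prompts, QA gates and publishing flows stay untouched.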
Feed your models better data, not better prompts. Work on your exclusions and specific-case scenarios, have rollback and fallback plans, and potentially treat certain aspects of your content differently based on the destinations, the products you serve and the scenarios.
Make sure you recruit your team of humans: have the right people, train them, and listen, listen, listen to their feedback to see what works. Budget and plan for bad output in the early stages.
And that was me.
Hopefully I haven’t overwhelmed you, and I’ve given you a few things that we’ve learned along the way that you can work with.