Harnessing Generative AI For Creating Educational Content With Illumidesk

Hello, and welcome to the Data Engineering Podcast, the show about modern data management.

Introducing Rudder Stack Profiles.

Rudder Stack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable enriched data to every downstream team.

You specify the customer traits, then profiles runs the joints and computations for you to create complete customer profiles.

Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack.

You shouldn't have to throw away the database to build with fast changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades old batch computation model for an efficient incremental engine to get complex queries that are always up to date. With Materialise, you can. It's the only true SQL streaming database built from the ground up to meet the needs of modern data products.

Whether it's real time dashboarding and analytics, personalization and segmentation,

or automation and alerting, Materialise gives you the ability to work with fresh, correct, and scalable results, all in a familiar SQL interface.

Go to data engineering podcast.com/materialize

today to get 2 weeks free.

Your host is Tobias Macy, and today I'm interviewing Greg Werner about building Illumadesk, a data driven and AI powered online learning platform. So, Greg, can you start by introducing yourself?

Sure. My name is Greg Werner, as you said.

Not Greg Werner,

but Greg Werner. And,

I am the cofounder of AlumiDesk,

and

we are based out of Atlanta and Utah mostly, and we have a small team in the Ukraine as well. And do you remember how you first got started working in data?

I do. So my previous,

venture was

very tightly focused on,

e invoicing for

banks

on

the sell side of things,

and

their

transaction volume was extremely high, particularly at the end of the month.

And on the supply side or buy side of things, we integrated supply chains.

So that's when I from a at a professional level, when I really started to get involved with data engineering because of the massive amounts of data that we had to deal with

in short time spans.

So those those documents were structured,

as XML documents and had a specific schema.

So we had to convert that,

XML document, store it in a database,

put it in a data lake,

process documents asynchronously

and that kind of thing.

Nothing nothing exciting, but it just had to work, and it had to be performing. And any second of performance advantage that we could get there was

was not only extremely cost beneficial for us, but also,

put a big smile on our customers' faces.

So now with the Luminessk,

it's

the data

part of it is

really interesting because of the generative

AI and the embedding

models that we have to interact with

and how the content

is loaded

and split and stored into vector databases and then how we retrieve that data to improve the model's context

in order to create

these wonderful courses that we're building with AI.

And sort of as a side note in grad school,

I did take

a few data science classes, and that's kind of where

I also got exposed

to data and how

quickly I understood that garbage in, garbage out. So I I understood that the j data engineering pipeline

was a must have for any data science endeavor to work. So,

that also got me into understanding

the

intricacies of, you know, relational database, NoSQL databases, and pipelines, and streaming, and all that stuff. So if that stuff doesn't work, obviously, none of

none of the the the toys that we're all talking about today would you know, those wouldn't work either.

And so in terms of the Illumadesk

project, you mentioned a little bit about using

AI for content generation. I'm wondering if you can just give an overview of

the focus of the project and what it is about this problem space that is

making it worth your while to invest the time and energy into building it.

Yeah. So the first idea for this, started a few years ago, and

the problem space that we were trying to deal with was

helping instructors and content managers save time.

When when building and

distributing their content or surfacing their content to their learners.

And, ultimately,

the goal for any instructor or any organization that has instructors and content managers is is to improve learning outcomes,

which lead to

improved productivity at work,

a better learner experience, you know, especially in,

the realms of k through 12 or higher ed. They maybe have a a more traditional approach to assessing

student outcomes, but, you know, the the student should still

still be happy, and there's still competition between different colleges and universities to attract and retain the best talent, if you will. But the problem space we were dealing with was,

that the instructor is also the content manager in many,

situations,

which sort of reminds me of when you're a data

scientist, you

you you, you know, you sort of have to dabble into the data engineering aspects

of your project setup.

Or if you're a front end developer, you also have to dabble into deploying your application to something like, you know, Versal,

or Netlify or something like that because

it's just really hard to have

a budget that would

allow you to recruit and retain a content manager and then also recruit and retain an instructor. And a lot of companies just start with

the instructor, maybe even as a third party consultant.

So,

what we're trying to do is help that persona

use templates and AI to create high quality courses

that they can then deliver to their students

and then also manage that course,

during the course of time

so that it can have up to date data and relevant data,

that they can, you know, train their learners with. I hope that made sense. Yeah. Absolutely. And so with this idea of

generative AI,

using it in an educational

context, I like your point of using templates for

managing the structure of that. And for

educators and content creators, I'm wondering if you can talk to some of the challenges that they face in being able to develop and maintain the course materials and content for their target audiences,

and in particular, being able to adapt that content to learners, particularly if they might be at different,

stages of sophistication

for the subject matter at hand?

Yeah. So the

there are several standards in the industry. Some are focused on specific verticals,

like, the higher ed or k to 12 vertical has a standard called learning tools interoperability

or LTI,

which is

a way for the platform or the learning management system to interact with external tools,

such as a plagiarism detector,

a course builder like ourselves, etcetera. And then in the private sector,

there's other standards like,

SCORM, and then there's different versions of SCORM because that standard's been around for a while. There's xAPI

among others. So to answer your question, I think part of the issue that people have or content managers and instructors have when

developing courses

is that it's hard for them to

create

the content and distribute it in a format that's compatible

with the formats that everybody else needs

to surface it to their

own learners. For example, if I'm a content manager

and I have a training agency that focuses,

specifically on the developing content,

and you happen to use PowerPoints

only for, teaching your courses, and then someone else may use,

you know, Google Docs or Word documents. Someone else may use Notion or something more interactive or headless CMS.

So, you know, the the possibilities are somewhat endless.

So it's,

for us, the way we thought about it is that we had to develop sort of a canonical

internal

document schema,

structured as a JSON that would allow us to inject content

into different places within

that schema, whether it be manual or with the generative AI, and then just and then use standard

transformations

tools

to

export from our canonical schema to other, you know, to other formats,

whether that be

a package format like SCORM or xAPI that I mentioned or specific documents,

in a folder collection, like, you know, Google Docs or Word Docs or or something like that. So,

and and then on the importing side of things, it works the same way. So we can retrieve data from different data sources and inject it into our

internal,

canonical document format

so that we can obtain

or fetch,

content from different,

sources, including directly from the generative AI models.

So that's that's kind of the way we we approach this this problem space.

And generative AI is

gaining everybody's attention right now because of the kind of headline capabilities

that people are touting of, oh, I can ask it this question, and then it will answer in a manner that is comprehensible and

for the most part, generally accurate.

But for the case of educational material,

you typically need to go beyond just generally accurate, and you need to have some validation

and confidence in the content that it's producing. Because

as an educator or as a content creator, you are representing yourself through this educational material. So you want people to have confidence that what you're saying is factual and correct,

and accurate. And I'm wondering if you can talk to some of the ways that,

in your platform and in the workflow of these content creators and content managers,

how they go through that process of validating the output from the AI models as well as reducing the burden on them to double check everything that is being produced.

Yeah. So I think that's even more relevant

in verticals that are

extremely,

strict with compliance requirements.

So something like health care,

for example, that content better be accurate, and it better cite all of the sources or you're in big trouble.

So, you know, for other,

verticals or use cases, maybe it's not so strict,

but it's still extremely relevant. So what we've done,

is

I'm sure you're,

you're familiar with the term of hallucinations and things like that that generative AI is famous for.

So you wanna avoid that wherever possible, at least detect

when

the generative AI is providing you with inaccurate results. So I think I think we can parse this out parse the question out into 2

spaces.

The first 1 is, how do we enter how do we send the request to the generative AI so that

we can limit or reduce the risk

that there will be hallucinations

or inaccurate data returned back to us as the gen with the generative AI's output. And then, also,

how do we validate or evaluate

the quality

of those outputs,

including, as you mentioned, how do how you know, where can we automate those things? Because it can be a very tedious task.

So I'll start with the context part.

And what we do here, there's a lot of experimentation that takes place,

also known as,

prompt engineering,

where the user has to instruct

the generative AI model

how it should behave

by identifying

perhaps the persona that the generative

AI model should should emulate,

provide the generative AI model with additional context. For example, saying, I'm Tobias Macy, and I

have a data engineering podcast, and I would like to ask some concrete questions about x y z. That's a much better

way for the generative AI

to understand

or guide the generative AI into providing more relevant responses to to you.

And then there's also,

from a data engineering,

standpoint,

there's also ways of

structuring

the request

to the generative AI where you can, instruct it to respond

in specific formats or schemas.

For example,

if you're asking,

the generative AI to inject content

in certain placeholders within a template, for example, a layout template, you may need to

instruct the generative AI to respond with a specific JSON schema

that in that could include,

for example,

a paragraph key with the text in that paragraph key.

And then you can have another key, like the code key, and it could

add a content within the code code key

instead of,

for example or in addition to, for example,

instructing the generative AI to respond in markdown format. So you could have the same headers and and code blocks in the response, but structured

and markdown instead of with a JSON format. So those are a couple of ways that you can

guide the the AI to give you better responses and also to structure them in a way

that your own systems can

parse out the results and then import them into into the course. And then from a data engineering,

standpoint,

1 of the big surprises for us was

being able to

structure the format in such a way that it's more relevant for the use case,

a learning use case, if you will. So as you know, there's learners learn in different ways, and 1 of those ways could be, for example, with a a question and answer chatbot, which everybody's familiar with these days, I'm sure. But

how you interact with a knowledge base, for example,

with a question and answer format is very different from

having the generative AI

feed you content

in a free flow,

form of text. Because

if you store all of your data

or your,

I and I'll get that into I'll get that into that in a second, how you how we store the data. But how we store the data, if you have a JSON schema that has

an email, for example,

and you feed it with generative AI, you say, okay. How do I want this email to look in a question and answer format? So you would say something like,

who's who is the author? Question mark. And then the response would be, the author is Tobias Macy. And you say, okay. What is the content or the main ideas of this email

that would you you the AI could,

parse out the the email and provide you with the summary or the main idea of each paragraph in the email. And then you can store that complete result in the question and answer format so that when you do use

something like a chatbot or a q and a chatbot,

it's it's

accessing the same content,

but it's structured in a way that it's better handled by the q and a format

in in that in that chatbot interface.

So, there's different ways to store the content as as well when you're retrieving it for the purposes of of, you know, creating learning environments.

Should I stop there?

I I think that's a good spot to tee off into the next thing I was gonna ask about. And with that aspect of the learning environment and the fact that you are building a generalized platform

to allow people to educate or teach people across different problem domains, it also brings up the question of

how do you allow them to provide their own contextual cues to the AI to ensure that the content that is being produced is relevant to the topic that they are trying to address. And I'm wondering if you can talk to some of the data integrations or ways that the people who are using your platform to build this content are able to bring their own information

to

populate those contextual

aspects and and bring either bring their own vector DB or load data into your platform to give that information to the AI so so that it's producing useful content?

Yeah. So I think this is where LLM

frameworks and or concepts

in general really help us

solve

use cases where

companies or organizations of any type have to train

their staff,

potential employees,

partners,

or students in higher ed or k to 12

with with specific

data

or content

that wasn't used to train the generative AI model with. So just a little bit of background, and I'm sure everybody,

your audience knows this, but I'll go ahead and provide a summary that models like generative AI models like GPT 4

or,

or llama

or anthropic, and there's more popping up every day, are generally speaking, they're called foundational models that are trained with a general corpus of publicly available information on the web. And sometimes it's public, but it's

gated, like, stack overflows.

Content is public to us. They don't allow generative AI models to scrape their data and then be able to answer, programming questions, for example, that were, you know, in the Stack Overflow database. So,

even though the information may be public, still may be gated somewhat.

So if you have a specific use case for,

content

that needs to be

transformed into a

course deliverable

or q and a deliverable,

then

what you need to do is a is a few things. The first 1 is you need to load the content from the and it's usually,

let's just use a hypothetical large organization

in the health care space.

So this hypothetical organization has a ton of content,

internally.

They may have content in images.

They may have it in video.

They may have it in PDFs.

They may have it in, you know, traditional relational databases. They may have may have it on internal

knowledge basis

that are

managed by third party vendors.

So

if if you wanted to teach a course, for example, on give me the give me the lowdown or give me the summary on the content for

medical machines

used to do

knee surgeries, I'm not even sure I'm using the right terms here because I'm not a health care guy, then it would have to look from a variety of data sources in order to

provide better context

in order to develop that course.

So that's that's 1 thing. And that pipeline usually consists of

loading the document

and then splitting the document into chunks.

So the generative AI model use usually has a limited number of tokens that they can accept

in order to, you know, create a

in the request, in order to create a response. It could be 4, 096 tokens. It could be 8192

tokens,

etcetera.

So there is a limit regardless of, you know, if if it's large or small. There is a limit. So all of that content is loaded, and it's transformed,

into into raw text.

Usually, something like ASCII. And then so you're stripping things like HTML

tags. You're converting things from PDFs to raw text,

things like that. And then there's a splitting

job.

And then once it's split, there's certain parameters that you can put in place there so you can have overlaps between the split,

chunks of text.

And then there's a separate model that you can call to assign a mathematical value to each chunk, And that's

where you have, vector databases

are are just so prevalent now because once you have a mathematical or basically a floating point number

assigned to each chunk of text, then all of that is stored in a vector database.

And there's a vector space where you can get a top k result, and there's different you know, there's a lot of math involved, but,

and I'm not an expert on the math, but,

there's a relationship between the numbers and there's a similarity between

how closely related those numbers are. So when you retrieve

context or data from that vector database

and there's a specific term in your query. For example, you say in a q and a chatbot, please provide me with

the most relevant techniques

to do, knee surgery with this new machine that we have, then it can provide you with concise answers because it has a better context.

So there is a a pretty involved data engineering pipeline.

Obviously, you can use open source LLM frameworks,

but those are just abstractions for the more traditional ETL,

transformation

pipelines that that you've talked about so much on your podcast.

And then how you store that information in a vector database.

And then there's, obviously,

in in a vertical in this hypothetical organization,

there's also a lot of cleaning

going on to remove personally

identifiable information if if it is,

in the data source.

So that's that's to establish context. But then there's another aspect of this.

If the model isn't trained to understand those medical terms, then it's it's gonna return garbage.

So the model

has to be

fine tuned. So, usually,

there's,

a a machine learning ops or an ML ops pipeline

that that's been around for a few years now, and I'm sure your audience is also very familiar with that pipeline.

But it's basically grabbing this foundational model

that, you know, we may use a private privately hosted version

of GPT 4 or we may we may grab a llama and host it in house.

And then you use,

a a, you know, a training test set of data

where in many cases, there is

humans involved, a human in the loop to label the data, to validate whether the outputs created by the generative AI model are indeed accurate. So you need a train, you know, train staff

to, you know, validate that the generative AI is providing with the expected results.

So that information is used to fine tune the model.

And then, once the model was deployed to production,

the MLOps

comes into play because you can

use

users' feedback. I'm sure everybody's seen the like or dislike buttons

that are common with

the chatbots, the generative AI chatbots. And that's just basically a way

for us to tell the system or the element framework

whether or not that response is accurate or not, and then that would be fed into

part of the training test data in order to keep

iterating and improving the model's performance.

So

that those are the 2 or 3

and that that the same thing would have to be done for the embedding model, of course. And then, also,

generating images or interpreting images and converting that into text is also

a source of grabbing

or obtaining foundational models and then fine tuning them. So,

these generative AI models can be multimodal

in the sense that they can understand not only text, but you can also send it images or videos,

to improve context

directly with that media instead of having to,

for example, transcribe a video to text and then send it to text

or describe an image and then send it send them,

text to their to to the generative AI. You can, you know, send the image directly.

Now to answer the the second part of your question is, okay. Well, how do you validate the results

or the quality of the generative AI models, and how do you

automate the that evaluation process is a big source,

focus for us because we are developing courseware, and we need to make sure that,

the source of the content is accurate, but also that the quality of the content is meets a a minimal

level of criteria. So in this hypothetical health care

company,

you would have to store in this vector database.

You can store

the source

of where the content was obtained from,

or you can also have third party data sources where you can

fetch, you know, somewhat with similarity search. You can also say, okay. Well, this this content was fetched from this URL or from this knowledge base article,

and,

you

can you can append,

as part of the response scheme,

the content, but also the source of where that was attained from.

And then you can

have, an evaluation evaluation

model

that, is checking if in fact

that,

content that's provided by the generative AI is in fact

located in that source with a certain statistical

result of accuracy. So you can you know, there's different ways to do that, but you can grab a sample of all of the the responses,

and then you can grab all the sources

for that content and then do

a check to see if there's a statistical likelihood that it's not accurate.

And then you can flag that content

so a human can go in there and then manually

check whether or not that content is accurate. So there is some

parameterization

involved where

you can say, well, it has to be 99%

accurate or 85% accurate,

and that's something that you would need to adjust over time. And then, hopefully, with time, the the results would get more and more accurate. So you can,

say, go from, for example, from 85% to 90%, 95%.

So that evaluation model is is working automatically. And then

the last thing is kinda like the process to evaluate the eval evaluation model. So you can have an auditor go into the company, click on that evaluation

service, if you will, but then they're gonna have to see audit trails and see a methodology

for how does that evaluation model work, when was it run,

what humans signed off on that evaluation model, etcetera,

so that the the auditor can,

go ahead and provide the company with an attestation report that the content that they're providing to their health care

employees

is is indeed accurate.

This episode is brought to you by DataFold, a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production.

DataFold leverages data diffing to compare production and development environments and column level lineage to show you the exact impact of every code change on data, metrics, and BI tools,

keeping your team productive and stakeholders happy.

DataFold integrates with DBT, the modern data stack, and seamlessly plugs in your DataCI for team wide and automated testing.

If you are migrating to a modern data stack, DataFold can also help you automate data and code validation to speed up the migration.

Learn more about DataFold by visiting data engineering podcast.com/dataFold

today.

And then the other aspect

of data in this platform, we've been talking about being able to bring data into the context for the AI. We've talked about using AI for generating the content, and so we've largely been discussing

from the content creator or educator standpoint, but also on the other side of the equation is the learner

and being able to understand their experiences and interaction with the platform to be able to feed that back into

the content development and content update workflow. I'm curious if you can talk to some of the

opportunities

for gaining insights

and with learner interactions with the platform to be able to feedback into the content development and content update workflow and some of the ways that you're thinking about the useful pieces of information

for,

collecting from that learner engagement?

What we've learned so far, and this isn't rocket science by any means, but what we've learned so far

is that there's

to be able to normalize the data

based on how learners interact with that data and so that that data can be used

for the purposes of analytics,

perhaps even, machine learning models that could recommend

better approaches on how to surface that content.

Is really hard

if the course material is not standardized.

So

if you have a course that's completely asynchronous,

as in the instructor isn't actively involved with you in a live session, for example. It's just something

that you take on your own, and then the instructor

provides you with feedback asynchronously

when they have time to do so. Then that, I think, is a more

standard way of providing

the the platform

with

data points that are consistent across

students.

However, it does restrict students to learning in 1 way. So if you're a student that would rather learn with audio,

instead of clicking on on your mouse in, your web browser by logging into your,

web application,

then those data points are gonna be very different. So

what we're trying to

understand is

how do how do we

assess the learners

when they first start the course to understand their personality traits, their learning traits, their desired outcomes?

And then based on on those features,

we can recommend

the most likely

path for them to improve their outcome. So it could be a series of audio,

videos or a series of chapters in an ebook,

and then we can

monitor how their learning outcomes

are

compared to other learning outcomes from from other students.

And, another big,

data point that's important for us to understand is, the level of of engagement.

So

if, generally speaking, if the student is engaging

a lot with the course, that means they're probably gonna learn more.

So having features

such as,

gamification

or having

other

aspects of the course material that may improve accessibility,

such as for the blind and deaf,

or, you know, improving contrast,

the size of the fonts, just basic things like that also may help improve engagement.

But it's also on how

to use generative AI to

restructure

the content into different tones.

Someone may learn with a more humorous approach.

Some may learn with a more factual or dry approach.

So we can also use

the AI to

improve the likelihood that the learner is gonna engage more with the content. So you so if if student a is more likely to engage with the content with video, then we can translate

the text that we have for the course and the images into video

with, you know, a human looking avatar,

and explain the video

ex explain the content in the video. Other students may,

prefer more,

interactivity or quizzes

or live coding exercises as in, you know, with a microlearning approach.

Because if they're like me, they're, you know, probably ADHD or something, so they need small snippets of learning content.

And, others,

may just learn with raw text and images, in a more traditional sense of, you know, a textbook.

But the AI can help us transform

very similar to how

the context

is transform

is is obtained by transforming different data sources into

a a structure

that can be stored in the vector database. So if you have course content, that's all text,

and that course content was obtained

from a variety of sources and stored as text, then you can also convert that text

into other formats such as

video, and then you can also measure the interactivity

specifically with those videos. So you can have videos with calls to action,

with little quizzes within the video,

or you can have that same quiz

reside,

within a a text block within a more traditional,

ebook type of environment. I don't know if that answered your question or

Yeah. And so now talking

through the implementation

of Illumadesc, I'm curious if you can talk to some of the ways that you've thought about the architecture of the platform,

particularly given the fact that you're aiming at this

rapidly evolving space of generative AI and the types of models and a,

as yet to be cemented

stack for being able to to interact with these systems and just some of the ways that you've thought about the design and implementation

of that overall,

product.

Yeah. So our our

stack was

or is mostly a Django Python back end.

So we have a Django

with the Django rest framework

for our back end. We have a traditional

restful endpoints for CRUD operations,

And then we have some microservices with the fast API,

framework,

and those microservices are used

sort of as a building blocks.

And,

the as yet to be cemented LLM framework.

So we're having to piece together things as microservices

that can scale,

both vertically and horizontal

horizontally within,

our Kubernetes cluster, which is,

hosted by 1 of the big cloud vendors.

So, for example,

1 of our microservices,

1 of our fast API microservices is only tasked

the data transformation part of the data engineering pipeline.

So even though we could use a more robust ETL,

tool, we're actually using,

line chain,

mostly because it's Python.

And, we're using a lot of the transformer classes that they already have available.

And if something doesn't meet our criteria for performance or reliability,

then,

we may use, you know, a more battle tested approach, whether it be a managed service

or, something,

like, connecting to, you know, a a spark cluster or something like that for ETL.

So,

each part of our

data pipeline is sort of split into a microservice, and and Django,

sort of bootstraps that together with, salary jobs. So we have Celery jobs that run,

jobs that as a pipeline that hook in these different microservices

together, including

spawning pods in our Kubernetes cluster that may

require

running code. So 1 of the features that we have in our learning platform is

being able to create code exercises and connecting to a back end runtime with a Jupyter kernel

that allows, you know, users to test their code or run their code.

We also have an auto grader that may use,

unit testing to evaluate whether or not the student's answer was correct. So

we also have part of that pipeline

is also interacting with our orchestration service Kubernetes

to to run some of those things. And our front end is mostly a React

application.

So we interact with our back end purely through,

WebSockets if we ever need to stream updates to our react front end and just traditional,

CRUD prod re request and responses with our REST API.

And as you have

built out this platform

going from the initial idea to where you are today. I'm curious what have been some of the assumptions that you had going in or some of the ways that you thought about the problem that have changed or evolved over that time? I think I think the biggest

surprise for me is that it always goes back to the basics from a data engineering standpoint.

So when I first was

exposed to the term LOM framework, I was like, oh, wow. That's cool. That's new. Right? But I, you know, I very quickly saw that the LLM framework is really just

a new term or a new title for things that we've

known

and loved for many, many years, which is basically a day data engineering pipeline. I think the LLM framework

does deal with things that are more focused on the generative AI landscape. For example, some of these open source framework do have retrievers that are focused specifically on the vector database,

and they're already, you know, ready out of the box.

And the communities that support these frameworks,

In many cases, there

are, the the vendors themselves.

As you know, many new vector databases have spawned in the ecosystem and are purpose built for that purpose. But other than that, if we're talking about document loading,

I think we all know that's pretty traditional

document or data transformations. I think that's pretty traditional

calling and embedding model model to, you know, identify chunks of text with a number. It it labeling our data is also,

been around for years.

So the more we the more we try to incorporate

an element framework into our back end stack, the more we move towards to

developing

a a a battle proof end to end data engineering pipeline

that just so happens to interact with the elements that we need to

have a generative AI

framework.

So

a part of that pipeline is calling the generative AI model to get a response, storing it, and then providing that response to the end user. So for me, that's been my biggest takeaway from my, or from our adventure so far in this space. I'm sure there's a lot to be done

still, and I'm sure we all have our own predictions on what could happen in the future.

But moving forward, we we just wanna leverage,

things that we know work and that have been battle tested

and wherever possible. And then

if 1 of those tools doesn't have the capability,

that we need, then we can just have small snippets of code

that can handle

the things that we need. For example, if we need if we have a specific

parsing,

problem that we can't solve within the open source or, excuse me, a battle tested solution, then we can develop a small snippet of code to, you know, for example, convert fence code and markdown to a,

you know, a specific,

block in a JSON schema. And in terms of the application of Illumadesk,

we've talked largely about the platform,

the features, the capabilities

For people who are looking to

build content, what are some of the ways that you have seen it used most broadly? Is it largely for paid courses for or for, you know, individual

practitioners to be able to share their knowledge?

Are you seeing it used for

internal company trainings for being able to stay up to date with their, internal technology stacks? I'm just wondering if you can talk to the the workflow of

onboarding into a Lumidesk

and thinking about the content

creation

workflow

and being able to use the AI capabilities

to iterate on that the the development process there?

Sure. So when we first started, our niche was very focused on the data science

training space,

partly because of our background at the company

and also just because we knew the personas

and the problem, you know, and and and the content that they were trying to teach. And

what we incorporated

at at that time was a feature that, as mentioned previously, is to connect to Jupyter kernels

to allow instructors

to,

create coding exercises,

content with native markdowns so they can import their Jupyter Notebooks directly into,

our authoring tool, which we call activity editor.

But very quickly, we we saw that the data science

subject

is used by many

professions,

whether that be financial,

whether that be health care or sports.

You know, obviously, there's a lot of

statistics and, data science in sports.

So we became exposed to other,

departments, if you will,

once we had the, you know, the data science use case shipped.

And now the those

instructors and those content managers are saying, hey. I need a compliance

course

for my company, and my company is in the insurance vertical or in the health care vertical, and this compliance course needs to be,

based mostly on

my private internal data.

And the generative AI model that I'm using, actually,

also needs to be private. So we don't wanna have we don't wanna have a situation where we're grabbing people's private data and then sending it to

a publicly hosted, generative AI model. So I think for us moving forward, the most surprising thing is

is how the breadth of the types of content you create for as far as use cases go. So you can do vendor training. You can do compliance training.

In higher ed, we have

a customer doing a robotics course with a programming language called Julia.

So there's if if you can mix and match these

Legos, if you will, or pieces

of of the LLM framework puzzle where you can bring your own model, get your own context,

and then also

structure the course in the layout,

that makes the most sense to your learners

and then evaluate the outcomes for your learners in addition to the generative AI's

content during the authoring process,

then

the possibilities

are somewhat endless

and for us a little bit overwhelming. So we're trying to

stay focused on very specific use cases

so that we're not trying to be all things to all people. And in your work of building the platform and

putting it in front of people to develop their own content? What are some of the most interesting or innovative or unexpected ways that you've seen a Luminess used?

So the first

use case that we had was to build courses. So we we went to market with a a traditional

course authoring

and learning management system system paradigm where you have a course authoring tool to

develop your courses and your content. Whether or not you choose to use a template or start from a blank slate would be up to the

content manager and or instructor.

And then on the other hand, we have the system of record with the learning management system. But what works we're we're quickly finding out is that people are using,

our generative AI models

to and our chat with AI interface that we

incorporated as a first class

citizen into our user interface

where the instructors and content managers

doing a question and answer session with their internal

documentation, whether it be a knowledge base or a corpus of PDF documents.

We run the jobs to log the data

everything we've talked about, run the data engineering pipeline,

if you will, to improve context. And then based on

the interactivity

that they have with that q and a chatbot,

then they can copy that information into the authoring tool to start building their courses or enhance the template that we already provide them. That was probably the most surprising

workflow

that we saw

when we were when we shipped our tool, and the ask was was so great for the chat with AI that we just incorporated it, like I said, as a first class citizen. And the other 1 that was surprising to us is surfacing the content not just as a course, but also as other in other formats

such

as many many,

a mini series of blog articles

is is I think a pretty common test bed for, hey, I think I'm thinking about writing a book,

which is obviously

another way to learn content. But I would like to test how many people might be interested in this

subject by having a a mini series of blog articles,

and I wanna do that first sort of as an MVP. So

having the platform become

Data Engineering Podcast

Summary

Announcements

Interview

Contact Info

Parting Question

Closing Announcements

Links