Summary
Gaining a complete view of the customer journey is especially difficult in B2B companies. This is due to the number of different individuals involved and the myriad ways that they interface with the business. Dreamdata integrates data from the multitude of platforms that are used by these organizations so that they can get a comprehensive view of their customer lifecycle. In this episode Ole Dallerup explains how Dreamdata was started, how their platform is architected, and the challenges inherent to data management in the B2B space. This conversation is a useful look into how data engineering and analytics can have a direct impact on the success of the business.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- What are the pieces of advice that you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise.
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, a 40Gbit public network, fast object storage, and a brand new managed Kubernetes platform, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. And for your machine learning workloads, they’ve got dedicated CPU and GPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
- You listen to this show because you love working with data and want to keep your skills up to date. Machine learning is finding its way into every aspect of the data landscape. Springboard has partnered with us to help you take the next step in your career by offering a scholarship to their Machine Learning Engineering career track program. In this online, project-based course every student is paired with a Machine Learning expert who provides unlimited 1:1 mentorship support throughout the program via video conferences. You’ll build up your portfolio of machine learning projects and gain hands-on experience in writing machine learning algorithms, deploying models into production, and managing the lifecycle of a deep learning prototype. Springboard offers a job guarantee, meaning that you don’t have to pay for the program until you get a job in the space. The Data Engineering Podcast is exclusively offering listeners 20 scholarships of $500 to eligible applicants. It only takes 10 minutes and there’s no obligation. Go to dataengineeringpodcast.com/springboard and apply today! Make sure to use the code AISPRINGBOARD when you enroll.
- Your host is Tobias Macey and today I’m interviewing Ole Dallerup about Dreamdata, a platform for simplifying data integration for B2B companies
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you start by describing what you are building at Dreamdata?
- What was your inspiration for starting a company and what keeps you motivated?
- How do the data requirements differ between B2C and B2B companies?
- What are the challenges that B2B companies face in gaining visibility across the lifecycle of their customers?
- How does that lack of visibility impact the viability or growth potential of the business?
- What are the factors that contribute to silos in visibility of customer activity within a business?
- What are the data sources that you are dealing with to generate meaningful analytics for your customers?
- What are some of the challenges that businesses face in either generating or collecting useful information about their customer interactions?
- How is the technical platform of Dreamdata implemented and how has it evolved since you first began working on it?
- What are some of the ways that you approach entity resolution across the different channels and data sources?
- How do you reconcile the information collected from different sources that might use disparate data formats and representations?
- What is the onboarding process for your customers to identify and integrate with all of their systems?
- How do you approach the definition of the schema model for the database that your customers implement for storing their footprint?
- Do you allow for customization by the customer?
- Do you rely on a tool such as DBT for populating the table definitions and transformations from the source data?
- How do you approach representation of the analysis and actionable insights to your customers so that they are able to accurately interpret the results?
- How have your own experiences at Dreamdata influenced the areas that you invest in for the product?
- What are some of the most interesting or surprising insights that you have been able to gain as a result of the unified view that you are building?
- What are some of the most challenging, interesting, or unexpected lessons that you have learned from building and growing the technical and business elements of Dreamdata?
- When might a user be better served by building their own pipelines or analysis for tracking their customer interactions?
- What do you have planned for the future of Dreamdata?
- What are some of the industry trends that you are keeping an eye on and what potential impacts to your business do you anticipate?
Contact Info
- @oledallerup on Twitter
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, Podcast.__init__ to learn about the Python language, its community, and the innovative ways it is being used.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
- Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat
Links
- Dreamdata
- Poker Tracker
- TrustPilot
- Zendesk
- Salesforce
- Hubspot
- Google BigQuery
- SnowflakeDB
- AWS Redshift
- Singer
- Stitch Data
- Dataform
- DBT
- Segment
- Cloud Dataflow
- Apache Beam
- UTM Parameters
- Clearbit
- Capterra
- G2 Crowd
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. What advice do you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I'm working with O'Reilly Media on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise. When you're ready to build your next pipeline or want to test out the projects you hear about on the show, you'll need somewhere to deploy it. So check out our friends at Linode. With 200 gigabit private networking, scalable shared block storage, a 40 gigabit public network, fast object storage, and a brand new managed Kubernetes platform, you've got everything you need to run a fast, reliable, and bulletproof data platform.
And for your machine learning workloads, they've got dedicated CPU and GPU instances. Go to dataengineeringpodcast.com/linode, that's L-I-N-O-D-E, today to get a $20 credit and launch a new server in under a minute. And don't forget to thank them for their continued support of this show. You listen to this show because you love working with data and want to keep your skills up to date. Machine learning is finding its way into every aspect of the data landscape. And Springboard has partnered with us to help you take the next step in your career by offering a scholarship to their machine learning engineering career track program. In this online, project-based course, every student is paired with a machine learning expert who provides unlimited 1:1 mentorship support throughout the program via video conferences.
You'll build up your portfolio of machine learning projects and gain hands-on experience in writing machine learning algorithms, deploying models into production, and managing the life cycle of a deep learning prototype. Springboard offers a job guarantee, meaning that you don't have to pay for the program until you get a job in the space. The Data Engineering Podcast is exclusively offering listeners 20 scholarships of $500 to eligible applicants. It only takes 10 minutes, and there's no obligation. Go to dataengineeringpodcast.com/springboard today and apply. Make sure to use the code AISPRINGBOARD when you enroll. Your host is Tobias Macey. And today, I'm interviewing Ole Dallerup about Dreamdata, a platform for simplifying data integration for B2B companies. So, Ole, can you start by introducing yourself? Yeah. Hey. Thanks, Tobias, for having me. My name is Ole,
[00:02:33] Unknown:
the CTO and cofounder of Dreamdata. We build a platform to help B2B companies gather all their data from marketing, sales, and growth products into one platform so they can get a unified view of how they're acquiring customers and how they're acquiring
[00:02:52] Unknown:
the customers who are paying the most. And do you remember how you first got involved in the area of data management?
[00:02:58] Unknown:
Sure. I thought a little bit about that, and I think it was while I was studying. I was studying computer science, and a lot of my friends were playing poker online, and I also caught that. And we ended up being pretty good at it. But one of the tools we used was a tool called Poker Tracker that was tracking all our online gambling and tracking our opponents. And every day, after I had gambled all night, I would get up in the morning and look at the data. How did I play? How did I act in certain situations? And sometimes, of course, I acted very poorly and sometimes, hopefully, also well. And I would use the data to improve my skills.
[00:03:42] Unknown:
And so now you're building the Dreamdata platform to simplify the overall integration and visibility of data across the different touch points in B2B companies. I'm wondering if you can describe a bit more about what it is that you're building there and some of the ways that it's being used by your customers. Yes. So some of the things we experienced was that,
[00:04:03] Unknown:
a couple of years back, two of the founders, we came from a company called Trustpilot. And one of the problems we had was we were driving a lot of traffic to sign-ups for our customers, and it was difficult for us to put any kind of number on what those sign-ups were worth. It was often free sign-ups, like for a free business product, and it was hard for us to put any real value on them. When we asked sales, they described them as being really zero value. And then we went to marketing, and marketing had a similar experience. They were buying expensive leads from Google Ads and Facebook Ads and so on, and they couldn't always justify all that spend either. And that was a strange thing for us because we were used to working with tech products. We were used to tracking all user activity.
We knew what all our customers were doing, how they came into the product the first time, how often they used it, what functionality they used. We had all that insight, and so we found it strange that we couldn't also have that same insight from the sales and acquisition point of view. And that's how things started. We started building this out, saying, okay, so now let's start tracking all the users, understanding where they are. Let's look at all the different systems. We have commercial systems where we have users' activities in, which, I mean, at Trustpilot was stuff like Zendesk for support tickets, Salesforce for sales activity, marketing was using HubSpot, and we probably had three or four other systems where we had a fraction of the journey in. And then additionally came normal web tracking. But as a company grows, you suddenly end up having a lot of different websites with different landing pages and all kinds of stuff where you have to track the users. And that ended up being relatively complex, and that's when we decided also to start the company. And that's really the problem we still solve. We pull in data from a lot of different sources.
We join it together so that you have one uniform view of what the customers are doing and how they're interacting with your products. When I say customers, it often means customers or prospects. Then we can map up a journey, and then we can apply more interesting analytics on it, which is often something about trying to understand all these touch points at a more aggregated level and understanding which of those touch points are actually driving you revenue. Is it how you acquire the customer, like the first touch? Which is sometimes interesting from a marketing perspective, but there are a lot of things happening in most companies. And most of our customers have a very long sales journey. So from when they see the customer the first time until
[00:07:04] Unknown:
they are able to close a deal with them, it's not uncommon that it takes six months. So the company was born out of the frustration of trying to gain visibility about all of the different interactions of customers and how that fed into the overall success of the business that you are in. But what is it that's keeping you motivated as you continue to build out and grow the capabilities in Dreamdata? I think partly
[00:07:29] Unknown:
actually giving that visibility. I think a lot of companies don't have a good picture of how things are going, and they don't have that visibility into the details of how they are helping customers, what the customers are doing, and how they're acquiring customers. And for some companies, that's not necessarily a problem at the beginning, because they find a channel, a sales channel, and they succeed in that, and they repeat that logic and try to do that again and again. But as soon as something comes into the market that changes the profitability of that channel, then they get into problems. And I think you're seeing that with more and more companies that used to go to, for example, Google Ads. That used to be a very good business. They would just acquire customers that way, but then the price started increasing. And for a lot of companies, and particularly in B2B, that wasn't necessarily the biggest problem because they could actually pay a lot for customers.
Often the contracts here are, like, from $10,000 up to several hundred thousand dollars per year. So whether they paid $100 or maybe $500 to acquire a customer, that didn't matter. But suddenly it stopped working, or they couldn't buy the demand, because now the price was not $500, it was several thousand dollars. And so they started going out to other channels, like Facebook. And funny enough, you see a lot of B2B companies actually advertise on Facebook, and some also manage to get it to work, but that's a different topic. They go out on other channels, and now the complexity becomes much larger because now you're not just looking at one channel. You actually need to combine a lot of different sources. You might also be aggressive on emails. And so I think that's a large portion of it. And then one thing that is important for me is also silos. I find it problematic when companies put themselves in silos. Often you see marketing sit in one silo, and they are tasked with getting more leads to sales. Just get more leads no matter what. And then sales sit in their silo, and they're tasked with getting revenue, and the management's just telling them, hey, sales, get more revenue. And then it doesn't take very long before sales tell marketing, hey, we are not getting revenue because you are coming with poor leads. And I think that's a relatively poor way of communicating because it doesn't help much. They have to work together. Marketing is good at certain things, sales is good at other things, and they need to work together on driving revenue. And that's important. And if you are a SaaS product, maybe, and offering free products, then I think the product department also needs to fit into that game. They are responsible for driving revenue, not responsible
[00:10:23] Unknown:
for giving away your product. And there are a number of people who might be familiar with B2C, or businesses that are selling to consumers directly. But what are some of the ways that the experience of those organizations differs from B2B, or businesses that are selling directly to other businesses? And how does that impact the overall complexity and capability of gaining a useful overview of the customer life cycle?
[00:10:52] Unknown:
So often with B2C you have more data, which, of course, has certain challenges of scale. Today, that's, of course, relatively easier than it used to be. But the good side about a lot of data is you can often do statistical models. And so that's very interesting, and you'll see large companies have been doing this for a very long time. I think the use case I've always heard about is that McDonald's could predict their sales by looking at the weather, which is probably true to a large extent. And so if you have these enormous amounts of data and history, then you can actually predict a lot of things with just, it's not basic statistics, but statistics at least. When you move into the B2B world, it's often a case where you don't have enough data to do at least this kind of statistics. And then you need to look a little bit more into the data, and every detail you can get matters. The good part is then also there are fewer people, so you can actually start recording at a higher level of detail who you talked with and so on, which often happens in the CRM systems.
So that's one of the advantages. And then there are a lot of small details, like if you are a B2B company and are doing tracking, then you can do a reverse lookup of the IP addresses and start understanding a little bit of who was actually visiting your website. Without having anyone sign up or give their email address, you can actually start guessing which company this is. So those are a few differences. And then from the tracking point of view, the biggest difference is that in the B2C world, you track the user, and the user is also the one buying. In the B2B world, you track a company, and there are a lot of users involved in actually making the deal. And the person buying at the end is usually not the decision maker. At least in my experience, when I was running large technology operations, I was often an important decision maker, but I was rarely the person that
[00:12:56] Unknown:
had the credit card; that was our assistant. So, yeah, in that manner. And you mentioned too that there are often silos that occur between some of the different responsibilities of people who are involved in the interaction cycle. And I'm wondering what you have found to be some of the contributing factors that give rise to those different silos and the challenges that that poses in terms of being able to effectively map the journey of the customer through all those different interaction points to the point where they're actually paying you money? Yes. So I think I'll start here by telling a little bit about where I come from, because my background is engineering.
[00:13:35] Unknown:
And when I started in software engineering, I was told what to do by business people. And I got to learn that that's not always a good way of doing it. I think in most companies today, they're working in cross-functional teams. Engineering teams work side by side with the product managers and the designers, and together they are responsible for delivering a product. That is important because the engineers know which technologies can actually make a difference, but the product managers and the designers are the ones with a good understanding of the users and the customers. And that way we can build the greatest products. For me, this is a little bit the same in the field we're working in right now. Marketing and sales in poor companies are silos, and they blame each other. That for sure happened in the old days with product management, or program management, as we called it at that time, and engineering.
They blamed each other and worked in silos, but they found a way to work together. I think we need to see marketing and sales do the same. But to do that, they need a way of measuring the results in the same way. Right now, sales are often measured by bookings, or how much sales they do, which makes sense. Marketing needs to be the same. But now we need to share the revenue, and that's where Dreamdata comes in. We work with companies and help them try to find a way where they can actually do some kind of sharing so they can easily work together, so that marketing is not only responsible for getting more leads, but responsible for getting more revenue, because that's what matters. And so
[00:15:21] Unknown:
quantity is not necessarily a good thing here. Does that make sense? Yeah. Definitely. And you mentioned too that because of these different responsibilities, they might be tracking the information in different systems. And so I'm wondering what you have found to be some of the common sets of information that you are tracking for the different roles and the different types of source systems that you're trying to integrate with. And I'm curious how you're approaching the collection and cleaning of that data in order to be able to build useful insights for the businesses that you're working with. Yeah.
[00:15:56] Unknown:
And it also took some time to figure that out. What we do in the product is we pull out the data using the APIs of the different systems and store it, at least at the end, inside our data warehouse. We use Google BigQuery as our data warehouse, which is super essential to how we do things. I will add here that I used to use Amazon Redshift a lot. Amazon Redshift would not be able to do what we do, at least not at a decent cost. And I haven't used Snowflake so much, but you could in theory do this in Snowflake, I think. But we pull in the data pretty raw, pretty much as it is.
We, of course, have jobs that run and do this asynchronously. And we use a different set of projects to make this work, depending on what type of data it is. Often when we pull data from these other SaaS tools like Zendesk, Salesforce, HubSpot, those kinds of tools, we use an open source project called singer.io, which I'm sure a lot of the listeners are familiar with. And if not, then, I mean, it's a great ETL tool to check out. It's kind of the open source version of Stitch Data, and we use that to pull in some data. So now we have the raw data. Then we do transformation. So we clean up the data into a uniform model. And that is, of course, a little bit more sophisticated, but the simple version is that in all CRM systems, there's a contact. Sometimes they call it different things, but there's a contact. And the contact has an email address, a name, an ID, a creation date, and so on. So we find a couple of fields that we need, and we pull those out. And so we pull it into a uniform shape.
We do the same with companies or accounts, and activity data and so on. Depending on what the system is, we type it into either activity data, contacts, or companies. Those are the primary objects we have. Additionally, we need to get revenue, which is probably the problem we haven't 100% solved yet. So here, we actually often have a custom script per customer that ensures that we take the revenue in the right way. Unfortunately, most systems that contain either bookings or revenue are different. They structure the data in different forms or are customized per customer, which is not always ideal for us. We're getting closer, but we're not there fully.
And then we crunch the data. So we build up data models, crunch the data, and we use a tool called Dataform. People might be familiar with DBT, and Dataform is kind of a competitor to that. If you like BigQuery a lot, then it's particularly interesting, but they support Snowflake and Redshift and probably other databases as well. They can really help you build data models and build up the dependency graph, so that when you run your data models, instead of setting up a lot of schedules that all depend on each other, you just build up the graph, and they ensure that the models run in the right order, which makes at least that job relatively easy. And then, additionally, we pull in a lot of tracking data.
So we built our own tracking pipeline. People are maybe familiar with a company called segment.com, which does customer data infrastructure, and I'm a huge fan of them. And we have an integration with them, so we can get the data in like that if customers are using Segment. Unfortunately, not enough customers are using Segment. So we built our own, and we use their open source analytics.js to build our own pipeline where we're piping in the data. We're using a lot of Google Cloud Dataflow.
And Dataflow is also an Apache project, I think it's called Apache Beam, which is also a very interesting tool to stream data, do some kind of transformation, and then stream it down into whatever store you have, which for us has often been BigQuery, but could be anything else. And it's very interesting comparing it. I mean, it's a product that is very close to tools like Spark, but the difference between them is often that it autoscales out of the box. So it's very easy to have a lot of data coming in. And so, therefore, you can actually stream the data live into, for example, BigQuery, which is very useful. And one of the
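As a rough illustration of the "uniform model" idea described here, the following Python sketch maps contact records from two differently shaped sources onto one common shape. It is only a sketch under stated assumptions: the source names (`crm_a`, `crm_b`), the field names, and the `Contact` type are hypothetical and are not taken from Dreamdata's actual schema or from any real CRM API.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Contact:
    """Common contact shape shared by every source (hypothetical)."""
    source: str
    source_id: str
    email: str
    name: str
    created_at: datetime

# Hypothetical per-source field mappings; real systems name these differently.
FIELD_MAPS = {
    "crm_a": {"id": "Id", "email": "Email", "name": "Name", "created": "CreatedDate"},
    "crm_b": {"id": "vid", "email": "email_address", "name": "full_name", "created": "created_at"},
}

def normalize_contact(source: str, record: dict) -> Contact:
    """Pick out the handful of fields we need and coerce them to one shape."""
    fields = FIELD_MAPS[source]
    return Contact(
        source=source,
        source_id=str(record[fields["id"]]),
        # Lower-casing the email lets it act as a join key across systems.
        email=record[fields["email"]].strip().lower(),
        name=record[fields["name"]].strip(),
        # Assumes ISO 8601 timestamps with an explicit UTC offset.
        created_at=datetime.fromisoformat(record[fields["created"]]),
    )
```

Keeping the mapping in data rather than in code means a new source only needs a new entry in `FIELD_MAPS`, although, as noted in the conversation, fields such as revenue tend to need per-customer handling that a simple mapping like this would not cover.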
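The dependency-graph approach mentioned here can be sketched in a few lines of Python using the standard library's `graphlib` module (Python 3.9+). This is not how Dataform or DBT are implemented; it only illustrates the underlying idea of deriving a run order from declared dependencies. The model names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each model lists the models or raw tables it depends on (hypothetical names).
MODEL_DEPENDENCIES = {
    "contacts": {"raw_crm_contacts"},
    "companies": {"raw_crm_accounts"},
    "activities": {"raw_tracking_events", "contacts"},
    "journeys": {"activities", "companies"},
    "attribution": {"journeys", "raw_revenue"},
}

def run_order(dependencies: dict) -> list:
    """Return an order in which every model comes after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())

if __name__ == "__main__":
    for model in run_order(MODEL_DEPENDENCIES):
        print("run", model)
```

Because the order is derived from the graph, adding a new model only requires declaring what it depends on, rather than adjusting a set of interlocking schedules. `TopologicalSorter` also raises `graphlib.CycleError` if two models depend on each other.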
[00:20:48] Unknown:
challenges that exist beyond just being able to manage the data as it's coming in is ensuring that there are useful data elements that are being generated in the first place to be able to identify all the different ways that you're interacting with the customers at the different points. And what have you found to be some of the difficulties in working with your customers to ensure that they are using the systems that they say they are and ensuring that all of their interactions are getting recorded so that they are able to get that effective visibility across the different paths that customers might take? My general experience in this is that the best way of getting
[00:21:26] Unknown:
a data cleanup project started successfully is by starting to use the data before it's clean. Not all of our customers, definitely not, but a large portion of our customers are coming to us because they actually acknowledge that. They want to clean up their data, but they also understand that to actually do that, they need to start using the data for something. That way, the people who need to input the data, or run the projects to clean it up or make it consistent, see a real outcome of doing it. And so in doing that, we help the customers set up the systems they use. We get the data in. And then very often, with a few minutes or at least not more than a few hours of work, we can spot the general problems.
And for B2B companies, the general problems are typically stuff like: you're not recording this, you're not adding UTM parameters on your links, you have a lot of duplicated accounts or contacts.
[00:22:28] Unknown:
This is something you wanna look at. That that's typically the problems we see. And then once you have the data onboarded to your platform, what are some of the approaches that you're using for being able to do something like entity recognition or entity resolution or master data users and interactions map to and being able to build that overall visualization for the for your customers as to what are the useful pieces of information that they're able to take some sort of action on? Yes. So that's a a good question. I mean, so first, we have now all the data in. And so from our perspective,
[00:23:11] Unknown:
we start with the contact. And we find all the contacts, and the contact for us is typically an email address, but it could also, in theory, be a phone number. That's actually the ID for us, and that's another benefit of being in the b to b world versus the b to c. In the b to c, you will always say, hey. The user could change their email address. But in the b 2 b world, if you change your email address, it may typically means you changed your job, which is actually often a final indication of this is another person now. Like, technically, of course, it's not another person, but but from a tracking point of view, it's it's a it's a good way of doing it. And then we find that person in all the systems, all the system where the that contact is present. We find all the activity for each of these kind of systems. And so that's generally pretty straightforward because the systems are most of these systems are built so that they try to keep a good relation between the activity. For example, send this ticket that's typically very easy to link to who actually kind of, who's the customer. And that makes sense. Right? Because those who can't reply back to the customer. So often you have that information. And then the more hard part when we talk, ID resolution is typically, tracking data. And to track the users, we set a cookie, and so we can track kind of users.
Then when they identify themselves, which is typically they sign up to a form, they log in to a product or similar, We associate the the user's email address with that cookie. And so now we can see also kind of for the past, what has that user done. And that we kind of build up together. So now we have activities and users, users being email addresses. The next step is to figure out which company does this user belong to. And so the first place we look is, in the CRM system, which is typically the record of truth, at least for most companies, of kind of the relation between a a user and a company. I mean, that's great. We, sometimes we we can find the user there. Sometimes we can't see the user there.
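The cookie-to-email association described above can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the event shape (`cookie_id`, `email`, `page`) is hypothetical, each cookie is assumed to belong to a single person, and the last email seen for a cookie wins. A production pipeline would need to handle shared devices, multiple emails per cookie, and cookie expiry.

```python
def stitch_identities(events: list) -> list:
    """Attach an email to every event whose cookie was ever identified.

    Events are dicts with a "cookie_id" and an optional "email" that is
    only present once the visitor fills in a form or logs in.
    """
    # First pass: learn which email each cookie eventually identified as.
    cookie_to_email = {}
    for event in events:
        if event.get("email"):
            cookie_to_email[event["cookie_id"]] = event["email"].strip().lower()

    # Second pass: backfill that email onto earlier anonymous events too.
    return [
        {**event, "email": cookie_to_email.get(event["cookie_id"])}
        for event in events
    ]
```

The second pass is what makes the earlier, anonymous page views visible as part of the same person's journey once they identify themselves. Events from cookies that never identify keep `email` as `None` and would fall through to the company-level matching discussed next in the conversation.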
Then we take the next step, and we see whether the user has an email address that's actually a business email. If it's a business email, then we take the domain of that email, the website, and try to find the company with that website. If we succeed in that, great. We've now fuzzy linked the user to that company. And then the last resolution is that we do a reverse lookup of the IP address. So the user had some activities on the website, and either the user signed up with a Gmail account, so we can't associate it with a company, or we don't have an email address, so we simply don't know who the user is. Then we can reverse look up the IP address, get a website again, and try to cross-link the user activity with that website. And then, sometimes, if our customers are using enrichment tools, which could be Clearbit, for example, we can do this a little bit better. Clearbit would sometimes be able to look up an email address and put it on a specific company with maybe higher accuracy than us.
[00:26:23] Unknown:
And you mentioned that you're using Singer for a lot of the data ingestion, or optionally using something like Segment. I'm wondering what you have found to be some of the challenges in terms of being able to map to a useful lowest common denominator data model, and the ways that you're approaching either dropping data that doesn't map to that model or being able to maintain extra information when it's available so that you can optionally expose that to the end user who's trying to do deeper research on the datasets that you've compiled for them?
[00:27:03] Unknown:
I mean, this can be very difficult sometimes. I think the biggest challenge, and not just sometimes, it's always the biggest challenge, is that you have to understand that system pretty well to do this. So if you don't know Salesforce, it's pretty hard to do the mapping even looking at the data, because there are always strange things happening. When you go into a system you don't know, there are a couple of things that are usually easy, like contacts and companies. There's a name field. There's an email field. There's a created field. You usually would figure that out. But when you start looking at activity data, it always becomes more complicated.
And often the systems are also exposing that data in a less natural way, and they are more unique. What I always recommend people do, if they work at a company and need to analyze, let's say, Salesforce data, is to start by getting a login to Salesforce so they can browse around and see the data in the view that the salespeople or managers are looking at. That helps a lot and makes it a lot easier. And that's also what we often do. If we need to integrate with a new system that we don't necessarily know, then often we ask the customer, can we get access? It just helps us a lot to see the data from that perspective when we build integrations.
[00:28:28] Unknown:
And what does the onboarding process look like for customers who are starting to use your system and the overall process for being able to integrate with their data sources and ensure that you're collecting all of the necessary information for the different ways that they're interacting with customers?
[00:28:46] Unknown:
So the first thing we do is set up the customer's account, but the first thing we ask them to do is add the tracking script to their website. The reason for this is quite simple. I find this scary, actually, but it is unfortunately very true: most companies don't have very good tracking on their website. When you call most companies, you figure out that the tracking they have on their website is Google Analytics. And while Google Analytics is a great tool for looking at some basic web traffic, you can't get the data out, and it's not at any fine-grained level on who was actually doing the activities. It's aggregated data, which makes it a really poor tool to store what is, for me, crucial business information. So that's the first thing we always ask customers to do, even before we start deep conversations around how the data looks. Then it depends a little bit on whether the customers are more or less technical.
If they are more technical, we'll just ask them to connect all the tools, which is just logging in to our product and then typically authenticating with whatever service, and then we get a token and so on, and we can start pulling the data. Then we set up the syncs. Initial syncs sometimes take a little bit of time. We just closed a customer with, I think, 5,000,000 contacts, and that will probably take a few hours to synchronize. So that's the first thing we do. But very quickly, when we have the data, we set up a call with the customer to try to understand their process. How are you doing sales?
What are your marketing processes? How early in the funnel? We ask questions like, how long is the sales funnel? Do we expect it to be very short? Is it very long? Most of our customers have 3, 6, 9 month sales journeys, which means that often you want to look at something that happens earlier. So we try to understand what is often described as marketing qualified leads or sales qualified leads, and we try to understand, okay, how do we map that out? What are the definitions for you? We talk about how the revenue is mapped. If they use a vendor system like Salesforce to hold this, we would look at the opportunities.
We try to understand how you're mapping opportunities. What is the amount that's actually stated there? Are you a subscription business? Are you a transactional business? So we try to understand all those kinds of things. We try to get a couple of base numbers in, for example, how much money did they spend on Google Ads last month? How much revenue did they have last month? And so on. So that we have a couple of base numbers, so that when we get the data in and we have crunched it, we can say, okay, it's in the ballpark at least.
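As an editorial aside, the base-number check described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the metric names, and the 10% relative tolerance are hypothetical and do not come from the interview.

```python
from typing import Dict, List


def within_ballpark(computed: float, reported: float, tolerance: float = 0.10) -> bool:
    """True if `computed` is within a relative `tolerance` of the reported figure."""
    if reported == 0:
        return computed == 0
    return abs(computed - reported) / abs(reported) <= tolerance


def check_base_numbers(
    computed: Dict[str, float],
    reported: Dict[str, float],
    tolerance: float = 0.10,  # assumed threshold, not a figure from the interview
) -> List[str]:
    """Return the names of reported metrics that are missing or out of tolerance."""
    return [
        name
        for name, expected in reported.items()
        if name not in computed
        or not within_ballpark(computed[name], expected, tolerance)
    ]


# Figures the customer stated on the onboarding call (invented for this example).
reported = {"google_ads_spend": 20000.0, "revenue": 150000.0}
# Figures computed from the ingested data; the doubled ad spend is the kind of
# result an accidental row duplication in a join could produce.
computed = {"google_ads_spend": 41000.0, "revenue": 149000.0}
suspect_metrics = check_base_numbers(computed, reported)
```

A metric that comes out at roughly a whole multiple of the reported figure is a hint to look for duplicated rows before questioning the customer's number.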
It's the same numbers, and so we did it right. We have all the classical problems as well, right? Doing SQL, you can easily end up duplicating data and so on. So we are, of course, careful around that.
[00:31:51] Unknown:
And then because of the fact that your customers do each have their own specific ways of interacting with their customers and specifics in terms of the data sources that they're trying to use and integrate with, how does that affect your overall approach to building out new features or your product roadmap to ensure that you're going to be able to fulfill the needs of your customers as you bring them on, or the product verticals or the market verticals that you target in terms of trying to gain your own customers to ensure that you'll be able to work with them effectively?
[00:32:24] Unknown:
Yeah. I mean, this is always difficult. I think building products is hard sometimes because you can't act as consultants building customized solutions. On the other hand, you can't generalize so much that it's not useful for the customers who actually have more specific needs. So we work every day to find that balance. If we stay with the data side of things, it's actually mostly pretty straightforward. We are able to map all the data pretty generically, only applying some configuration. The only exception to that, as I mentioned earlier, is revenue, where we have a small query that normalizes the data per customer. And that's actually the only custom data thing we have per customer. When we talk about the user interface and how we analyze the data, it's all the same. But I think we are challenged sometimes when certain customers want to look at a specific number. We recently had a conversation with one of our customers.
We were telling them a number that was supposed to be a return on investment number for their ads, and they were advocating a different number. Those kinds of things are always hard conversations. As such, the numbers represent the same thing. They were just using a slightly different formula, and that's hard sometimes to question. We try in general to go with what's the norm in the market. But sometimes we are also introducing a new way of looking at data, and then it's a little bit harder. We try to talk to customers all the time, as often as we can, and try to understand them and then pick the best solutions across our customers. Right? And then sometimes one of our customers is not super happy with that solution, but that's how it has to be.
[00:34:23] Unknown:
And in terms of being able to build out reports and visualizations of the information that you're collecting and the analysis that you're providing, what have you found to be some of the useful strategies? And what are the challenges in terms of being able to make sure that those reports are actionable and easy to interpret for your end users?
[00:34:41] Unknown:
It's talking with your customers. It's unfortunately the best. I mean, if you have a product where you can track users and behavior, where you have enough data to look at that, then definitely do that. But what we do is we talk with customers all the time, try to understand what they're trying to do, and ask them to go through the report and see what they get out of it. That gives us a lot of insights that we can often use to correct the dashboards.
[00:35:05] Unknown:
In terms of your experience of using Dreamdata for your own business and being able to map the journeys of your customers, how has that influenced the overall direction or product roadmap for the business that you're building?
[00:35:20] Unknown:
So far, not so much. I think our head of sales is pushing a lot to make changes here. He has a lot of things that he would like to see in the product that would be helpful for him. We're a little bit careful about that because, partly, we are of course trying to solve our own needs, but mostly we are trying to solve our customers' needs. So, so far, we are careful, but we use our own product a lot. There are, I think, primarily a couple of pieces of functionality we use a lot. We are building a customer journey tool, and, at least for those who listened carefully, we have a really good mapping of all the companies who have been on our website. So we can actually see who's been on our website, both people that are anonymous and companies that have identified themselves.
And that's very interesting sometimes. Say we had a sales conversation with a company maybe a few months ago, and we stopped the conversation because they were not ready to buy, they were not interested, whatnot. There are many reasons not to buy. But then when you see that they come back to the website and have a couple of visits, then it's maybe time to call them again. So that's one of the ways we use it. And then we, of course, use it to keep track of our paid advertising and ensure that we don't overspend compared to how much we make.
[00:36:46] Unknown:
And what are some of the most interesting or surprising insights that you've been able to gain as a result of being able to view the data and the analytics that you're compiling, either for yourself or when working with some of your customers?
[00:37:06] Unknown:
So the insights are many and very detailed. Let me give a concrete example. We had a customer, and we saw that if we split their customers into 2 buckets, those who had more than 1 session on their website before the first sign-up and those who had only 1 session before they signed up, meaning they signed up in the first session, then for that customer the first bucket, those who had more than 1 session before they signed up, were 5 times more likely to actually end up as customers. So that's a type of data we see. Often, we can help customers really understand which of their paid media actually work.
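As an editorial aside, the two-bucket comparison described here can be sketched as follows. The function name, the input shape, and the toy data are assumptions for illustration only. The numbers are invented and are not the customer's data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def conversion_by_bucket(accounts: Iterable[Tuple[int, bool]]) -> Dict[str, float]:
    """Compute the customer conversion rate per sign-up bucket.

    Each item is (sessions_up_to_and_including_signup, became_customer).
    """
    totals = defaultdict(lambda: [0, 0])  # bucket -> [signups, customers]
    for sessions, became_customer in accounts:
        # Signing up in the very first session counts as exactly 1 session.
        bucket = "multi_session" if sessions > 1 else "first_session"
        totals[bucket][0] += 1
        totals[bucket][1] += int(became_customer)
    return {
        bucket: customers / signups
        for bucket, (signups, customers) in totals.items()
    }


# Invented toy data: 4 multi-session sign-ups (2 became customers) and
# 10 first-session sign-ups (1 became a customer).
toy_accounts = (
    [(3, True), (2, True), (4, False), (2, False)]
    + [(1, True)]
    + [(1, False)] * 9
)
rates = conversion_by_bucket(toy_accounts)
lift = rates["multi_session"] / rates["first_session"]
```

The `lift` ratio is the "times more likely" figure from the example. With small buckets it can swing widely, so in practice it would be worth checking the sign-up counts behind each rate before drawing conclusions.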
They do a lot of advertising, spending several hundred thousand dollars a month on showing ads. But some of these ads do not really bring any money back. The truth for most companies is also that their ads are very useful and very important, but a percentage of them they could just close down without losing any revenue. And then we help, in particular, companies understand what content pieces are driving revenue. One company we helped was doing a partnership with another company, not a competitor, but more like a company related to what they did. They wrote a couple of articles together, but as such, they were actually writing the articles more to have the partnership, which also helped them in other cases. But it turned out that those articles were actually driving a lot of revenue, so much revenue that they could actually pay for their content team. And so for that company, we could talk with them and help them realize that maybe this is the type of content they need to produce more of. But they had other content pieces that were driving a lot of traffic, and in that sense, they looked good. But when they looked at how much revenue they got out of them, they didn't make any money. This is often what we see in the B2B world: there's, of course, correlation between traffic and revenue, but there are maybe other things that impact your revenue more than traffic. It's interesting to try to find those pieces, and we can help with that.
[00:39:21] Unknown:
And in terms of your experience of building and growing both the technical and business elements of Dreamdata, what have you found to be some of the most challenging or interesting or unexpected lessons that you've learned?
[00:39:31] Unknown:
I think it's always hard to grow and build a team. And at Dreamdata specifically, on building the team, I think my lessons are probably more from the past. And definitely here, I'm an advocate of: hire people you trust. And if you don't trust people, then let them go. If you don't trust them anymore, then you shouldn't have them on your team. Whether that's fair or not, unfortunately, that's how it is. But if you have a team you trust, you can also give them a lot of responsibility and get them to fix things, and that just builds a good culture. When we talk about Dreamdata, I think some of my big challenges were on the commercial side of things. I definitely had to learn to go into sales conversations and have sales conversations, avoiding talking too technically, finding the right balance between talking code and talking more about the value of the product. And I think I'm partly still struggling with that. And partly, I'm also struggling with finding it enjoyable. I do enjoy building with data and writing code a lot. And sometimes other parts of the job dictate that I have to be in other places, which is sometimes hard for me.
[00:41:00] Unknown:
And one of the pieces that you were mentioning before, about the content being one of the strongest drivers of revenue in a particular case, I'm wondering how the overall evolution of the marketing landscape and different types of media or content distribution impacts your overall approach to building out your platform, as well as some of the ways that you're approaching trying to grow your own business, or what you've found to be some of the most useful mediums or channels for being able to grow revenue or grow the audience?
[00:41:37] Unknown:
So I think that's very specific from company to company, what works well here. I would generally say in the B2B world, most companies are trying to produce their content on their own website. Sometimes it's, of course, a good idea for search engine optimization and so on to have content elsewhere. But generally, that is, I think, the consensus right now: you want to own your own stuff. So getting the content out on a lot of other platforms is not something we see a crazy amount of companies doing. So far, it's not really impacting us, but it's very different.
Some companies are doing a lot of content, and they are betting relatively large amounts that they can write content that will drive traffic and awareness. Other companies are not doing that at all, and they don't even believe that that's a possibility for them. So they come in all kinds. But I can talk about ourselves. We're doing ads, as probably most companies are. When we do Google Ads, we get relevant traffic in, but not super relevant. With Facebook, we are more or less retargeting people, and that works. But when we look at acquiring new customers, I don't think Facebook works particularly well for us.
Now, we're still early, so we don't have that much data to conclude yet. But if I had to conclude right now, I would definitely close it down. What works well for us is Capterra, which is a review platform for software. We see a lot of such companies also use G2 Crowd, or G2, I think they're just called now, and the data we have suggests that it works quite well too. So if you're in that space, I would definitely look towards that if you can do something. And then I think podcasts and webinars have also worked pretty well for us to get some awareness.
Content has also worked really well for us. We managed to produce some content that actually drew a relatively large amount of traffic, for us at least.
[00:43:57] Unknown:
And are there any other areas of the work that you're doing at Dreamdata or the overall space of B2B sales and revenue tracking or any of the other challenges that you're facing in the data landscape that we didn't discuss that you'd like to cover before we close out the show?
[00:44:23] Unknown:
I think we didn't talk a lot about machine learning and those kinds of things. And to be honest, we don't do a lot of that right now. We'll start investing in that pretty heavily. We look a lot at stuff like Markov chains to see if we can use that for attribution algorithms and so on, which is very interesting. When doing this, I'm always concerned about the amount of data that's required to actually get something valid out of it. But I do look forward to when we can actually get some time to play with that.
[00:44:42] Unknown:
Well, for anybody who wants to get in touch with you or follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as a final question, I would just like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:45:05] Unknown:
I think, actually, oh, it's really annoying to have to say this. But I think, unfortunately, it's not necessarily what Dreamdata does, but it's part of it at least. I lack a place where I can get all my data in and actually have it available so that both me and my analyst team and so on can query the data and get something out of it without having to ask a lot of people for help. I think everyone that works in a large organization has felt that pain, that to get data they have to go to some person and get it out. Well-structured data lakes, I think, that's what I require. And I'm arguing that we build that at Dreamdata, at least for the commercial and revenue-facing side of things. But in general, I would like that.
[00:45:41] Unknown:
Alright. Well, thank you very much for taking the time today to join me and discuss the work that you're doing at Dreamdata. It's definitely a very interesting business and an interesting problem domain that you're working in. So I'm excited to see where it goes for you. So thank you again for your time, and I hope you enjoy the rest of your day. Thank you for having me. Thank you for listening. Don't forget to check out our other show, Podcast.__init__ at pythonpodcast.com, to learn about the Python language, its community, and the innovative ways it is being used.
And visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction and Career Advice for Data Engineers
Interview with Ole Dallerup: Introduction to Dreamdata
Building Dreamdata: Challenges and Solutions
Motivations and Challenges in Data Integration
Breaking Down Silos in B2B Companies
Ensuring Data Quality and Collection
Data Integration and Cleaning
Building Features and Product Roadmap
Creating Actionable Reports and Visualizations
Insights and Lessons Learned
Marketing Strategies and Content Impact
Future Directions and Machine Learning
Biggest Gaps in Data Management Tooling
Closing Remarks and Contact Information