What is it?
DCM DT (Data Transfer) V2.0 is a reporting service provided by DoubleClick that can deliver raw reporting data in cloud databases which allows the user to create analytics insights beyond a standard DoubleClick report.
How does it work?
The DT V2.0 service itself delivers files in GCS (Google Cloud Storage). However, each file is only stored in GCS for 60 days. Therefore, for the purposes of storing and using these files easily, GCP (Google Cloud Platform) also provides BigQuery Data Transfer Service in order to help users who want to import files from GCS to BigQuery automatically. If users would like to use an alternative, i.e. Apache Spark, then it is necessary to permanently store files in GCS. However, our preferred approach at StackPros is to move files from the original GCS bucket to another bucket on a daily basis.
In terms of delivery, DT V2.0 delivers 4 types of files into GCS instance:
- Impressions (1 per hour)
- Clicks (1 per hour)
- Activities (1 per day)
- Match table files
Compared to the data from DCM report builder, the data that comes from DT V2.0 provides impression, click, rich media and floodlight activity information in a log level format as well as in a User ID (Cookie ID) level format, which allows users to create advanced path-to-conversion reports and data science models based on this. Meanwhile, DBM related fields are also available in DT V2.0 files.
Integration and Cohesion
Before using the data for analytics purposes, it is necessary to think about how we want to join impression, click, activity and match files together. In order to do so, if you have data stored in BigQuery already and are using Python or another computing language, it is entirely possible to have your computer instance connected with the Big Query API, followed by inserting your SQL queries into your code and scheduling tasks based on your time preferences.
If you need to create an ad hoc report by using DCM DT data, however, you still won’t have a table which consolidates all four types of files into one, however below I’ve provided some useful queries for you to try:
DCM DT V2.0 data provides User ID column which allows us to create a path-to-conversion like report. However, since each user has specific privacy issues and settings within various browsers, not all users will provide a User ID (Cookie ID) returning back to the tracking system. Therefore, whenever we are performing analytics at the user level, it is better to know what percent of DCM DT data has redacted User ID’s before beginning.
Problem to Solve:
What percentage of DT data does not have a Cookie ID.
SQL Query (Standard SQL):
WITH consolidation_1 AS (
SELECT User_ID,"Impression" AS File_Type
FROM DATASET_NAME.IMPRESSION_TABLE UNION ALL
SELECT User_ID,"Click" AS File_Type
FROM DATASET_NAME.CLICK_TABLE UNION ALL
SELECT User_ID,"Activity" AS File_Type
consolidation_2 AS (
IF(User_ID="0" ,1,0) AS Redacted, IF(User_ID!="0",1,0) AS Filled
SELECT File_Type,SUM(Redacted) AS Redacted,SUM(Filled) AS Filled,COUNT(*) AS Total,
SUM(Redacted)/COUNT(*) AS Percent_Redacted
FROM consolidation_2 GROUP BY File_Type
Based on the idea of creating a Reach Report from DCM report builder, DCM DT also allows us to do the same by using SQL queries.
Problem to solve:
How many users have been reached through a specific group of Campaigns, Placements etc.
SQL Query (Standard SQL):
WHEN Campaign_ID IN ("xxxxxx","xxxxxx") THEN "Tactics_1"
WHEN Campaign_ID IN ("xxxxxx","xxxxxx")THEN "Tactics_2"
ELSE "Tactic_3" END AS Campaign_Group,
COUNT(DISTINCT User_ID) AS unique_user,
COUNT(User_ID) AS impressions
WHERE User_ID != '0'
BETWEEN "2017-08-01"AND "2017-08-31"
GROUP BY Campaign_Group
ORDER BY unique_user DESC
To summarize, DCM DT Data provides a high amount of convenience to us if it can be used in conjunction with the GCP service (compute instance, databases etc.). Since it is a User ID based dataset, our team was able to create several machine learning models to support campaign optimization. In the future, we believe DCM DT will be capable enough to let us connect with other data sources such GA360, Salesforce, AWS and more with even better analytics insights & visualizations.