确实,架构发生了变化Google Analytics for Firebase BigQuery 导出。虽然旧字段与新字段之间没有明确的映射,但文档中提供的 SQL 查询是为了将现有 BQ 数据集从旧模式迁移到新模式提供了一些关于这些字段如何变化的提示。
我分享的是migration_script.sql
下面的 SQL 查询仅供参考,但让我指出与您的用例最相关的更改:
-
事件变暗被映射为event在 SQL 查询中,但在模式中没有任何最终表示,因为事件变暗不再是嵌套字段:
UNNEST(event_dim) AS event
-
event_dim.timestamp_micros被映射为事件时间戳:
event.timestamp_micros AS event_timestamp
-
event_dim.名称被映射为事件名称:
event.name AS event_name
-
event_param.value.int_value被映射为event_params.value.int_value:
event_param.value.int_value AS int_value
-
user_dim.user_properties被映射为用户属性,并且它的所有嵌套值都遵循相同的结构:
UNNEST(user_dim.user_properties) AS user_property) AS user_properties
因此,总而言之,为了简单起见,模式更改的重点是取消嵌套几个字段,例如,不必访问event_dim.name
(这需要取消嵌套并使查询复杂化),您可以直接查询字段event_name
.
考虑到这一点,我相信您将能够使您的查询适应这个新模式,并且它可能看起来更简单,因为您不必取消嵌套这么多字段。
为了澄清起见,让我与您分享几个比较新旧架构的示例 BQ 查询(它们使用公共 Firebase 表,因此您应该能够开箱即用地运行它们):
# Old Schema - UNNEST() required because there are nested fields
SELECT
user_dim.app_info.app_instance_id,
MIN(event.timestamp_micros) AS min_time,
MAX(event.timestamp_micros) AS max_time,
event.name
FROM
`firebase-public-project.com_firebase_demo_ANDROID.app_events_20180503`,
UNNEST(event_dim) AS event
WHERE
event.name = "user_engagement"
GROUP BY
user_dim.app_info.app_instance_id,
event.name
相比于:
# New Schema - UNNEST() not required because there are no nested fields
SELECT
user_pseudo_id,
MIN(event_timestamp) AS min_time,
MAX(event_timestamp) AS max_time,
event_name
FROM
`firebase-public-project.analytics_153293282.events_20180815`
WHERE
event_name = "user_engagement"
GROUP BY
user_pseudo_id,
event_name
这些查询是等效的,但引用具有旧模式和新模式的表。请注意,由于您的查询更加复杂,您可能需要添加一些 UNNEST() 才能访问表中剩余的嵌套字段。
此外,您可能想看看这些样本这可以帮助您了解如何使用新模式编写查询的一些想法。
EDIT 2
我的理解是,像下面这样的查询应该允许您查询单个语句中的所有字段。我按所有非聚合/过滤字段进行分组,但根据您的用例(这绝对是您需要自己处理的事情),您可能需要应用不同的策略以便能够查询非聚合/过滤字段-分组字段(即使用 MIN/MAX 过滤器等)。
SELECT
user_pseudo_id,
MIN(event_timestamp) AS min_time,
MAX(event_timestamp) AS max_time,
event_name,
par.value.int_value AS engagement_time
FROM
`firebase-public-project.analytics_153293282.events_20180815`,
UNNEST(event_params) as par
WHERE
event_name = "user_engagement" AND par.key = "engagement_time_msec"
GROUP BY
user_pseudo_id,
event_name,
par.value.int_value
ANNEX
migration_script.sql
:
SELECT
@date AS event_date,
event.timestamp_micros AS event_timestamp,
event.previous_timestamp_micros AS event_previous_timestamp,
event.name AS event_name,
event.value_in_usd AS event_value_in_usd,
user_dim.bundle_info.bundle_sequence_id AS event_bundle_sequence_id,
user_dim.bundle_info.server_timestamp_offset_micros as event_server_timestamp_offset,
(
SELECT
ARRAY_AGG(STRUCT(event_param.key AS key,
STRUCT(event_param.value.string_value AS string_value,
event_param.value.int_value AS int_value,
event_param.value.double_value AS double_value,
event_param.value.float_value AS float_value) AS value))
FROM
UNNEST(event.params) AS event_param) AS event_params,
user_dim.first_open_timestamp_micros AS user_first_touch_timestamp,
user_dim.user_id AS user_id,
user_dim.app_info.app_instance_id AS user_pseudo_id,
"" AS stream_id,
user_dim.app_info.app_platform AS platform,
STRUCT( user_dim.ltv_info.revenue AS revenue,
user_dim.ltv_info.currency AS currency ) AS user_ltv,
STRUCT( user_dim.traffic_source.user_acquired_campaign AS name,
user_dim.traffic_source.user_acquired_medium AS medium,
user_dim.traffic_source.user_acquired_source AS source ) AS traffic_source,
STRUCT( user_dim.geo_info.continent AS continent,
user_dim.geo_info.country AS country,
user_dim.geo_info.region AS region,
user_dim.geo_info.city AS city ) AS geo,
STRUCT( user_dim.device_info.device_category AS category,
user_dim.device_info.mobile_brand_name,
user_dim.device_info.mobile_model_name,
user_dim.device_info.mobile_marketing_name,
user_dim.device_info.device_model AS mobile_os_hardware_model,
@platform AS operating_system,
user_dim.device_info.platform_version AS operating_system_version,
user_dim.device_info.device_id AS vendor_id,
user_dim.device_info.resettable_device_id AS advertising_id,
user_dim.device_info.user_default_language AS language,
user_dim.device_info.device_time_zone_offset_seconds AS time_zone_offset_seconds,
IF(user_dim.device_info.limited_ad_tracking, "Yes", "No") AS is_limited_ad_tracking ) AS device,
STRUCT( user_dim.app_info.app_id AS id,
@firebase_app_id AS firebase_app_id,
user_dim.app_info.app_version AS version,
user_dim.app_info.app_store AS install_source ) AS app_info,
(
SELECT
ARRAY_AGG(STRUCT(user_property.key AS key,
STRUCT(user_property.value.value.string_value AS string_value,
user_property.value.value.int_value AS int_value,
user_property.value.value.double_value AS double_value,
user_property.value.value.float_value AS float_value,
user_property.value.set_timestamp_usec AS set_timestamp_micros ) AS value))
FROM
UNNEST(user_dim.user_properties) AS user_property) AS user_properties
FROM
`SCRIPT_GENERATED_TABLE_NAME`,
UNNEST(event_dim) AS event