Pipeline ejecutado correctamente

Proyecto dbt en acción:
Del CSV al Data Mart

Showcase interactivo de un proyecto dbt real con DuckDB. Explora el lineage graph, inspecciona cada modelo SQL y visualiza los resultados de las transformaciones sobre un dataset de ventas ficticio.

6
Modelos dbt
50
Pedidos
21
Tests PASS
1.02s
dbt run

1Lineage Graph interactivo

El DAG muestra cómo fluyen los datos desde los seeds CSV hasta los data marts finales. Haz clic en cualquier nodo para ver sus detalles y dependencias.

Seeds (raw) Staging (view) Intermediate (view) Marts (table)




raw_customers 10 rows · seed raw_orders 50 rows · seed raw_products 12 rows · seed

stg_customers view · staging stg_orders view · staging stg_products view · staging

int_orders_enriched view · intermediate

fct_orders table · marts dim_customers table · marts

fct_orders

Tipo: table Schema: marts Dependencias: 1 Filas: 50
Tabla de hechos con todos los pedidos y sus métricas financieras. Fuente principal para análisis de ventas.

2Modelos SQL

Los 6 modelos dbt organizados en 3 capas siguiendo las mejores prácticas. Cambia de pestaña para inspeccionar cada transformación.

models/staging/stg_orders.sql
with source as (
    select * from {{ ref('raw_orders') }}
),

renamed as (
    select
        order_id,
        customer_id,
        product_id,
        cast(order_date as date) as order_date,
        quantity,
        discount_pct / 100.0 as discount_rate,
        lower(status) as status,
        lower(channel) as channel,
        current_timestamp as _loaded_at
    from source
)

select * from renamed
models/staging/stg_customers.sql
with source as (
    select * from {{ ref('raw_customers') }}
),

renamed as (
    select
        customer_id,
        first_name,
        last_name,
        first_name || ' ' || last_name as full_name,
        lower(email) as email,
        country,
        city,
        cast(signup_date as date) as signup_date,
        upper(segment) as segment,
        current_timestamp as _loaded_at
    from source
)

select * from renamed
models/staging/stg_products.sql
with source as (
    select * from {{ ref('raw_products') }}
),

renamed as (
    select
        product_id,
        product_name,
        category,
        subcategory,
        price,
        cost,
        price - cost as gross_margin,
        round((price - cost) / price * 100, 2) as margin_pct,
        supplier,
        current_timestamp as _loaded_at
    from source
)

select * from renamed
models/intermediate/int_orders_enriched.sql
with orders as (
    select * from {{ ref('stg_orders') }}
),

customers as (
    select * from {{ ref('stg_customers') }}
),

products as (
    select * from {{ ref('stg_products') }}
),

enriched as (
    select
        o.order_id,
        o.order_date,
        date_trunc('month', o.order_date) as order_month,
        date_trunc('quarter', o.order_date) as order_quarter,
        extract(year from o.order_date) as order_year,

        -- Customer info
        o.customer_id,
        c.full_name as customer_name,
        c.segment as customer_segment,
        c.country as customer_country,
        c.city as customer_city,

        -- Product info
        o.product_id,
        p.product_name,
        p.category as product_category,
        p.subcategory as product_subcategory,

        -- Order details
        o.quantity,
        o.discount_rate,
        o.status,
        o.channel,

        -- Financial calculations
        p.price as unit_price,
        p.cost as unit_cost,
        p.price * o.quantity as gross_revenue,
        p.price * o.quantity * (1 - o.discount_rate) as net_revenue,
        p.cost * o.quantity as total_cost,
        p.price * o.quantity * (1 - o.discount_rate) - p.cost * o.quantity as gross_profit,

        -- Margin
        case
            when p.price * o.quantity * (1 - o.discount_rate) > 0
            then round(
                (p.price * o.quantity * (1 - o.discount_rate) - p.cost * o.quantity)
                / (p.price * o.quantity * (1 - o.discount_rate)) * 100, 2
            )
            else 0
        end as margin_pct

    from orders o
    left join customers c on o.customer_id = c.customer_id
    left join products p on o.product_id = p.product_id
)

select * from enriched
models/marts/fct_orders.sql
with enriched_orders as (
    select * from {{ ref('int_orders_enriched') }}
),

final as (
    select
        order_id,
        order_date,
        order_month,
        order_quarter,
        order_year,

        customer_id,
        customer_name,
        customer_segment,
        customer_country,

        product_id,
        product_name,
        product_category,

        quantity,
        discount_rate,
        status,
        channel,

        unit_price,
        unit_cost,
        gross_revenue,
        net_revenue,
        total_cost,
        gross_profit,
        margin_pct,

        -- Flags
        case when status = 'completed' then true else false end as is_completed,
        case when discount_rate > 0 then true else false end as has_discount,
        case when channel = 'online' then true else false end as is_online

    from enriched_orders
)

select * from final
models/marts/dim_customers.sql
with customers as (
    select * from {{ ref('stg_customers') }}
),

orders as (
    select
        customer_id,
        count(*) as total_orders,
        count(case when status = 'completed' then 1 end) as completed_orders,
        sum(case when status = 'completed' then net_revenue else 0 end) as total_revenue,
        sum(case when status = 'completed' then gross_profit else 0 end) as total_profit,
        min(order_date) as first_order_date,
        max(order_date) as last_order_date,
        avg(case when status = 'completed' then net_revenue end) as avg_order_value
    from {{ ref('int_orders_enriched') }}
    group by 1
),

final as (
    select
        c.customer_id,
        c.full_name,
        c.email,
        c.country,
        c.city,
        c.segment,
        c.signup_date,

        coalesce(o.total_orders, 0) as total_orders,
        coalesce(o.completed_orders, 0) as completed_orders,
        coalesce(o.total_revenue, 0) as lifetime_revenue,
        coalesce(o.total_profit, 0) as lifetime_profit,
        o.first_order_date,
        o.last_order_date,
        coalesce(o.avg_order_value, 0) as avg_order_value,

        case
            when coalesce(o.total_revenue, 0) >= 2000 then 'VIP'
            when coalesce(o.total_revenue, 0) >= 1000 then 'High Value'
            when coalesce(o.total_revenue, 0) >= 500 then 'Medium Value'
            else 'Low Value'
        end as value_tier

    from customers c
    left join orders o on c.customer_id = o.customer_id
)

select * from final

3Resultados de las queries

Datos reales extraídos de DuckDB tras ejecutar dbt run. Explora las tablas finales del data mart.

order_id order_date customer_name product_name category qty net_revenue profit margin status
1001 2023-01-05 Ana García Laptop Pro 15 Electrónica 1 1299.99 519.99 40.0% completed
1002 2023-01-08 Carlos López Mouse Inalámbrico Electrónica 2 56.98 32.98 57.9% completed
1003 2023-01-12 María Martínez Monitor 27 4K Electrónica 1 499.99 219.99 44.0% completed
1004 2023-01-15 Ana García Teclado Mecánico Electrónica 1 80.99 35.99 44.4% completed
1005 2023-01-20 Juan Rodríguez Silla Ergonómica Mobiliario 1 399.99 219.99 55.0% completed
1006 2023-02-02 Laura González Laptop Pro 15 Electrónica 1 1104.99 324.99 29.4% completed
1007 2023-02-10 Pedro Sánchez Webcam HD Electrónica 1 79.99 44.99 56.2% completed
1008 2023-02-14 Sofía Fernández Mouse Inalámbrico Electrónica 3 85.47 49.47 57.9% completed
1009 2023-02-18 Miguel Torres Auriculares BT Electrónica 1 149.99 79.99 53.3% completed
1010 2023-02-25 Carmen Díaz Hub USB-C Electrónica 2 99.98 59.98 60.0% completed
1011 2023-03-01 Roberto Ruiz Escritorio Standing Mobiliario 1 629.99 279.99 44.4% completed
1012 2023-03-05 Ana García Cable HDMI 4K Electrónica 5 99.95 74.95 75.0% completed
1013 2023-03-10 Carlos López Alfombrilla XL Accesorios 2 49.98 33.98 68.0% completed
1014 2023-03-15 María Martínez Laptop Pro 15 Electrónica 1 1039.99 259.99 25.0% completed
1015 2023-03-20 Juan Rodríguez Monitor 27 4K Electrónica 1 474.99 194.99 41.1% completed
# Cliente País Segmento Pedidos Lifetime Revenue Lifetime Profit Avg Order Tier
5 Laura González Argentina GOLD 5 2619.94 963.94 523.99 VIP
1 Ana García España PREMIUM 6 2142.87 940.87 428.57 VIP
4 Juan Rodríguez España STANDARD 5 2142.43 772.43 428.49 VIP
8 Miguel Torres Colombia PREMIUM 5 1904.83 890.83 380.97 High Value
3 María Martínez México PREMIUM 5 1799.91 644.91 359.98 High Value
6 Pedro Sánchez España STANDARD 5 1671.16 742.16 334.23 High Value
7 Sofía Fernández México GOLD 5 1660.93 769.93 332.19 High Value
10 Roberto Ruiz Chile GOLD 4 1489.94 695.94 372.49 High Value
2 Carlos López España STANDARD 5 1116.93 546.93 223.39 High Value
9 Carmen Díaz España STANDARD 5 578.93 285.93 144.73 Medium Value

Ingresos netos por mes (EUR)

 

Beneficio bruto por mes (EUR)

 

Ingresos por categoría de producto

 

Margen medio por categoría (%)

 

Ingresos netos por país

 

4Data Quality Tests

dbt ejecuta automáticamente tests de calidad sobre los modelos definidos en los schemas YAML. Todos los tests pasaron correctamente.

21
Tests totales
21
PASS
0
WARN
0
ERROR
unique_stg_orders_order_id
0.08s
not_null_stg_orders_order_id
0.06s
not_null_stg_orders_customer_id
0.05s
not_null_stg_orders_product_id
0.05s
accepted_values_stg_orders_status
0.07s
unique_stg_customers_customer_id
0.05s
not_null_stg_customers_customer_id
0.04s
unique_stg_customers_email
0.05s
not_null_stg_customers_email
0.04s
accepted_values_stg_customers_segment
0.06s
unique_stg_products_product_id
0.04s
not_null_stg_products_product_id
0.04s
unique_int_orders_enriched_order_id
0.05s
not_null_int_orders_enriched_order_id
0.04s
unique_fct_orders_order_id
0.05s
not_null_fct_orders_order_id
0.04s
not_null_fct_orders_net_revenue
0.05s
unique_dim_customers_customer_id
0.04s
not_null_dim_customers_customer_id
0.04s
not_null_dim_customers_value_tier
0.04s
accepted_values_dim_customers_value_tier
0.05s

¿Quieres replicar este proyecto?

El código completo está disponible con Dockerfile, seeds y documentación lista para arrancar en minutos.

Leer el tutorial completo en Dataprix