sql >> Database teknologi >  >> RDS >> PostgreSQL

Optimer langsomme aggregater i LATERAL join

Dette burde være en hurtigere variant med LATERAL underforespørgsler. Uafprøvet.

SELECT s.record_id, s.security_id, s.date
     , s.price / l.pmax   AS price_to_peak_earnings
     , s.price / l.pmin   AS price_to_minimum_earnings
  -- , ...
     , s.price / l.cape1  AS cape1
     , s.price / l.cape2  AS cape2
  -- , ...
     , s.price / l.cape10 AS cape10
     , s.price / l.capb1  AS capb1
     , s.price / l.capb2  AS capb2
  -- , ...
     , s.price / l.capb10 AS capb10
  -- , ...
FROM  (
   SELECT *
        , (date - interval  '1 y')::date AS date1
        , (date - interval  '2 y')::date AS date2
        -- ...
        , (date - interval '10 y')::date AS date10
   FROM  (
      SELECT *, min(date) OVER (PARTITION BY security_id) AS min_date
      FROM   security_data
      ) s1
   ) s
LEFT   JOIN LATERAL (
   SELECT CASE WHEN s.date10 >= s.min_date THEN NULLIF(max(earnings)                               , 0) END AS pmax
        , CASE WHEN s.date10 >= s.min_date THEN NULLIF(min(earnings)                               , 0) END AS pmin
        -- ...
        ,                                       NULLIF(avg(earnings) FILTER (WHERE date >= s.date1), 0)     AS cape1   -- no case
        , CASE WHEN s.date2  >= s.min_date THEN NULLIF(avg(earnings) FILTER (WHERE date >= s.date2), 0) END AS cape2
        -- ...
        , CASE WHEN s.date10 >= s.min_date THEN NULLIF(avg(earnings)                               , 0) END AS cape10  -- no filter

        ,                                       NULLIF(avg(book)     FILTER (WHERE date >= s.date1), 0)     AS capb1
        , CASE WHEN s.date2  >= s.min_date THEN NULLIF(avg(book)     FILTER (WHERE date >= s.date2), 0) END AS capb2
        -- ...
        , CASE WHEN s.date10 >= s.min_date THEN NULLIF(avg(book)                                   , 0) END AS capb10
        -- ...
   FROM   security_data 
   WHERE  security_id = s.security_id
   AND    date >= s.date10
   AND    date <  s.date
   ) l ON s.date1 >= s.min_date  -- no computations if < 1 year of trailing data
ORDER  BY s.security_id, s.date;

Det kommer stadig ikke til at være lynende hurtigt, da hver række har brug for flere separate sammenlægninger. Flaskehalsen her vil være CPU.

Se også opfølgningen med en alternativ tilgang (JOIN til genererede kalender + vinduesfunktioner):




  1. ORA-00060:dødvande registreret, mens man venter på ressource

  2. PostgreSQL:indsæt fra en anden tabel

  3. Advarsel:mysqli_query() forventer, at parameter 1 er mysqli, ressource givet

  4. Slet dubletter fra stort datasæt (>100 mio. rækker)