
Normalize transaction data from time and status columns into minutes per status value

A solution for this kind of query has two parts: generating the categories, then aggregating into the generated categories.

For the data you provided, the first step in this kind of solution is to bucket the data by hour. Because your data contains no events in the 02:00 or 04:00 hour, those hours have to be generated instead so that they still appear in the final result.
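A quick way to see which hours need to be generated: the Python sketch below (an illustration only, not part of the Oracle solution; the `events` list simply copies the test data used further down) buckets the status changes by hour and lists the hours in between that contain no event at all.

```python
from datetime import datetime

# Status changes copied from the test data (INSERT_TIME, STATUS).
events = [
    (datetime(2017, 1, 1, 0, 0), "AVAILABLE"),
    (datetime(2017, 1, 1, 0, 15), "BUSY"),
    (datetime(2017, 1, 1, 0, 30), "NOT AVAILABLE"),
    (datetime(2017, 1, 1, 1, 30), "AVAILABLE"),
    (datetime(2017, 1, 1, 3, 10), "BUSY"),
    (datetime(2017, 1, 1, 5, 0), "NOT AVAILABLE"),
]

# Bucket the events by hour: only hours that contain an event show up.
hours_with_events = sorted({t.hour for t, _ in events})

# Hours between the first and last event that have no event at all;
# grouping the raw data alone would never produce rows for these.
first_hour, last_hour = hours_with_events[0], hours_with_events[-1]
missing_hours = [h for h in range(first_hour, last_hour + 1)
                 if h not in hours_with_events]

print(hours_with_events)  # [0, 1, 3, 5]
print(missing_hours)      # [2, 4]
```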

The second part is aggregating into the hourly buckets with a PIVOT, as Jorge Campos mentioned in the comments.

Below is an example.

First, create a test table:

CREATE TABLE INSERT_TIME_STATUS(
  INSERT_TIME TIMESTAMP,
  STATUS VARCHAR2(128)
);

Then add the test data:

INSERT INTO INSERT_TIME_STATUS VALUES (TIMESTAMP '2017-01-01 00:00:00', 'AVAILABLE');
INSERT INTO INSERT_TIME_STATUS VALUES (TIMESTAMP '2017-01-01 00:15:00', 'BUSY');
INSERT INTO INSERT_TIME_STATUS VALUES (TIMESTAMP '2017-01-01 00:30:00', 'NOT AVAILABLE');
INSERT INTO INSERT_TIME_STATUS VALUES (TIMESTAMP '2017-01-01 01:30:00', 'AVAILABLE');
INSERT INTO INSERT_TIME_STATUS VALUES (TIMESTAMP '2017-01-01 03:10:00', 'BUSY');
INSERT INTO INSERT_TIME_STATUS VALUES (TIMESTAMP '2017-01-01 05:00:00', 'NOT AVAILABLE');

Then build the query. It uses subquery factoring (the WITH clause) to outline the two-step nature of this process.

The CALENDAR subfactor, cross-joined with HOUR_OF_DAY, generates every hour of every day covered by the data, regardless of whether any records occurred during that hour.
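Conceptually, CALENDAR CROSS JOIN HOUR_OF_DAY is a nested loop over days and hours. A minimal Python sketch of that cross join, assuming the first and last day are already known (the function name `hour_calendar` is made up for illustration):

```python
from datetime import date, datetime, timedelta


def hour_calendar(first_day: date, last_day: date) -> list:
    """Start of every hour from first_day 00:00 through last_day 23:00.

    Mirrors CALENDAR CROSS JOIN HOUR_OF_DAY: one entry per (day, hour)
    pair, whether or not any status record falls into that hour.
    """
    midnight = datetime(first_day.year, first_day.month, first_day.day)
    n_days = (last_day - first_day).days + 1
    return [midnight + timedelta(days=d, hours=h)
            for d in range(n_days)   # CALENDAR: every day in the range
            for h in range(24)]      # HOUR_OF_DAY: LEVEL - 1 for LEVEL < 25


slots = hour_calendar(date(2017, 1, 1), date(2017, 1, 2))
print(len(slots))           # 48
print(slots[0], slots[-1])  # 2017-01-01 00:00:00 2017-01-02 23:00:00
```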

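The HOUR_CALENDAR subfactor determines the status in effect at the start of each hour, and ALL_HOUR_STATUS combines those hour-start rows with the status records that fall inside each hour. Together they assign every status record to a specific hour and split statuses that carry over into another hour into pieces, so that every piece fits within a one-hour span.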
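The hour-splitting logic can be sketched in Python as follows. This is an illustration under simplifying assumptions (a single timeline held in memory, a hypothetical `hour_rows` helper), not a translation of the SQL. Taking `max()` over (time, status) tuples picks the latest time and, on ties, the largest status, which matches `MAX(STATUS) KEEP (DENSE_RANK LAST ORDER BY INSERT_TIME)`.

```python
from datetime import datetime, timedelta

# Status changes copied from the test data (INSERT_TIME, STATUS).
events = [
    (datetime(2017, 1, 1, 0, 0), "AVAILABLE"),
    (datetime(2017, 1, 1, 0, 15), "BUSY"),
    (datetime(2017, 1, 1, 0, 30), "NOT AVAILABLE"),
    (datetime(2017, 1, 1, 1, 30), "AVAILABLE"),
    (datetime(2017, 1, 1, 3, 10), "BUSY"),
    (datetime(2017, 1, 1, 5, 0), "NOT AVAILABLE"),
]


def hour_rows(events, hour_start):
    """Return the (time, status) rows for one hour.

    First row: the status in effect at hour_start, i.e. the latest event at
    or before it (None if there is none yet), like HOUR_CALENDAR.
    Other rows: events strictly after hour_start and before the next hour,
    like the second branch of ALL_HOUR_STATUS.
    """
    hour_end = hour_start + timedelta(hours=1)
    earlier = [(t, s) for t, s in events if t <= hour_start]
    # max() on (time, status) tuples: latest time, ties broken by largest status
    start_status = max(earlier)[1] if earlier else None
    inside = sorted((t, s) for t, s in events if hour_start < t < hour_end)
    return [(hour_start, start_status)] + inside


day = datetime(2017, 1, 1)
for h in (0, 1, 5):
    rows = hour_rows(events, day + timedelta(hours=h))
    print(h, [(t.strftime("%H:%M"), s) for t, s in rows])
# 0 [('00:00', 'AVAILABLE'), ('00:15', 'BUSY'), ('00:30', 'NOT AVAILABLE')]
# 1 [('01:00', 'NOT AVAILABLE'), ('01:30', 'AVAILABLE')]
# 5 [('05:00', 'NOT AVAILABLE')]
```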

The DURATION_IN_STATUS subfactor counts how many minutes each status was active during each hour: every row lasts until the next row (LEAD), or until the end of the hour if there is no next row.
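A minimal Python sketch of the duration step, assuming the rows of one hour are already sorted and begin with the hour-start row (`durations_in_hour` is a made-up name). It looks only within one hour instead of applying LEAD over the whole timeline; the result should be the same because every following hour begins with its own hour-start row. Like `EXTRACT(HOUR ...) * 60 + EXTRACT(MINUTE ...)`, it drops leftover seconds.

```python
from datetime import datetime, timedelta


def durations_in_hour(rows, hour_start):
    """Minutes per row, like DURATION_IN_STATUS.

    rows: (time, status) pairs for one hour, sorted by time.
    Each row lasts until the next row's time (LEAD), or until the end of
    the hour for the last row (the COALESCE fallback).
    """
    hour_end = hour_start + timedelta(hours=1)
    result = []
    for i, (t, status) in enumerate(rows):
        nxt = rows[i + 1][0] if i + 1 < len(rows) else hour_end
        result.append((status, int((nxt - t).total_seconds() // 60)))
    return result


start = datetime(2017, 1, 1, 0, 0)
rows = [
    (start, "AVAILABLE"),
    (datetime(2017, 1, 1, 0, 15), "BUSY"),
    (datetime(2017, 1, 1, 0, 30), "NOT AVAILABLE"),
]
print(durations_in_hour(rows, start))
# [('AVAILABLE', 15), ('BUSY', 15), ('NOT AVAILABLE', 30)]
```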

The final query then uses PIVOT to aggregate (SUM) the minutes each STATUS was active in each hour; COALESCE turns statuses that did not occur in an hour into 0.
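What PIVOT plus COALESCE does here can be mimicked in Python with a dictionary keyed by (day, hour). In this sketch (a hypothetical `pivot_minutes` helper; the sample rows are hours 0 and 3 of the expected result) every (day, hour) still gets a row even if its only status is not listed in the IN (...) clause, mirroring how PIVOT groups by the remaining columns THE_DAY and THE_HOUR.

```python
from collections import defaultdict

# Status values listed in PIVOT ... IN (...)
STATUSES = ("AVAILABLE", "NOT AVAILABLE", "BUSY")


def pivot_minutes(durations):
    """Sum minutes per (day, hour) and status; absent statuses become 0.

    durations: iterable of (day, hour, status, minutes) rows, like the
    DURATION_IN_STATUS subfactor.
    """
    table = defaultdict(lambda: dict.fromkeys(STATUSES, 0))
    for day, hour, status, minutes in durations:
        row = table[(day, hour)]    # every (day, hour) gets a row
        if status in row:           # statuses not listed in IN (...) are ignored
            row[status] += minutes  # SUM(DURATION_IN_STATUS)
    return dict(sorted(table.items()))  # ORDER BY THE_DAY, THE_HOUR


rows = [
    ("01/01/2017", 0, "AVAILABLE", 15),
    ("01/01/2017", 0, "BUSY", 15),
    ("01/01/2017", 0, "NOT AVAILABLE", 30),
    ("01/01/2017", 3, "AVAILABLE", 10),
    ("01/01/2017", 3, "BUSY", 50),
]
for key, counts in pivot_minutes(rows).items():
    print(key, counts)
# ('01/01/2017', 0) {'AVAILABLE': 15, 'NOT AVAILABLE': 30, 'BUSY': 15}
# ('01/01/2017', 3) {'AVAILABLE': 10, 'NOT AVAILABLE': 0, 'BUSY': 50}
```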

WITH HOUR_OF_DAY AS (SELECT LEVEL - 1 AS THE_HOUR
                     FROM DUAL
                     CONNECT BY LEVEL < 25),
    CALENDAR AS (SELECT DAY_START
                 FROM (
                   SELECT (TIMESTAMP '2017-01-01 00:00:00' + NUMTODSINTERVAL(DATE_INCREMENT.OFFSET, 'DAY')) AS DAY_START
                   FROM (SELECT LEVEL - 1 AS OFFSET
                         FROM DUAL
                         CONNECT BY LEVEL < 9999) DATE_INCREMENT)
                 WHERE DAY_START BETWEEN (SELECT MIN(TRUNC(INSERT_TIME_STATUS.INSERT_TIME))
                                          FROM INSERT_TIME_STATUS)
                 AND (SELECT MAX(TRUNC(INSERT_TIME_STATUS.INSERT_TIME))
                      FROM INSERT_TIME_STATUS)),
    HOUR_CALENDAR AS (
     SELECT
       TO_CHAR(CALENDAR.DAY_START, 'MM/DD/YYYY')                                               AS THE_DAY,
       HOUR_OF_DAY.THE_HOUR,
       CALENDAR.DAY_START + NUMTODSINTERVAL(HOUR_OF_DAY.THE_HOUR, 'HOUR')                      AS HOUR_START,
       (SELECT MAX(INSERT_TIME_STATUS.STATUS)
       KEEP (DENSE_RANK LAST
         ORDER BY INSERT_TIME_STATUS.INSERT_TIME ASC)
        FROM INSERT_TIME_STATUS
        WHERE INSERT_TIME_STATUS.INSERT_TIME <= DAY_START + NUMTODSINTERVAL(THE_HOUR, 'HOUR')) AS HOUR_START_STATUS
     FROM CALENDAR
       CROSS JOIN HOUR_OF_DAY),
    ALL_HOUR_STATUS AS (
    SELECT
      HOUR_CALENDAR.THE_DAY,
      HOUR_CALENDAR.THE_HOUR,
      HOUR_CALENDAR.HOUR_START        AS THE_TIME,
      HOUR_CALENDAR.HOUR_START_STATUS AS THE_STATUS
    FROM HOUR_CALENDAR
    UNION ALL
    SELECT
      HOUR_CALENDAR.THE_DAY,
      HOUR_CALENDAR.THE_HOUR,
      INSERT_TIME_STATUS.INSERT_TIME AS THE_TIME,
      INSERT_TIME_STATUS.STATUS      AS THE_STATUS
    FROM HOUR_CALENDAR
      INNER JOIN INSERT_TIME_STATUS
        ON HOUR_CALENDAR.HOUR_START < INSERT_TIME_STATUS.INSERT_TIME
           -- restrict to records inside this hour of this day (not the same hour on other days)
           AND INSERT_TIME_STATUS.INSERT_TIME < HOUR_CALENDAR.HOUR_START + NUMTODSINTERVAL(1, 'HOUR')),
    DURATION_IN_STATUS AS (
     SELECT
       ALL_HOUR_STATUS.THE_DAY,
       ALL_HOUR_STATUS.THE_HOUR,
       ALL_HOUR_STATUS.THE_STATUS,
       (EXTRACT(HOUR FROM
                (COALESCE(LEAD(THE_TIME)
                          OVER (
                            PARTITION BY NULL
                            ORDER BY THE_TIME ASC ), TO_TIMESTAMP(THE_DAY, 'MM/DD/YYYY') + NUMTODSINTERVAL(THE_HOUR + 1, 'HOUR')) - THE_TIME)) * 60)
       +
       EXTRACT(MINUTE FROM
               (COALESCE(LEAD(THE_TIME)
                         OVER (
                           PARTITION BY NULL
                           ORDER BY THE_TIME ASC ), TO_TIMESTAMP(THE_DAY, 'MM/DD/YYYY') + NUMTODSINTERVAL(THE_HOUR + 1, 'HOUR')) - THE_TIME))
         AS DURATION_IN_STATUS
     FROM ALL_HOUR_STATUS)
SELECT
  THE_DAY,
  THE_HOUR,
  COALESCE(AVAILABLE, 0)     AS AVAILABLE,
  COALESCE(NOT_AVAILABLE, 0) AS NOT_AVAILABLE,
  COALESCE(BUSY, 0)          AS BUSY
FROM DURATION_IN_STATUS
PIVOT (SUM(DURATION_IN_STATUS)
  FOR THE_STATUS
  IN ('AVAILABLE' AS AVAILABLE, 'NOT AVAILABLE' AS NOT_AVAILABLE, 'BUSY' AS BUSY)
)
ORDER BY THE_DAY ASC, THE_HOUR ASC;

Result:

THE_DAY     THE_HOUR  AVAILABLE  NOT_AVAILABLE  BUSY  
01/01/2017  0         15         30             15    
01/01/2017  1         30         30             0     
01/01/2017  2         60         0              0     
01/01/2017  3         10         0              50    
01/01/2017  4         0          0              60    
01/01/2017  5         0          60             0     
01/01/2017  6         0          60             0     
01/01/2017  7         0          60             0     
01/01/2017  8         0          60             0     
01/01/2017  9         0          60             0     
01/01/2017  10        0          60             0     
01/01/2017  11        0          60             0     
01/01/2017  12        0          60             0     
01/01/2017  13        0          60             0     
01/01/2017  14        0          60             0     
01/01/2017  15        0          60             0     
01/01/2017  16        0          60             0     
01/01/2017  17        0          60             0     
01/01/2017  18        0          60             0     
01/01/2017  19        0          60             0     
01/01/2017  20        0          60             0     
01/01/2017  21        0          60             0     
01/01/2017  22        0          60             0     
01/01/2017  23        0          60             0     


24 rows selected. 

This example query generates records for the whole day, so the last status, NOT AVAILABLE, carries through to the end of the day. If you would rather stop at the time of the last assigned status, this behavior can be adjusted as needed.
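To illustrate the difference, here is a Python sketch (not Oracle SQL; `minutes_per_hour` and its `stop_at_last_event` flag are hypothetical) that computes the per-hour minutes for one day either with the last status running until midnight, as the query above does, or stopping at the last recorded event. In the query, a comparable adjustment would presumably mean capping the end time used in DURATION_IN_STATUS at the latest INSERT_TIME; that SQL change is not shown here.

```python
from datetime import datetime, timedelta

# Status changes copied from the test data (INSERT_TIME, STATUS).
events = [
    (datetime(2017, 1, 1, 0, 0), "AVAILABLE"),
    (datetime(2017, 1, 1, 0, 15), "BUSY"),
    (datetime(2017, 1, 1, 0, 30), "NOT AVAILABLE"),
    (datetime(2017, 1, 1, 1, 30), "AVAILABLE"),
    (datetime(2017, 1, 1, 3, 10), "BUSY"),
    (datetime(2017, 1, 1, 5, 0), "NOT AVAILABLE"),
]


def minutes_per_hour(events, day, stop_at_last_event=False):
    """Return {hour: {status: minutes}} for one day (hypothetical helper).

    stop_at_last_event=False: the last status runs until midnight, which is
    what the query above does. True: time after the last event is not counted.
    """
    events = sorted(events)
    day_end = day + timedelta(days=1)
    cutoff = events[-1][0] if stop_at_last_event else day_end
    result = {h: {} for h in range(24)}
    for i, (t, status) in enumerate(events):
        # Each status lasts until the next event, or until the cutoff.
        nxt = events[i + 1][0] if i + 1 < len(events) else cutoff
        end = min(nxt, cutoff, day_end)
        cur = max(t, day)
        # Walk hour by hour so a status spanning several hours is split.
        while cur < end:
            hour_end = cur.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
            piece_end = min(end, hour_end)
            bucket = result[cur.hour]
            bucket[status] = bucket.get(status, 0) + int((piece_end - cur).total_seconds() // 60)
            cur = piece_end
    return result


day = datetime(2017, 1, 1)
full = minutes_per_hour(events, day)
stopped = minutes_per_hour(events, day, stop_at_last_event=True)
print(full[5], stopped[5])  # {'NOT AVAILABLE': 60} {}
print(full[3], stopped[3])  # {'AVAILABLE': 10, 'BUSY': 50} {'AVAILABLE': 10, 'BUSY': 50}
```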

EDIT: In response to your update about evaluating these times per channel_id and user_id, here is another example:

First, create the test table:

CREATE TABLE INSERT_TIME_STATUS(
  USER_ID NUMBER,
  CHANNEL_ID NUMBER,
  INSERT_TIME TIMESTAMP,
  STATUS VARCHAR2(128)
);

Then load it (here user 1111 is on channels 3 and 4, and user 2222 is only on channel 3):

INSERT INTO INSERT_TIME_STATUS VALUES (1111,3,TO_TIMESTAMP('1/1/2017 0:00','MM/DD/YYYY HH24:MI'),'AVAILABLE');
INSERT INTO INSERT_TIME_STATUS VALUES (1111,3,TO_TIMESTAMP('1/1/2017 0:15','MM/DD/YYYY HH24:MI'),'BUSY');
INSERT INTO INSERT_TIME_STATUS VALUES (1111,3,TO_TIMESTAMP('1/1/2017 0:30','MM/DD/YYYY HH24:MI'),'NOT AVAILABLE');
INSERT INTO INSERT_TIME_STATUS VALUES (1111,3,TO_TIMESTAMP('1/1/2017 1:30','MM/DD/YYYY HH24:MI'),'AVAILABLE');
INSERT INTO INSERT_TIME_STATUS VALUES (1111,3,TO_TIMESTAMP('1/1/2017 3:10','MM/DD/YYYY HH24:MI'),'BUSY');
INSERT INTO INSERT_TIME_STATUS VALUES (1111,3,TO_TIMESTAMP('1/1/2017 5:00','MM/DD/YYYY HH24:MI'),'NOT AVAILABLE');
INSERT INTO INSERT_TIME_STATUS VALUES (1111,4,TO_TIMESTAMP('1/1/2017 0:00','MM/DD/YYYY HH24:MI'),'AVAILABLE');
INSERT INTO INSERT_TIME_STATUS VALUES (1111,4,TO_TIMESTAMP('1/1/2017 0:15','MM/DD/YYYY HH24:MI'),'BUSY');
INSERT INTO INSERT_TIME_STATUS VALUES (1111,4,TO_TIMESTAMP('1/1/2017 0:30','MM/DD/YYYY HH24:MI'),'NOT AVAILABLE');
INSERT INTO INSERT_TIME_STATUS VALUES (1111,4,TO_TIMESTAMP('1/1/2017 1:30','MM/DD/YYYY HH24:MI'),'AVAILABLE');
INSERT INTO INSERT_TIME_STATUS VALUES (1111,4,TO_TIMESTAMP('1/1/2017 3:10','MM/DD/YYYY HH24:MI'),'BUSY');
INSERT INTO INSERT_TIME_STATUS VALUES (1111,4,TO_TIMESTAMP('1/1/2017 5:00','MM/DD/YYYY HH24:MI'),'NOT AVAILABLE');
INSERT INTO INSERT_TIME_STATUS VALUES (2222,3,TO_TIMESTAMP('1/1/2017 0:00','MM/DD/YYYY HH24:MI'),'AVAILABLE');
INSERT INTO INSERT_TIME_STATUS VALUES (2222,3,TO_TIMESTAMP('1/1/2017 0:15','MM/DD/YYYY HH24:MI'),'BUSY');
INSERT INTO INSERT_TIME_STATUS VALUES (2222,3,TO_TIMESTAMP('1/1/2017 0:30','MM/DD/YYYY HH24:MI'),'NOT AVAILABLE');
INSERT INTO INSERT_TIME_STATUS VALUES (2222,3,TO_TIMESTAMP('1/1/2017 1:30','MM/DD/YYYY HH24:MI'),'AVAILABLE');
INSERT INTO INSERT_TIME_STATUS VALUES (2222,3,TO_TIMESTAMP('1/1/2017 3:10','MM/DD/YYYY HH24:MI'),'BUSY');
INSERT INTO INSERT_TIME_STATUS VALUES (2222,3,TO_TIMESTAMP('1/1/2017 5:00','MM/DD/YYYY HH24:MI'),'NOT AVAILABLE');

Then update the query to generate data per user_id and per channel_id. In this example, data is included for all times and for every channel each user is involved in. User 1111 has counts for every hour of the day on channels 3 and 4, while user 2222 has counts for every hour of the day only on channel 3 (if that user had records on another channel, that channel would be included as well).
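The key difference from the single-timeline version is that every lookup is limited to one (USER_ID, CHANNEL_ID) combination. A minimal Python sketch of that idea, using a few made-up rows rather than the test data above (the channel 4 row is invented so the timelines differ), with a hypothetical `status_at` helper:

```python
from datetime import datetime

# Made-up rows (USER_ID, CHANNEL_ID, INSERT_TIME, STATUS) for illustration.
rows = [
    (1111, 3, datetime(2017, 1, 1, 0, 0), "AVAILABLE"),
    (1111, 3, datetime(2017, 1, 1, 0, 30), "BUSY"),
    (1111, 4, datetime(2017, 1, 1, 0, 10), "NOT AVAILABLE"),
    (2222, 3, datetime(2017, 1, 1, 0, 0), "BUSY"),
]

# One timeline per combination, like SELECT DISTINCT USER_ID, CHANNEL_ID.
combos = sorted({(u, c) for u, c, _, _ in rows})


def status_at(rows, user_id, channel_id, moment):
    """Latest status of one user/channel at or before `moment` (None if none).

    The filter compares each row against the requested (outer) user and
    channel, which is what the correlated HOUR_START_STATUS subquery needs.
    """
    earlier = [(t, s) for u, c, t, s in rows
               if u == user_id and c == channel_id and t <= moment]
    return max(earlier)[1] if earlier else None


one_am = datetime(2017, 1, 1, 1, 0)
for user_id, channel_id in combos:
    print(user_id, channel_id, status_at(rows, user_id, channel_id, one_am))
# 1111 3 BUSY
# 1111 4 NOT AVAILABLE
# 2222 3 BUSY
```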

WITH HOUR_OF_DAY AS (SELECT LEVEL - 1 AS THE_HOUR
                     FROM DUAL
                     CONNECT BY LEVEL < 25),
    CALENDAR AS (SELECT DAY_START
                 FROM (
                   SELECT ((SELECT MIN(TRUNC(INSERT_TIME_STATUS.INSERT_TIME))
                            FROM INSERT_TIME_STATUS) + NUMTODSINTERVAL(DATE_INCREMENT.OFFSET, 'DAY')) AS DAY_START
                   FROM (SELECT LEVEL - 1 AS OFFSET
                         FROM DUAL
                         CONNECT BY LEVEL < 9999) DATE_INCREMENT)
                 WHERE DAY_START BETWEEN (SELECT MIN(TRUNC(INSERT_TIME_STATUS.INSERT_TIME))
                                          FROM INSERT_TIME_STATUS)
                 AND (SELECT MAX(TRUNC(INSERT_TIME_STATUS.INSERT_TIME))
                      FROM INSERT_TIME_STATUS)),
    USER_CHANNEL_HOUR_CALENDAR AS (
     SELECT
       USER_ID,
       CHANNEL_ID,
       CALENDAR.DAY_START,
       TO_CHAR(CALENDAR.DAY_START, 'MM/DD/YYYY')                                               AS THE_DAY,
       HOUR_OF_DAY.THE_HOUR,
       CALENDAR.DAY_START + NUMTODSINTERVAL(HOUR_OF_DAY.THE_HOUR, 'HOUR')                      AS HOUR_START
     FROM CALENDAR
       CROSS JOIN HOUR_OF_DAY
       -- one row per user/channel combination that occurs in the data
       CROSS JOIN (SELECT DISTINCT USER_ID, CHANNEL_ID FROM INSERT_TIME_STATUS)
  ),
    HOUR_CALENDAR AS (
     SELECT USER_ID,
       CHANNEL_ID,
       THE_DAY,
       THE_HOUR,
       DAY_START,
       HOUR_START,
       (SELECT MAX(INSERT_TIME_STATUS.STATUS)
       KEEP (DENSE_RANK LAST
         ORDER BY INSERT_TIME_STATUS.INSERT_TIME ASC)
        FROM INSERT_TIME_STATUS
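        WHERE INSERT_TIME_STATUS.INSERT_TIME <= USER_CHANNEL_HOUR_CALENDAR.HOUR_START
              -- qualify the outer columns: an unqualified USER_ID or CHANNEL_ID would
              -- resolve to INSERT_TIME_STATUS itself and the filter would match every row
              AND INSERT_TIME_STATUS.USER_ID = USER_CHANNEL_HOUR_CALENDAR.USER_ID
              AND INSERT_TIME_STATUS.CHANNEL_ID = USER_CHANNEL_HOUR_CALENDAR.CHANNEL_ID) AS HOUR_START_STATUS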
     FROM USER_CHANNEL_HOUR_CALENDAR),
    ALL_HOUR_STATUS AS (
    SELECT
      HOUR_CALENDAR.USER_ID,
      HOUR_CALENDAR.CHANNEL_ID,
      HOUR_CALENDAR.THE_DAY,
      HOUR_CALENDAR.THE_HOUR,
      HOUR_CALENDAR.HOUR_START        AS THE_TIME,
      HOUR_CALENDAR.HOUR_START_STATUS AS THE_STATUS
    FROM HOUR_CALENDAR
    UNION ALL
    SELECT
      INSERT_TIME_STATUS.USER_ID,
      INSERT_TIME_STATUS.CHANNEL_ID,
      HOUR_CALENDAR.THE_DAY,
      HOUR_CALENDAR.THE_HOUR,
      INSERT_TIME_STATUS.INSERT_TIME AS THE_TIME,
      INSERT_TIME_STATUS.STATUS      AS THE_STATUS
    FROM HOUR_CALENDAR
      INNER JOIN INSERT_TIME_STATUS
        ON HOUR_CALENDAR.HOUR_START < INSERT_TIME_STATUS.INSERT_TIME
           -- restrict to records inside this hour of this day (not the same hour on other days)
           AND INSERT_TIME_STATUS.INSERT_TIME < HOUR_CALENDAR.HOUR_START + NUMTODSINTERVAL(1, 'HOUR')
           AND HOUR_CALENDAR.USER_ID = INSERT_TIME_STATUS.USER_ID
           AND HOUR_CALENDAR.CHANNEL_ID = INSERT_TIME_STATUS.CHANNEL_ID),
    DURATION_IN_STATUS AS (
     SELECT
       ALL_HOUR_STATUS.USER_ID,
       ALL_HOUR_STATUS.CHANNEL_ID,
       ALL_HOUR_STATUS.THE_DAY,
       ALL_HOUR_STATUS.THE_HOUR,
       ALL_HOUR_STATUS.THE_STATUS,
       (EXTRACT(HOUR FROM
                (COALESCE(LEAD(THE_TIME)
                          OVER (
                            PARTITION BY USER_ID, CHANNEL_ID
                            ORDER BY THE_TIME ASC ), TO_TIMESTAMP(THE_DAY, 'MM/DD/YYYY') + NUMTODSINTERVAL(THE_HOUR + 1, 'HOUR')) - THE_TIME)) * 60)
       +
       EXTRACT(MINUTE FROM
               (COALESCE(LEAD(THE_TIME)
                         OVER (
                           PARTITION BY USER_ID, CHANNEL_ID
                           ORDER BY THE_TIME ASC ), TO_TIMESTAMP(THE_DAY, 'MM/DD/YYYY') + NUMTODSINTERVAL(THE_HOUR + 1, 'HOUR')) - THE_TIME))
         AS DURATION_IN_STATUS
     FROM ALL_HOUR_STATUS)
SELECT
  USER_ID,
  CHANNEL_ID,
  THE_DAY,
  THE_HOUR,
  COALESCE(AVAILABLE, 0)     AS AVAILABLE,
  COALESCE(NOT_AVAILABLE, 0) AS NOT_AVAILABLE,
  COALESCE(BUSY, 0)          AS BUSY
FROM DURATION_IN_STATUS
PIVOT (SUM(DURATION_IN_STATUS)
  FOR THE_STATUS
  IN ('AVAILABLE' AS AVAILABLE, 'NOT AVAILABLE' AS NOT_AVAILABLE, 'BUSY' AS BUSY)
)
  -- You can additionally filter the result, e.g. with one of these WHERE clauses:
  -- WHERE CHANNEL_ID IN (3,4)
  -- WHERE USER_ID = 12345
  -- WHERE TO_DATE(THE_DAY, 'MM/DD/YYYY') > DATE '2017-01-01'
  -- etc.
ORDER BY USER_ID ASC, CHANNEL_ID ASC, THE_DAY ASC, THE_HOUR ASC;

Then test it:

USER_ID  CHANNEL_ID  THE_DAY     THE_HOUR  AVAILABLE  NOT_AVAILABLE  BUSY  
1111     3           01/01/2017  0         15         30             15    
1111     3           01/01/2017  1         30         30             0     
1111     3           01/01/2017  2         60         0              0     
1111     3           01/01/2017  3         10         0              50    
1111     3           01/01/2017  4         0          0              60    
1111     3           01/01/2017  5         0          60             0     
1111     3           01/01/2017  6         0          60             0  
...
1111     3           01/01/2017  23        0          60             0     
1111     4           01/01/2017  0         15         30             15    
1111     4           01/01/2017  1         30         30             0     
1111     4           01/01/2017  2         60         0              0     
1111     4           01/01/2017  3         10         0              50    
1111     4           01/01/2017  4         0          0              60    
1111     4           01/01/2017  5         0          60             0     
1111     4           01/01/2017  6         0          60             0
...
1111     4           01/01/2017  23        0          60             0     
2222     3           01/01/2017  0         15         30             15    
2222     3           01/01/2017  1         30         30             0     
2222     3           01/01/2017  2         60         0              0     
2222     3           01/01/2017  3         10         0              50    
2222     3           01/01/2017  4         0          0              60    
2222     3           01/01/2017  5         0          60             0     
2222     3           01/01/2017  6         0          60             0