Does a WHERE clause used with CROSS APPLY take effect before the cross apply or after its results?

2024-06-11

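I'm working on a custom fuzzy matching algorithm that we need for one of our internal applications, and I'm trying to speed it up. When I cross apply against the fuzzy function to find suggested matches, I don't want to search unnecessary data.

Here is the query: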

select top 5 Manufacturer,ManufacturerPartNumber,Description as ManufacturerDescription, CONVERT(money,Price) as Price,fms.Score  
from Products_OurProducts_Products_View
    CROSS APPLY (
         select 
         dbo.FuzzyControlMatch(@objectModel, ManufacturerPartNumber) AS score
    ) AS fms

ORDER BY fms.Score DESC

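Now suppose the manufacturer part number the user sends is FD1234. We don't really need to fuzzy-match ALL manufacturer part numbers. Can I add a clause like the one below to get a more precise data set to fuzzy against, or will the filter be applied AFTER the cross apply has already run and only affect the final result?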

 select top 5 Manufacturer,ManufacturerPartNumber,Description as ManufacturerDescription, CONVERT(money,Price) as Price,fms.Score  
from Products_OurProducts_Products_View
    CROSS APPLY (
         select 
         dbo.FuzzyControlMatch(@objectModel, ManufacturerPartNumber) AS score
    ) AS fms
 WHERE LEN(ManufacturerPartNumber) < LEN(@objectModel)+5

I would like the cross apply to only fuzzy against items that are close to the length of the parameter in @objectModel.

"I would like the cross apply to only fuzzy against items that are CLOSE to the length of the parameter in @objectModel."

To measure closeness, use ABS(a-b). If you want strings of similar (close) length, that would be:

ABS(LEN(string1) - LEN(string2))

If you want strings that are no more than 5 characters longer or shorter than the other string, the WHERE clause would look like this:

WHERE ABS(LEN(string1) - LEN(string2)) <= 5

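"Can I add a clause like this to get a more precise data set to fuzzy against, or will this happen AFTER the cross apply has already run and only affect the final result?"

SQL Server's logical query processing order (https://images.app.goo.gl/qTQpdg2NsC7M4eUj9) has the WHERE clause evaluated first (right after FROM), ahead of the SELECT list, ORDER BY and TOP. Scalar UDFs in the WHERE clause change this. If dbo.FuzzyControlMatch is a T-SQL scalar UDF and it is used in the WHERE clause, it will be processed first and you will be forced to evaluate everything. That does not appear to be the case with what you posted, though: your predicate references only a column of the view, not fms.Score, so the optimizer is normally free to filter the rows before the function is invoked. The actual execution plan will confirm whether it does.

How to improve performance here?

For starters, what I would do is compute the string length ahead of time. You can use a persisted computed column, then add an index ON (string length, whatever the TOP 5 is ordered by). Then put the other columns used by the query in the INCLUDE list.

Alternatively, you could pre-filter into a temp table or table variable and then apply your function there.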
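The effect of filtering before applying an expensive function can be illustrated outside SQL Server. The following is only a minimal Python sketch of the idea, not a model of the T-SQL execution plan: `fuzzy_score` is a hypothetical stand-in for dbo.FuzzyControlMatch (whose definition was not posted), and the part numbers are made up. It counts how often the expensive function runs with and without a length pre-filter.

```python
from difflib import SequenceMatcher

calls = 0  # number of times the expensive function has been invoked


def fuzzy_score(a: str, b: str) -> float:
    """Hypothetical stand-in for dbo.FuzzyControlMatch (an assumption, not the real UDF)."""
    global calls
    calls += 1
    return SequenceMatcher(None, a, b).ratio()


def top_matches(target, parts, max_len_diff=None, top=5):
    """Score candidates; optionally skip those whose length is too far from the target."""
    candidates = parts
    if max_len_diff is not None:
        # Cheap filter first: plays the role of WHERE ABS(LEN(a) - LEN(b)) <= n
        candidates = [p for p in parts if abs(len(p) - len(target)) <= max_len_diff]
    scored = [(p, fuzzy_score(target, p)) for p in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]


# Made-up sample part numbers
parts = ["FD1234", "FD1243", "FD-1234-XL-REV2", "A", "FD12345", "ZZ9988776655"]

calls = 0
top_matches("FD1234", parts)
calls_unfiltered = calls

calls = 0
best = top_matches("FD1234", parts, max_len_diff=2)
calls_filtered = calls

print(calls_unfiltered, calls_filtered)
```

With this sample data the pre-filter halves the number of calls to the scoring function; the exact-match candidate still comes out on top.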

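Either way, I have a function and a couple of algorithms that will change your life if you deal with string similarity often. I'll post them later tonight when I have more time.

Continued....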

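I'm not familiar with your similarity function since you didn't provide any DDL, but I do know how to measure string similarity (https://itnext.io/string-similarity-the-basic-know-your-algorithms-guide-3de3d7346227) using algorithms such as Levenshtein, Damerau-Levenshtein, the longest common substring, and the longest common subsequence (LCSQ). Levenshtein can be solved in O(mn) time; Damerau-Levenshtein in O(mn*Max(m,n)) time; the longest common substring in O(nm), or in O(n)+O(nk) with a generalized suffix tree. The longest common subsequence of two strings can also be found in O(mn) time with dynamic programming; the problem only becomes NP-hard for an arbitrary number of input strings.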

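Levenshtein and Damerau-Levenshtein are distance metrics (https://en.wikipedia.org/wiki/String_metric) and can measure similarity like this: take two strings S1 and S2, where S1 is always shorter than or equal to S2; let L1 be the length of S1, L2 the length of S2, and E the edit distance. The formula for the similarity between (S1,S2) is: (L1-E)/L2. Consider the words "Theirs" and "Thiers". Both strings are 6 characters long and the edit distance is 2. (6-2)/6 = .67; according to Levenshtein these two are 67% similar. The Damerau-Levenshtein distance between these strings is 1; (6-1)/6 = .83; according to DLD these strings have a similarity score of 83%.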
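The arithmetic above can be checked with a short Python sketch. This is an illustration only, not the poster's T-SQL: `levenshtein` is the textbook dynamic-programming edit distance, and `osa_distance` is the optimal string alignment variant, a restricted form of Damerau-Levenshtein that is used here as a stand-in because it is shorter to write. The function names are my own.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic O(m*n) dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution
        prev = cur
    return prev[-1]


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment: Levenshtein plus adjacent transpositions."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # Adjacent transposition, e.g. "ei" <-> "ie"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]


def similarity(a: str, b: str, distance) -> float:
    """(L1 - E) / L2, where S1 is the shorter string and E the edit distance."""
    s1, s2 = sorted((a, b), key=len)
    return (len(s1) - distance(s1, s2)) / len(s2)


print(levenshtein("Theirs", "Thiers"), round(similarity("Theirs", "Thiers", levenshtein), 2))
print(osa_distance("Theirs", "Thiers"), round(similarity("Theirs", "Thiers", osa_distance), 2))
```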

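With the longest common substring or subsequence (call the length of either one LS), the similarity is LS/L2. For example, the longest common substring between 'ABC123' and 'ABC12.3' is 'ABC12'; LS=5, L2=7, 5/7 = 71% similar. The longest common subsequence between the two is 'ABC123'; LS=6, 6/7 = 86% similar.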
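Both measures from the example above can be reproduced with a small Python sketch. It uses the standard dynamic-programming formulations; the function names are my own and are not part of the T-SQL posted further down.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Longest run of characters that appears contiguously in both strings."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the common run ending here
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def longest_common_subsequence_length(a: str, b: str) -> int:
    """Length of the longest sequence of characters appearing in order in both strings."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


s1, s2 = "ABC123", "ABC12.3"
substring = longest_common_substring(s1, s2)
subseq_len = longest_common_subsequence_length(s1, s2)

# Similarity is LS / L2, with L2 the length of the longer string
print(substring, round(len(substring) / len(s2), 2))
print(subseq_len, round(subseq_len / len(s2), 2))
```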

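Introducing the Bernie distance; it's simple: when S1 is a substring of S2, the Bernie distance (BD) is L2-L1 and the similarity can be measured as L1/L2. For example: BD("xxx","yyy") returns NULL; BD("Kangaroo","Kangaroos") = 1... L2=9, L1=8, L2-L1=1. Here Bernie gives us a distance of 1 and a similarity score of 8/9 = .8889 (about 89%). These two metrics are extremely fast to compute; they are basically free. Bernie metrics will often be NULL; when that happens you have lost nothing and gained nothing. When Bernie isn't NULL, however, you have just accomplished something special: you have solved Levenshtein (LD), Damerau-Levenshtein (DLD), the longest common substring and subsequence (LCSS), and many more. When Bernie (B) is not NULL, then LD=B, DLD=B and LCSS=L1. It would not surprise me if you could apply Bernie to your similarity function ;^) This is known as reduction (https://en.wikipedia.org/wiki/Reduction_(complexity)).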
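As a rough illustration of the definition, here is a Python sketch of the Bernie distance. It is not the samd.bernie8K function posted below; the names are mine, and unlike CHARINDEX under a case-insensitive collation, Python's `in` test is case-sensitive.

```python
from typing import Optional


def bernie_distance(a: str, b: str) -> Optional[int]:
    """L2 - L1 when the shorter string is a substring of the longer one, else None."""
    # Trim spaces and order the pair so that s1 is the shorter string
    s1, s2 = sorted((a.strip(" "), b.strip(" ")), key=len)
    if s1 in s2:
        return len(s2) - len(s1)
    return None


def bernie_similarity(a: str, b: str) -> Optional[float]:
    """L1 / L2 when the Bernie distance is defined, else None."""
    s1, s2 = sorted((a.strip(" "), b.strip(" ")), key=len)
    if s1 in s2 and s2:
        return len(s1) / len(s2)
    return None


print(bernie_distance("xxx", "yyy"))
print(bernie_distance("Kangaroo", "Kangaroos"),
      round(bernie_similarity("Kangaroo", "Kangaroos"), 4))
```

When the result is not None, no further edit-distance computation is needed for that pair: deleting the extra L2-L1 characters is the cheapest possible edit script.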

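bernie8K (VARCHAR(8000)) is included at the end of this post. In addition to the Bernie distance and similarity, you can use Bernie to compute the Maximum Similarity (MS): MS = L1/L2. For example, MS("ABCD","ABCXYZ") is 67%. In other words, when L1=4 and L2=6 the two strings cannot be more than 67% similar (4/6 = 0.6666). Armed with this information you can create a Minimum Similarity parameter that lets you dramatically reduce the number of comparisons. Now for the demo.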
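A minimal Python sketch of that pre-filter, under the assumption that similarity is always bounded by L1/L2 as described above. The helper names and the candidate strings are hypothetical.

```python
def max_similarity(a: str, b: str) -> float:
    """Upper bound on similarity from the lengths alone: L1 / L2."""
    l1, l2 = sorted((len(a), len(b)))
    if l2 == 0:
        return 1.0  # two empty strings are treated as identical
    return l1 / l2


def worth_comparing(a: str, b: str, min_sim: float) -> bool:
    """Only run the expensive comparison when the bound can reach min_sim."""
    return max_similarity(a, b) >= min_sim


print(round(max_similarity("ABCD", "ABCXYZ"), 4))

# Made-up candidates: only those whose length allows at least 80% similarity survive
candidates = ["ABCXYZ", "ABCE", "ABCDE", "A"]
kept = [c for c in candidates if worth_comparing("ABCD", c, 0.8)]
print(kept)
```

The check uses only the two lengths, so it can run before any character-level comparison is attempted.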

Problem:

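I once had a large client with thousands of employees. They had inherited hundreds of duplicate job titles that had been typed into the DB by hand, such as "Loan Officer" and a misspelled variant of it. Reports would say they had 2005 Loan Officers and 16 of the misspelled variant. In reality they had 2021 loan officers (16 with a misspelled job title). The task was to identify (and de-duplicate) these job titles. This example is a scaled-down version of the problem. Note my comments.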

-- Sample data.
DECLARE @jobs TABLE
(
  JobId  INT IDENTITY PRIMARY KEY,
  JobCat VARCHAR(100) NOT NULL
);

INSERT @jobs(JobCat) 
VALUES('Editing Department'),('Director'),('Producer'),('Actor'),
      ('Film Editing Department'),('Producers'),('Directer');

-- Without any pre-filtering I would need to compare 21 pairs of strings ("string pairs")....
SELECT      j1.JobCat, j2.JobCat
FROM        @jobs AS j1
CROSS JOIN  @jobs AS j2
CROSS APPLY samd.bernie8k(j1.JobCat, j2.JobCat) AS  b
WHERE       j1.JobId < j2.JobId;

Returns:

JobCat                             JobCat
---------------------------------- ---------------------------------
Editing Department                 Director
Editing Department                 Producer
...
Director                           Directer
Producer                           Actor
...

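Now we will leverage the Bernie distance to get answers and to exclude unnecessary comparisons. String pairs where B is NOT NULL have been solved, and pairs where MS < @MinSim have been excluded. We have just reduced the work from 21 comparisons to 5 and quickly identified 2 duplicates.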

DECLARE @MinSim DEC(6,4) = .8;

SELECT      j1.JobId, j2.JobId, b.S1, b.S2, b.L1, b.L2, b.B, b.MS, b.S
FROM        @jobs AS j1
CROSS JOIN  @jobs AS j2
CROSS APPLY samd.bernie8k(j1.JobCat, j2.JobCat) AS  b
WHERE       j1.JobId < j2.JobId
AND         (b.MS >= @MinSim OR b.B IS NOT NULL);

Returns:

JobId       JobId       S1                    S2                         L1   L2  B     MS       S
----------- ----------- --------------------- -------------------------- ---- --- ----- -------- -------
1           5           Editing Department    Film Editing Department    18   23  5     0.7826   0.7826
2           3           Director              Producer                   8    8   NULL  1.0000   NULL
2           6           Director              Producers                  8    9   NULL  0.8889   NULL
2           7           Director              Directer                   8    8   NULL  1.0000   NULL
3           6           Producer              Producers                  8    9   1     0.8889   0.8889
3           7           Producer              Directer                   8    8   NULL  1.0000   NULL
6           7           Directer              Producers                  8    9   NULL  0.8889   NULL

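This reduction stuff is cool! Let's bring a couple more algorithms to the party. First we'll grab a copy of ngrams8k and create a function that calculates the Hamming distance for similarity. Hamming (HD) can be calculated in O(n) time; the similarity is (L1-HD)/L2. Note that when HD=1, then LD=1, DLD=1, LCSS=L1-1, and we have probably computed your similarity as well.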
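For reference, the Hamming distance and the similarity derived from it can be sketched in a few lines of Python. This mirrors the idea of the set-based T-SQL function shown later, not its implementation; the names are mine, and the comparison here is case-sensitive.

```python
from typing import Optional


def hamming_distance(a: str, b: str) -> Optional[int]:
    """Number of positions at which two equal-length strings differ; None otherwise."""
    if len(a) != len(b):
        return None
    return sum(ca != cb for ca, cb in zip(a, b))


def hamming_similarity(a: str, b: str) -> Optional[float]:
    """(L - HD) / L for equal-length, non-empty strings; None otherwise."""
    hd = hamming_distance(a, b)
    if hd is None or len(a) == 0:
        return None
    return (len(a) - hd) / len(a)


print(hamming_distance("Director", "Directer"), hamming_similarity("Director", "Directer"))
print(hamming_distance("abc1234", "abc2234"))
print(hamming_distance("Producer", "Producers"))
```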

-- Sample data.
DECLARE @jobs TABLE
(
  JobId  INT IDENTITY PRIMARY KEY,
  JobCat VARCHAR(100) NOT NULL
);

INSERT @jobs(JobCat) 
VALUES('Editing Department'),('Director'),('Producer'),('Actor'),
      ('Film Editing Department'),('Producers'),('Directer');

DECLARE @MinSim DECIMAL(6,4) = .8;

WITH br AS
(
  SELECT      b.*
  FROM        @jobs AS j1
  CROSS JOIN  @jobs AS j2
  CROSS APPLY samd.bernie8k(j1.JobCat, j2.JobCat) AS  b
  WHERE       j1.JobId < j2.JobId
  AND         (b.MS >= @MinSim OR b.B IS NOT NULL)
) 
SELECT      br.S1, br.S2, br.L1, br.L2, br.D, S = h.MinSim
FROM        br
CROSS APPLY samd.HammingDistance8k(br.S1, br.S2) AS h
WHERE       br.B IS NULL
AND         h.MinSim >= @MinSim
UNION ALL
SELECT      br.S1, br.S2, br.L1, br.L2, br.D, br.S
FROM        br
WHERE       br.B IS NOT NULL;

Returns:

S1                     S2                        L1          L2          D           S
---------------------- ------------------------- ----------- ----------- ----------- --------------
Director               Directer                  8           8           0           0.87500000000
Editing Department     Film Editing Department   18          23          5           0.78260000000
Producer               Producers                 8           9           1           0.88890000000

Summary:

We started with 21 string pairs to compare. Using Bernie we cut that number down to 5 (2 solved, 14 excluded). Using Hamming we picked off another one. Only four left!
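The counts in this summary can be re-derived with a compact Python sketch of the same pipeline, run over the sample job titles from the demo. It is a simplified re-implementation for illustration, not the T-SQL functions below: the helper names are mine, matching is case-sensitive, and trimming is omitted because the sample data has no padding.

```python
from itertools import combinations

jobs = ["Editing Department", "Director", "Producer", "Actor",
        "Film Editing Department", "Producers", "Directer"]
MIN_SIM = 0.8


def bernie(a, b):
    """Return (S1, S2, B, MS): shorter string, longer string, Bernie distance or None, L1/L2."""
    s1, s2 = sorted((a, b), key=len)
    b_dist = len(s2) - len(s1) if s1 in s2 else None
    ms = len(s1) / len(s2) if s2 else 1.0
    return s1, s2, b_dist, ms


def hamming_similarity(s1, s2):
    """Share of matching positions for equal-length strings; None otherwise."""
    if len(s1) != len(s2) or not s1:
        return None
    return sum(x == y for x, y in zip(s1, s2)) / len(s1)


all_pairs = list(combinations(jobs, 2))
bernie_solved, hamming_solved, unresolved = [], [], []
excluded = 0

for a, b in all_pairs:
    s1, s2, b_dist, ms = bernie(a, b)
    if b_dist is not None:
        bernie_solved.append((s1, s2, ms))       # solved: similarity is L1/L2
    elif ms < MIN_SIM:
        excluded += 1                            # can never reach MIN_SIM
    else:
        sim = hamming_similarity(s1, s2)
        if sim is not None and sim >= MIN_SIM:
            hamming_solved.append((s1, s2, sim))
        else:
            unresolved.append((s1, s2))          # still needs a heavier algorithm

print(len(all_pairs), len(bernie_solved), excluded, len(hamming_solved), len(unresolved))
```

Only the pairs left in `unresolved` would still need a more expensive metric such as Levenshtein.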

The functions:

CREATE FUNCTION samd.bernie8K
(
  @s1 VARCHAR(8000), 
  @s2 VARCHAR(8000)
)
/*****************************************************************************************
[Purpose]:
 This function allows developers to optimize and simplify how they perform fuzzy comparisons 
 between two strings (@s1 and @s2). 

 bernie8K returns:
  S1  = short string - LEN(S1) will always be <= LEN(S2); The formula to calculate S1 is:
          S1 = CASE WHEN LEN(@s1) > LEN(@s2) THEN @s2 ELSE @s1 END;
  S2  = long string  - LEN(S1) will always be <= LEN(S2); The formula to calculate S2 is:
          S2 = CASE WHEN LEN(@s1) > LEN(@s2) THEN @s1 ELSE @s2 END;
  L1  = short string length = LEN(S1)
  L2  = long string length  = LEN(S2)
  D   = distance            = L2-L1; how many characters needed to make L1=L2; D tells us:
          1. D     is the *minimum* Levenshtein distance between S1 and S2
          2. L1/L2 is the *maximum* similarity between S1 and S2
  I   = index               = CHARINDEX(S1,S2);
  B   = bernie distance     = When B is not NULL then:
          1. B  = The Levenshtein Distance between S1 and S2
          2. B  = The Damerau-Levenshtein Distance between S1 and S2
          3. L1 = Length of the Longest Common Substring & Subsequence of S1 and S2
          4. KEY! = The similarity between S1 and S2 is L1/L2
  MS  = Max Similarity      = Maximum similarity
  S   = Minimum Similarity  = When B isn't null S is the same Similarity value returned by
        mdq.Similarity: https://msdn.microsoft.com/en-us/library/ee633878(v=sql.105).aspx

[Author]:
  Alan Burstein

[Compatibility]: 
 SQL Server 2005+, Azure SQL Database, Azure SQL Data Warehouse & Parallel Data Warehouse

[Parameters]:
 @s1 = varchar(8000); First of two input strings to be compared
 @s2 = varchar(8000); Second of two input strings to be compared

[Returns]:
 S1 = VARCHAR(8000); The shorter of @s1 and @s2; returns @s1 when LEN(@s1)=LEN(@s2)
 S2 = VARCHAR(8000); The longer  of @s1 and @s2; returns @s2 when LEN(@s1)=LEN(@s2)
 L1 = INT; The length of the shorter of @s1 and @s2 (or both when they're of equal length)
 L2 = INT; The length of the longer  of @s1 and @s2 (or both when they're of equal length)
 D  = INT; L2-L1; The "distance" between L1 and L2
 I  = INT; The location (position) of S1 inside S2; Note that when I>0 then S1 is both:
       1.  a substring   of S2
       2.  a subsequence of S2
 B  = INT; The Bernie Distance between @s1 and @s2; When B is not null then:
       1. B  = The Levenshtein Distance between S1 and S2
       2. B  = The Damerau-Levenshtein Distance between S1 and S2
       3. L1 = Length of the Longest Common Substring & Subsequence of S1 and S2
       4. KEY! = The similarity between S1 and S2 is L1/L2
 MS = DECIMAL(6,4); Returns the same similarity score as mdq.Similarity would if S1 were a
      substring of S2
 S  = DECIMAL(6,4); When B isn't null then S is the same Similarity value returned by
      mdq.Similarity

 For more about mdq.Similarity visit:
    https://msdn.microsoft.com/en-us/library/ee633878(v=sql.105).aspx

[Syntax]:
--===== Autonomous
 SELECT b.TX, b.S1, b.S2, b.L1, b.L2, b.D, b.I, b.B, b.MS, b.S
 FROM   samd.bernie8K('abc123','abc12') AS b;

--===== CROSS APPLY example
 SELECT b.TX, b.S1, b.S2, b.L1, b.L2, b.D, b.I, b.B, b.MS, b.S
 FROM        dbo.SomeTable            AS t
 CROSS APPLY samd.bernie8K(t.S1,t.S2) AS b;

[Dependencies]:
 N/A

[Developer Notes]:
 X. Bernie ignores leading and trailing spaces, and returns trimmed strings!
 1. When @s1 is NULL then S2 = @s2, L2 = LEN(@s2); 
    When @s2 is NULL then S1 = @s1, L1 = LEN(@s1)
 2. bernie8K ignores leading and trailing whitespace on both input strings (@s1 and @s2). 
    In other words LEN(@s1)=DATALENGTH(@s1), LEN(@s2)=DATALENGTH(@s2)
 3. bernie8K is deterministic; for more about deterministic and nondeterministic
    functions see https://msdn.microsoft.com/en-us/library/ms178091.aspx

[Examples]:
--==== 1. BASIC USE:
  -- 1.1. When b.I > 0  
  SELECT b.TX, b.S1, b.S2, b.L1, b.L2, b.D, b.I, b.B, b.MS, b.S
  FROM samd.bernie8K('abc1234','bc123') AS b;

  -- 1.2. When b.I = 0
  SELECT b.TX, b.S1, b.S2, b.L1, b.L2, b.D, b.I, b.B, b.MS, b.S
  FROM samd.bernie8K('abc123','xxx') AS b;
-----------------------------------------------------------------------------------------
[Revision History]:
 Rev 00 - 20180708 - Initial Creation - Alan Burstein
 Rev 01 - 20181231 - Added Boolean logic for transpositions (TX column) - Alan Burstein
*****************************************************************************************/
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT 
  TX = base.TX,     -- transposed? boolean - were S1 and S2 transposed?
  S1 = base.S1,     -- short string >> IIF(LEN(@s1)>LEN(@s2),@s2,@s1)
  S2 = base.S2,     -- long  string >> IIF(LEN(@s1)>LEN(@s2),@s1,@s2)
  L1 = base.L1,     -- short string length >> IIF(LEN(@s1)>LEN(@s2),LEN(@s2),LEN(@s1))
  L2 = base.L2,     -- long  string length >> IIF(LEN(@s1)>LEN(@s2),LEN(@s1),LEN(@s2))
  D  = base.D,        -- bernie string distance >> # of characters needed to make L1=L2
  I  = iMatch.idx,  -- bernie index >> position of S1 within S2
  B  = bernie.D,      -- bernie distance >> IIF(CHARINDEX(S1,S2)>0,L2-L1,NULL)
  MS = maxSim.D,    -- maximum similarity
  S  = similarity.D -- (minimum) similarity
FROM
(
  SELECT
    TX = CASE WHEN ls.L=1 THEN 1 ELSE 0 END,
    S1 = CASE WHEN ls.L=1 THEN s.S2 ELSE s.S1 END,
    S2 = CASE WHEN ls.L=1 THEN s.S1 ELSE s.S2 END,
    L1 = CASE WHEN ls.L=1 THEN l.S2 ELSE l.S1 END,
    L2 = CASE WHEN ls.L=1 THEN l.S1 ELSE l.S2 END,
    D  = ABS(l.S1-l.S2)

  FROM        (VALUES(LEN(LTRIM(@s1)),LEN(LTRIM(@s2))))     AS l(S1,S2) -- LEN(S1,S2)
  CROSS APPLY (VALUES(RTRIM(LTRIM(@S1)),RTRIM(LTRIM(@S2)))) AS s(S1,S2) -- S1 and S2 trimmed
    CROSS APPLY (VALUES(SIGN(l.S1-l.S2)))                     AS ls(L)    -- LeftLength
) AS base
CROSS APPLY (VALUES(ABS(SIGN(base.L1)-1),ABS(SIGN(base.L2)-1)))             AS blank(S1,S2)
CROSS APPLY (VALUES(CHARINDEX(base.S1,base.S2)))                            AS iMatch(idx)
CROSS APPLY (VALUES(CASE WHEN SIGN(iMatch.idx|blank.S1)=1 THEN base.D END)) AS bernie(D)
CROSS APPLY (VALUES(CAST(CASE blank.S1 WHEN 1 THEN 1.*blank.S2 
                      ELSE 1.*base.L1/base.L2 END AS DECIMAL(6,4))))        AS maxSim(D)
CROSS APPLY (VALUES(CAST(1.*NULLIF(SIGN(iMatch.idx),0)*maxSim.D 
                      AS DECIMAL(6,4))))                                    AS similarity(D);
GO

CREATE FUNCTION dbo.rangeAB
(
  @low  BIGINT, -- (start) Lowest  number in the set
  @high BIGINT, -- (stop)  Highest number in the set
  @gap  BIGINT, -- (step)  Difference between each number in the set
  @row1 BIT     -- Base: 0 or 1; should RN begin with 0 or 1?
)
/****************************************************************************************
[Purpose]:
 Creates a lazy, in-memory...

[Author]: Alan Burstein

[Compatibility]: 
 SQL Server 2008+ and Azure SQL Database 

[Syntax]:
 SELECT r.RN, r.OP, r.N1, r.N2
 FROM   dbo.rangeAB(@low,@high,@gap,@row1) AS r;

[Parameters]:
 @low  = BIGINT; represents the lowest  value for N1.
 @high = BIGINT; represents the highest value for N1.
 @gap  = BIGINT; represents how much N1 and N2 will increase each row. @gap also
         represents the difference between N1 and N2.
 @row1 = BIT; represents the base (first) value of RN. When @row1 = 0, RN begins with 0,
         when @row1 = 1 then RN begins with 1.

[Returns]:
 Inline Table Valued Function returns:
 RN = BIGINT; a row number that works just like T-SQL ROW_NUMBER() except that it can 
      start at 0 or 1 which is dictated by @row1. If you are returning the numbers:
      (0 or 1) Through @high, then use RN as your "N" value, otherwise use N1.
 OP = BIGINT; returns the "opposite number" that relates to RN. When RN begins with 0 the
      first number in the set will be 0 for RN, the last number in will be 0 for OP. When
      RN is 1 to 10, the numbers 1 to 10 are returned in ascending order for RN and in
      descending order for OP. 

      Given the Numbers 1 to 3, 3 is the opposite of 1, 2 the opposite of 2, and 1 is the
      opposite of 3. Given the numbers -1 to 2, the opposite of -1 is 2, the opposite of 0
      is 1, and the opposite of 1 is 0.  
 N1 = BIGINT; This is the "N" in your tally table/numbers function. This is your *Lazy* 
      sequence of numbers starting at @low and incrementing by @gap until the next number
      in the sequence is greater than @high.
 N2 = BIGINT; a lazy sequence of numbers starting at @low+@gap and incrementing by @gap. N2
      will always be greater than N1 by @gap. N2 can also be thought of as:
      LEAD(N1,1,N1+@gap) OVER (ORDER BY RN)

[Dependencies]:
N/A

[Developer Notes]:
 1. The lowest and highest possible numbers returned are whatever is allowable by a 
    bigint. The function, however, returns no more than 531,441,000,000 rows (8100^3). 
 2. @gap does not affect RN, RN will begin at @row1 and increase by 1 until the last row
    unless its used in a subquery where a filter is applied to RN.
 3. @gap must be greater than 0 or the function will not return any rows.
 4. Keep in mind that when @row1 is 0 then the highest RN value (ROWNUMBER) will be the 
    number of rows returned minus 1
 5. If all you need is a sequential set beginning at 0 or 1 then, for best performance
    use the RN column. Use N1 and/or N2 when you need to begin your sequence at any 
    number other than 0 or 1 or if you need a gap between your sequence of numbers. 
 6. Although @gap is a bigint it must be a positive integer or the function will
    not return any rows.
 7. The function will not return any rows when one of the following conditions are true:
      * any of the input parameters are NULL
      * @high is less than @low 
      * @gap is not greater than 0
    To force the function to return all NULLs instead of not returning anything you can
    add the following code to the end of the query:

      UNION ALL 
      SELECT NULL, NULL, NULL, NULL
      WHERE NOT (@high&@low&@gap&@row1 IS NOT NULL AND @high >= @low AND @gap > 0)

    This code was excluded as it adds a ~5% performance penalty.
 8. There is no performance penalty for sorting by rn ASC; there is a large performance 
    penalty for sorting in descending order WHEN @row1 = 1; WHEN @row1 = 0
    If you need a descending sort then use OP in place of RN and sort by RN ASC. 
 9. For 2012+ systems, The TOP logic can be replaced with:
   OFFSET 0 ROWS FETCH NEXT 
     ABS((ISNULL(@high,0)-ISNULL(@low,0))/ISNULL(@gap,0)+ISNULL(@row1,1)) ROWS ONLY

Best Practices:
--===== 1. Using RN (rownumber)
 -- (1.1) The best way to get the numbers 1,2,3...@high (e.g. 1 to 5):
 SELECT r.RN
 FROM   dbo.rangeAB(1,5,1,1) AS r;

 -- (1.2) The best way to get the numbers 0,1,2...@high (e.g. 0 to 5):
 SELECT r.RN
 FROM   dbo.rangeAB(0,5,1,0) AS r;

--===== 2. Using OP for descending sorts without a performance penalty
 -- (2.1) The best way to get the numbers @high...3,2,1 (e.g. 5 to 1):
 SELECT   r.OP
 FROM     dbo.rangeAB(1,5,1,1) AS r 
 ORDER BY R.RN ASC;

 -- (2.2) The best way to get the numbers @high-1...2,1,0 (e.g. 5 to 0):
 SELECT   r.OP 
 FROM     dbo.rangeAB(1,6,1,0) AS r
 ORDER BY r.RN ASC;

--===== 3. Using N1
 -- (3.1) To begin with numbers other than 0 or 1 use N1 (e.g. -3 to 3):
 SELECT r.N1
 FROM   dbo.rangeAB(-3,3,1,1) AS r;

 -- (3.2) ROW_NUMBER() is built in. If you want a ROW_NUMBER() include RN:
 SELECT r.RN, r.N1
 FROM   dbo.rangeAB(-3,3,1,1) AS r;

 -- (3.3) If you wanted a ROW_NUMBER() that started at 0 you would do this:
 SELECT r.RN, r.N1
 FROM dbo.rangeAB(-3,3,1,0) AS r;

--===== 4. Using N2 and @gap
 -- (4.1) To get 0,10,20,30...100, set @low to 0, @high to 100 and @gap to 10:
 SELECT r.N1
 FROM   dbo.rangeAB(0,100,10,1) AS r;
 -- (4.2) Note that N2=N1+@gap; this allows you to create a sequence of ranges.
 --       For example, to get (0,10),(10,20),(20,30).... (90,100):

 SELECT r.N1, r.N2
 FROM  dbo.rangeAB(0,90,10,1) AS r;

-----------------------------------------------------------------------------------------
[Revision History]:
 Rev 00 - 20140518 - Initial Development - AJB
 Rev 01 - 20151029 - Added 65 rows. Now L1=465; 465^3=100.5M. Updated comments - AJB
 Rev 02 - 20180613 - Complete re-design including opposite number column (op)
 Rev 03 - 20180920 - Added additional CROSS JOIN to L2 for 530B rows max - AJB
 Rev 04 - 20190306 - Added inline aliasing function(f): 
                     f.R=(@high-@low)/@gap, f.N=@gap+@low - AJB
*****************************************************************************************/
RETURNS TABLE WITH SCHEMABINDING AS RETURN
WITH
L1(N)  AS 
(
  SELECT 1
  FROM (VALUES
   (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
   (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
   (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
   (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
   (0),(0)) T(N) -- 90 values
),
L2(N)  AS (SELECT 1 FROM L1 a CROSS JOIN L1 b CROSS JOIN L1 c),
iTally AS (SELECT RN = ROW_NUMBER() OVER (ORDER BY (SELECT 1)) FROM L2 a CROSS JOIN L2 b)
SELECT r.RN, r.OP, r.N1, r.N2
FROM
(
  SELECT
    RN = 0,
    OP = (@high-@low)/@gap,
    N1 = @low,
    N2 = @gap+@low
  WHERE @row1 = 0
  UNION ALL
  SELECT TOP (ABS((ISNULL(@high,0)-ISNULL(@low,0))/ISNULL(@gap,0)+ISNULL(@row1,1)))
    RN = i.RN,
    OP = (@high-@low)/@gap+(2*@row1)-i.RN,
    N1 = (i.rn-@row1)*@gap+@low,
    N2 = (i.rn-(@row1-1))*@gap+@low
  FROM       iTally AS i
  ORDER BY   i.RN
) AS r
WHERE @high&@low&@gap&@row1 IS NOT NULL AND @high >= @low 
AND   @gap > 0;
GO

CREATE FUNCTION samd.NGrams8k
(
  @string VARCHAR(8000), -- Input string
  @N      INT            -- requested token size
)
/*****************************************************************************************
[Purpose]:
 A character-level N-Grams function that outputs a contiguous stream of @N-sized tokens
 based on an input string (@string). Accepts strings up to 8000 varchar characters long.

[Author]: 
 Alan Burstein

[Compatibility]:
 SQL Server 2008+, Azure SQL Database

[Syntax]:
--===== Autonomous
 SELECT ng.position, ng.token 
 FROM   samd.NGrams8k(@string,@N) AS ng;

--===== Against a table using APPLY
 SELECT      s.SomeID, ng.position, ng.token
 FROM        dbo.SomeTable AS s
 CROSS APPLY samd.NGrams8K(s.SomeValue,@N) AS ng;

[Parameters]:
 @string  = The input string to split into tokens.
 @N       = The size of each token returned.

[Returns]:
 Position = BIGINT; the position of the token in the input string
 token    = VARCHAR(8000); a @N-sized character-level N-Gram token

[Dependencies]:
 1. dbo.rangeAB (iTVF)

[Revision History]:
------------------------------------------------------------------------------------------
 Rev 00 - 20140310 - Initial Development - Alan Burstein
 Rev 01 - 20150522 - Removed DQS N-Grams functionality, improved iTally logic. Also Added
                     conversion to bigint in the TOP logic to remove implicit conversion
                     to bigint - Alan Burstein
 Rev 03 - 20150909 - Added logic to only return values if @N is greater than 0 and less
                     than the length of @string. Updated comment section. - Alan Burstein
 Rev 04 - 20151029 - Added ISNULL logic to the TOP clause for the @string and @N
                     parameters to prevent a NULL string or NULL @N from causing "an
                     improper value" being passed to the TOP clause. - Alan Burstein
 Rev 05 - 20171228 - Small simplification; changed: 
                (ABS(CONVERT(BIGINT,(DATALENGTH(ISNULL(@string,''))-(ISNULL(@N,1)-1)),0)))
                                           to:
                (ABS(CONVERT(BIGINT,(DATALENGTH(ISNULL(@string,''))+1-ISNULL(@N,1)),0)))
 Rev 06 - 20180612 - Using CHECKSUM(N) to convert N in the token output instead of
                     using (CAST N as int). CHECKSUM removes the need to convert to int.
 Rev 07 - 20180612 - re-designed to: Use dbo.rangeAB - Alan Burstein
*****************************************************************************************/
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT
  position   = r.RN,
  token      = SUBSTRING(@string, CHECKSUM(r.RN), @N)
FROM  dbo.rangeAB(1, LEN(@string)+1-@N,1,1) AS r
WHERE @N > 0 AND @N <= LEN(@string);
GO

CREATE FUNCTION samd.hammingDistance8K 
(
  @s1 VARCHAR(8000), -- first input string
  @s2 VARCHAR(8000)  -- second input string
)
/*****************************************************************************************
[Purpose]:
 Purely set-based iTVF that returns the Hamming Distance between two strings of equal 
 length. See: https://en.wikipedia.org/wiki/Hamming_distance

[Author]:
 Alan Burstein

[Compatibility]:
 SQL Server 2008+

[Syntax]:
--===== Autonomous
 SELECT h.HD
 FROM   samd.hammingDistance8K(@s1,@s2) AS h;

--===== Against a table using APPLY
 SELECT t.string, S2 = @s2, h.HD
 FROM   dbo.someTable AS t
 CROSS 
 APPLY  samd.hammingDistance8K(t.string, @s2) AS h;

[Parameters]:
  @s1 = VARCHAR(8000); the first input string
  @s2 = VARCHAR(8000); the second input string

[Dependencies]:
 1. samd.NGrams8K

[Examples]:
--===== 1. Basic Use
DECLARE @s1 VARCHAR(8000) = 'abc1234',
        @s2 VARCHAR(8000) = 'abc2234';

SELECT h.HD, h.L, h.minSim
FROM   samd.hammingDistance8K(@s1,@s2) AS h;

---------------------------------------------------------------------------------------
[Revision History]: 
 Rev 00 - 20180800 - Initial re-design - Alan Burstein
 Rev 01 - 20181116 - Added L (Length) and minSim
*****************************************************************************************/
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT H.HD, H.L, minSim = 1.*(H.L-H.HD)/H.L
FROM
( 
  SELECT LEN(@s1)-SUM(CHARINDEX(ng.token,SUBSTRING(@S2,ng.position,1))), 
         CASE LEN(@s1) WHEN LEN(@s2) THEN LEN(@s1) END
  FROM   samd.NGrams8k(@s1,1) AS ng
  WHERE  LEN(@S1)=LEN(@S2)
) AS H(HD,L);
GO
