Regular Expression to Match Multiple Lines of Text

Question

A friend recently asked how to use a regular expression to extract the changed code from a git commit record. Put simply, this is the code shown in a commit diff, and the goal is to extract the lines that begin with + or -.

We copied a commit record from the RichX project and modified it slightly for demonstration.

+ import { Plugin } from "..";
- CONST SUM = NUM_A + NUM_B;
+ CONST SUM_ALL = NUM_A + NUM_B;

  export const DEFAULT_RICH_TEXT = {
- text: "Simple Rich Text Demo",
+ config: "Simple Rich Text Demo",
    setting: [],
  };

  export type ObjectKV<V = object> = {
    [key: string]: V;
  };

+ export interface IPlugins {
+   [key: string]: Plugin;
+ }

Restated, the requirement is to match every line that starts with + or - in a multi-line text.

Solution One

Analysis:

  1. First, match text starting with +: \+.*
  2. Then add -: (\+|\-).*
  3. Lines in a multi-line text are separated by newlines, so the character before a line starting with + is the newline \n that ends the previous line, and the line itself is also followed by a newline. We therefore use lookbehind and lookahead assertions to require a newline on both sides of the target text: (?<=\n)(\+|\-).*(?=\n)
  4. Finally, two special cases remain: the very beginning and the very end of the text. The first line has no previous line, so there is no \n to match and we match the start ^ instead. The last line may have no trailing newline, so we also accept the end $: (?<=^|\n)(\+|\-).*(?=\n|$)
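The steps above can be tried one at a time. The following is a minimal sketch on a made-up four-line sample (the sample text and the variable names are our own assumptions, not part of the commit record), showing what each intermediate pattern returns:

```javascript
// Hypothetical sample: line two does not start with + or -,
// but it contains a "+" in the middle.
const sample = "+ first\n  const sum = a + b;\n- middle\n+ last";

// Step 2: nothing ties the match to the start of a line,
// so the "+ b;" inside line two is reported as well.
const step2 = sample.match(/(\+|\-).*/g);

// Step 3: a newline is required on both sides,
// so the first and the last line are missed.
const step3 = sample.match(/(?<=\n)(\+|\-).*(?=\n)/g);

// Step 4: ^ and $ cover the start and the end of the whole text.
const step4 = sample.match(/(?<=^|\n)(\+|\-).*(?=\n|$)/g);

console.log(step2); // [ '+ first', '+ b;', '- middle', '+ last' ]
console.log(step3); // [ '- middle' ]
console.log(step4); // [ '+ first', '- middle', '+ last' ]
```

Only the final pattern returns exactly the three lines that begin with + or -.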

Code:

const content = `+ import { Plugin } from "..";
- CONST SUM = NUM_A + NUM_B;
+ CONST SUM_ALL = NUM_A + NUM_B;

  export const DEFAULT_RICH_TEXT = {
- text: "Simple Rich Text Demo",
+ config: "Simple Rich Text Demo",
    setting: [],
  };

  export type ObjectKV<V = object> = {
    [key: string]: V;
  };

+ export interface IPlugins {
+   [key: string]: Plugin;
+ }`

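// The g flag returns every match. The assertions check the line boundaries
// without including the newlines in the result.
const matches = content.match(/(?<=^|\n)(\+|\-).*(?=\n|$)/g);
console.log(matches);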

// output array
// 0: "+ import { Plugin } from \"..\";"
// 1: "- CONST SUM = NUM_A + NUM_B;"
// 2: "+ CONST SUM_ALL = NUM_A + NUM_B;"
// 3: "- text: \"Simple Rich Text Demo\","
// 4: "+ config: \"Simple Rich Text Demo\","
// 5: "+ export interface IPlugins {"
// 6: "+   [key: string]: Plugin;"
// 7: "+ }"
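With the g flag, match returns only the full matches, so the sign captured by (\+|\-) goes unused. If the goal is to separate added lines from removed ones, one possible sketch uses String.prototype.matchAll, which exposes the capture groups of every match. The sample text, the variable names, and the second capture group (.*) are our own assumptions for illustration:

```javascript
// Hypothetical sample standing in for a commit record.
const diff = "+ added one\n  untouched\n- removed one\n+ added two";

const added = [];
const removed = [];

// matchAll requires the g flag and yields one match object per hit.
// m[1] is the captured sign, m[2] is the rest of the line.
for (const m of diff.matchAll(/(?<=^|\n)(\+|\-)(.*)(?=\n|$)/g)) {
  // trim() also drops indentation; leave it out if indentation matters.
  (m[1] === "+" ? added : removed).push(m[2].trim());
}

console.log(added);   // [ 'added one', 'added two' ]
console.log(removed); // [ 'removed one' ]
```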

Solution Two

Analysis:

Matching the newlines ourselves, as in Solution One, is somewhat cumbersome. We can skip that step and match the beginning and end of each line directly with ^ and $, then turn on the regular expression flag m (multi-line mode) so that these anchors apply to every line rather than only to the whole text: /^(\+|\-).*$/gm.

Code:

const content = `+ import { Plugin } from "..";
- CONST SUM = NUM_A + NUM_B;
+ CONST SUM_ALL = NUM_A + NUM_B;

  export const DEFAULT_RICH_TEXT = {
- text: "Simple Rich Text Demo",
+ config: "Simple Rich Text Demo",
    setting: [],
  };

  export type ObjectKV<V = object> = {
    [key: string]: V;
  };

+ export interface IPlugins {
+   [key: string]: Plugin;
+ }`

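// With the m flag, ^ and $ match at the start and end of every line.
const matches = content.match(/^(\+|\-).*$/gm);
console.log(matches);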

// output array
// 0: "+ import { Plugin } from \"..\";"
// 1: "- CONST SUM = NUM_A + NUM_B;"
// 2: "+ CONST SUM_ALL = NUM_A + NUM_B;"
// 3: "- text: \"Simple Rich Text Demo\","
// 4: "+ config: \"Simple Rich Text Demo\","
// 5: "+ export interface IPlugins {"
// 6: "+   [key: string]: Plugin;"
// 7: "+ }"
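To see what the m flag actually changes, here is a small sketch that runs the same pattern with and without it and compares the result with the pattern from Solution One. The sample text and variable names are hypothetical:

```javascript
// Hypothetical sample, shorter than the commit record above.
const text = "+ first\n  const sum = a + b;\n- middle\n+ last";

// Without m, ^ and $ match only the start and end of the whole string,
// and "." cannot cross a newline, so no match can span both anchors.
const withoutM = text.match(/^(\+|\-).*$/g);

// With m, ^ and $ also match at every line boundary.
const withM = text.match(/^(\+|\-).*$/gm);

// The lookaround pattern from Solution One, for comparison.
const lookaround = text.match(/(?<=^|\n)(\+|\-).*(?=\n|$)/g);

console.log(withoutM);   // null
console.log(withM);      // [ '+ first', '- middle', '+ last' ]
console.log(lookaround); // [ '+ first', '- middle', '+ last' ]
```

On this sample the two solutions return the same lines, and the version without m finds nothing.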

Conclusion

These are a few notes on writing regular expressions that came out of a discussion with a friend, mainly covering assertions and the multi-line flag. The case here is relatively simple, and we will share more in-depth use cases in the future. Follow our updates under #regex.
