Given two lists of genomic regions, a data.table containing the overlapping regions is returned.

region_overlap(
  x,
  y,
  regionCols1 = 1:3L,
  regionCols2 = 1:3L,
  type = c("any", "within", "start", "end", "equal"),
  mult = c("all", "first", "last"),
  matchedOnly = FALSE
)

Arguments

x

The first input genomic regions. A data.frame() or data.table().

y

The second input genomic regions. A data.frame() or data.table().

regionCols1

A vector of column names or numbers in x giving the 'chr', 'start', and 'end' columns, in that order. Default is the first 3 columns.

regionCols2

A vector of column names or numbers in y giving the 'chr', 'start', and 'end' columns, in that order. Default is the first 3 columns.

type

Default value is "any". Allowed values are "any", "within", "start", "end" and "equal".

The types shown here are identical in functionality to those of the function findOverlaps in the Bioconductor package IRanges. Let [a,b] and [c,d] be intervals in x and y with a<=b and c<=d. For type="start", the intervals overlap iff a==c. For type="end", the intervals overlap iff b==d. For type="within", the intervals overlap iff a>=c and b<=d. For type="equal", the intervals overlap iff a==c and b==d. For type="any", the intervals overlap iff c<=b and d>=a. In addition to these requirements, the intervals also have to satisfy the minoverlap argument; note that minoverlap is not among the arguments of region_overlap() listed in the usage.

NB: The maxgap argument, when > 0, is to be interpreted according to the type of overlap. This will be updated once maxgap is implemented.
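The overlap rules for each type can be written out as plain comparisons. The following is a minimal base R sketch of those rules only; interval_overlaps() and its argument names are hypothetical helpers introduced for illustration and are not part of this package, and minoverlap and maxgap are ignored.

```r
# Hypothetical helper (not part of the package): returns TRUE when the x
# interval [x_start, x_end] overlaps the y interval [y_start, y_end]
# under the rule documented for the given type.
interval_overlaps <- function(x_start, x_end, y_start, y_end,
                              type = c("any", "within", "start", "end", "equal")) {
  type <- match.arg(type)
  # The documented rules assume a <= b and c <= d.
  stopifnot(all(x_start <= x_end), all(y_start <= y_end))
  switch(type,
    any    = y_start <= x_end & y_end >= x_start,
    within = x_start >= y_start & x_end <= y_end,
    start  = x_start == y_start,
    end    = x_end == y_end,
    equal  = x_start == y_start & x_end == y_end
  )
}

interval_overlaps(5, 10, 1, 20, type = "within")  # TRUE: [5,10] lies inside [1,20]
interval_overlaps(5, 10, 8, 20, type = "any")     # TRUE: the intervals share 8..10
interval_overlaps(5, 10, 8, 20, type = "within")  # FALSE: 5 < 8
```

The comparisons are vectorised, so the helper also accepts equal-length vectors of coordinates.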

mult

When multiple rows in y match a row in x, mult controls which of them are returned: "all" (default), "first" or "last".
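As a rough illustration of the three mult settings, the sketch below picks from a vector of matching y row indices. select_matches() is a hypothetical helper, and the sketch assumes that "first" and "last" refer to the order in which the matches are listed; the function's actual ordering rule is not stated here.

```r
# Hypothetical helper (not part of the package): given the indices of the
# rows in y that match one row in x, return the ones selected by mult.
select_matches <- function(hits, mult = c("all", "first", "last")) {
  mult <- match.arg(mult)
  if (length(hits) == 0L) return(hits)  # nothing matched, nothing to pick
  switch(mult,
    all   = hits,
    first = hits[1L],
    last  = hits[length(hits)]
  )
}

hits <- c(2L, 5L, 7L)            # y rows 2, 5 and 7 overlap the x row
select_matches(hits, "all")      # 2 5 7
select_matches(hits, "first")    # 2
select_matches(hits, "last")     # 7
```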

matchedOnly

A logical. If TRUE, input regions without an overlap are not output. If FALSE (default), all rows are output.
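The effect of matchedOnly on which rows are kept can be sketched in base R with merge(). This is an illustration under stated assumptions, not the function's implementation: the toy data, the row_id column and the output column layout are all invented for the example, only the type = "any" rule is applied, and unmatched x rows are assumed to be the ones retained with NA in the y columns.

```r
# Toy inputs (invented for illustration).
x <- data.frame(chr = c("chr1", "chr1", "chr2"),
                start = c(100, 500, 100),
                end = c(200, 600, 200),
                stringsAsFactors = FALSE)
y <- data.frame(chr = c("chr1", "chr2"),
                start = c(150, 900),
                end = c(250, 950),
                score = c(0.5, 0.9),
                stringsAsFactors = FALSE)

# Tag x rows so unmatched ones can be restored later.
x$row_id <- seq_len(nrow(x))

# Pair every x region with every y region on the same chromosome.
pairs <- merge(x, y, by = "chr", suffixes = c(".x", ".y"))

# Keep pairs satisfying the type = "any" rule: c <= b and d >= a.
hits <- pairs[pairs$start.y <= pairs$end.x & pairs$end.y >= pairs$start.x, ]

# Analogue of matchedOnly = TRUE: only the overlapping pairs.
matched_only <- hits

# Analogue of matchedOnly = FALSE: a left join keeps every x row and
# fills the y columns with NA where nothing overlapped.
all_rows <- merge(x, hits[, c("row_id", "start.y", "end.y", "score")],
                  by = "row_id", all.x = TRUE)

nrow(matched_only)  # 1: only chr1:100-200 overlaps chr1:150-250
nrow(all_rows)      # 3: all x rows are kept
```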

Value

A data.table. The columns from x and y will have suffix '.x' and '.y', respectively.

Details

The input genomic regions should contain the three columns 'chr', 'start', and 'end'; all other columns are copied to the output.