This has been called an “fc-multicategory” by Tom Leinster, for example here.
I think this has also been called a “hypervirtual double category” here, but I don’t remember whether this is exactly the same notion or whether there are some additional assumptions in the second link.
Another established structure very close to what you want is opetopic bicategories, which are equivalent to classical bicategories but formulated opetopically; see e.g. §3 of Cheng 2003, Opetopic bicategories: comparison with the classical theory, and the subsection Non-algebraic notions of bicategory at the end of §3.4 of Leinster 2003, Higher operads, higher categories.
Concretely, in an opetopic bicategory $B$, you have a graph of 0-cells and 1-cells like in a normal bicategory; the source of a 2-cell is not just a 1-cell but a composable string of 1-cells; 2-cell composition looks just like what you’d expect; and the 1-cell composition condition says that for every composable sequence of 1-cells, there’s a universal 2-cell out of them, for a certain sense of universality.
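One way to make “a certain sense of universality” precise is sketched here, assuming it is the usual strong universality for arrows in a multicategory. The notation $u$, $\phi$, $\bar\phi$, $f_i$, $g_j$, $h_l$, $k$ is mine, and the references above should be checked for the exact form they use. A 2-cell $u \colon (f_1,\dots,f_n) \Rightarrow f$ would be universal if, for every composable string $(g_1,\dots,g_p,f_1,\dots,f_n,h_1,\dots,h_q)$ and every 2-cell $\phi$ from that string to a 1-cell $k$, there is a unique 2-cell $\bar\phi$ such that

$$\bar\phi \colon (g_1,\dots,g_p,f,h_1,\dots,h_q) \Rightarrow k, \qquad \phi = \bar\phi \circ (1_{g_1},\dots,1_{g_p},\,u,\,1_{h_1},\dots,1_{h_q}).$$

Under this reading, $f$ plays the role of the composite $f_n \circ \cdots \circ f_1$, and the case $n = 0$ would supply identity 1-cells.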
Keeping all of this except the last condition (call such a thing an opetopic bicategory minus 1-cell composition) seems to give exactly what you’re asking for. In the one-object case, the 1-cell composition condition is exactly what Leinster calls representability of a multicategory (Def 3.3.1, ibid.), so adding it back recovers the equivalence between monoidal categories and one-object bicategories:
(monoidal category) = (multicategory with representability) = (one-object opetopic bicategory minus 1-cell composition, with 1-cell composition) = (one-object opetopic bicategory) = (one-object bicategory)
Comparing this with Simon Henry’s answer, I would expect (opetopic bicategories minus 1-cell composition) to be fairly concretely equivalent to (fc-multicategories with only identity vertical 1-cells); indeed, Leinster hints at such a connection in the subsection mentioned above, though he doesn’t spell it out precisely.