I've been using regular expressions for a while, but I've never had to worry about their performance until now.

I'm currently writing some classes that will scrape data from a message.

- The message is very large and has a tree-like structure similar to XML.
- I know the exact format and structure of the message. However, some segments ("tags", if you compare it to XML) are optional, and others can be repeated up to a fixed number of times.

I wrote a regular expression that contains a group for each segment and subgroups for each branching segment, probably over 100 groups in total. It ensures that the correct data is scraped. However, performance issues have been brought to my attention.
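For illustration, a heavily scaled-down sketch of this kind of pattern might look like the following. The message layout (`<hdr>`, `<opt>` and `<item>` tags) and the class name are hypothetical stand-ins, not the real format. It shows one capturing group per segment, an optional segment expressed with `?`, and a repeated segment expressed with `{1,3}`. Note that a capturing group inside a quantifier only keeps its last repetition in `java.util.regex`, so this sketch captures the whole run of repeated segments as a single group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SegmentRegexSketch {
    // Hypothetical, scaled-down message layout (an assumption, not the real format):
    // <hdr>..</hdr> is required, <opt>..</opt> is optional, <item>..</item> repeats 1 to 3 times.
    static final Pattern MESSAGE = Pattern.compile(
        "<hdr>([^<]*)</hdr>"                     // group 1: content of the required segment
        + "(<opt>([^<]*)</opt>)?"                // group 2: optional segment, group 3: its content
        + "((?:<item>[^<]*</item>){1,3})");      // group 4: the whole run of repeated segments

    public static void main(String[] args) {
        String msg = "<hdr>ABC</hdr><item>1</item><item>2</item>";
        Matcher m = MESSAGE.matcher(msg);
        if (m.find()) {
            System.out.println("hdr   = " + m.group(1));
            System.out.println("opt   = " + m.group(3)); // null when the optional segment is absent
            System.out.println("items = " + m.group(4));
        }
    }
}
```

Scaling this up to every segment and sub-segment of the real message is how a pattern ends up with 100+ groups.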

Assuming the regular expression is compiled only once, will the massive number of groups and subgroups affect performance when calling `matcher.find()`?
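One rough way to probe this empirically is to time `find()` on the same pattern written with capturing groups versus non-capturing `(?:...)` groups. The sketch below is a naive `System.nanoTime()` harness, not a rigorous benchmark: the generated `<sN>` segments are hypothetical, the warm-up is minimal, and a dedicated harness such as JMH would give more trustworthy numbers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupTimingSketch {
    // Builds a hypothetical pattern of optional segments "<sN>...</sN>".
    // capturing = true gives 2 groups per segment; false gives 0 groups.
    static String buildPattern(int segments, boolean capturing) {
        String open = capturing ? "(" : "(?:";
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < segments; i++) {
            sb.append(open).append("<s").append(i).append(">")
              .append(open).append("[^<]*)")
              .append("</s").append(i).append(">)?");
        }
        return sb.toString();
    }

    // Builds an input in which every segment is present: "<s0>v0</s0><s1>v1</s1>..."
    static String buildInput(int segments) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < segments; i++) {
            sb.append("<s").append(i).append(">v").append(i).append("</s").append(i).append(">");
        }
        return sb.toString();
    }

    // Times `iterations` calls to find(), checking that each call matched the whole input.
    static long timeFinds(Pattern p, String input, int iterations) {
        int fullMatches = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            Matcher m = p.matcher(input);
            if (m.find() && m.end() == input.length()) {
                fullMatches++;
            }
        }
        long elapsed = System.nanoTime() - start;
        if (fullMatches != iterations) {
            throw new IllegalStateException("pattern did not match the whole input");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int segments = 100;
        int iterations = 20_000;
        String input = buildInput(segments);
        Pattern capturing = Pattern.compile(buildPattern(segments, true));
        Pattern nonCapturing = Pattern.compile(buildPattern(segments, false));

        // Crude JIT warm-up; the results are still only a rough comparison.
        timeFinds(capturing, input, iterations);
        timeFinds(nonCapturing, input, iterations);

        System.out.printf("capturing:     %d groups, %.1f ms%n",
            capturing.matcher("").groupCount(), timeFinds(capturing, input, iterations) / 1e6);
        System.out.printf("non-capturing: %d groups, %.1f ms%n",
            nonCapturing.matcher("").groupCount(), timeFinds(nonCapturing, input, iterations) / 1e6);
    }
}
```

Both patterns accept exactly the same inputs, so any timing difference between them comes from group bookkeeping rather than from the matching logic itself.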