The way I see it, in an 'opml+time' app every base outline node has type=time, and its text attribute holds a timestamp in HH:MM:SS.mmm format (ISO 8601 style?). Each of these nodes always has children. These children could bear gifts of images, textual changes, video chapters, OPML links which open doors to other worlds, map coordinates, screen coordinates, whatever.
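To make the shape of such a file concrete, here is a minimal sketch in Python, not a finished format. The type="time" nodes and their HH:MM:SS.mmm text follow the description above; everything else is an assumption for illustration: the sample file, the child types "image" and "link", the url values, and the helper names parse_timestamp and load_cues.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the child node types and urls are invented for illustration.
SAMPLE_OPML = """<opml version="2.0">
  <head><title>opml+time sketch</title></head>
  <body>
    <outline type="time" text="00:00:05.000">
      <outline type="image" text="Title card" url="title.png"/>
    </outline>
    <outline type="time" text="00:01:30.250">
      <outline type="link" text="Another outline" url="http://example.com/other.opml"/>
      <outline text="A plain textual change"/>
    </outline>
  </body>
</opml>
"""


def parse_timestamp(text):
    """Convert an 'HH:MM:SS.mmm' string to whole milliseconds."""
    hours, minutes, seconds = text.split(":")
    whole, _, fraction = seconds.partition(".")
    # Pad or truncate the fraction to exactly three digits.
    millis = int((fraction + "000")[:3])
    return ((int(hours) * 60 + int(minutes)) * 60 + int(whole)) * 1000 + millis


def load_cues(opml_text):
    """Return a time-sorted list of (milliseconds, [child attribute dicts])."""
    root = ET.fromstring(opml_text)
    cues = []
    for node in root.find("body").findall("outline"):
        if node.get("type") != "time":
            continue  # only time nodes are expected at this level
        children = [dict(child.attrib) for child in node.findall("outline")]
        cues.append((parse_timestamp(node.get("text")), children))
    return sorted(cues, key=lambda cue: cue[0])
```

Calling load_cues(SAMPLE_OPML) gives one entry per time node, each carrying the attributes of its children, so an app can decide what to do with each gift by looking at its type.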
Applications built around this principle could orchestrate and choreograph these gifts for your enjoyment, in time with an MP3 file, a movie, or any other time-based media.
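One way the choreography could work, sketched in Python under a few assumptions: the media player reports its playback position in milliseconds on every tick, and the cues are already sorted by time. The function name due_cues and the sample payload strings are hypothetical.

```python
import bisect


def due_cues(cues, previous_ms, current_ms):
    """Return payloads of cues whose time t satisfies previous_ms < t <= current_ms.

    `cues` is a list of (milliseconds, payload) tuples sorted by time.
    """
    times = [time_ms for time_ms, _ in cues]
    start = bisect.bisect_right(times, previous_ms)
    end = bisect.bisect_right(times, current_ms)
    return [payload for _, payload in cues[start:end]]


# Hypothetical payloads standing in for images, links, map coordinates, etc.
cues = [
    (5000, "show title.png"),
    (90250, "open other.opml"),
    (120000, "pan map"),
]
```

On each tick the app would call due_cues(cues, last_position, position) and act on whatever comes back. Starting with a previous position of -1 lets a cue at exactly 00:00:00.000 fire on the first tick.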