FlashFoley: Fast Interactive Sketch2Audio Generation

Random Generations

This section presents random audio samples generated by five methods: SAO-Small base (SAOS), SAOS + sketch controls, FlashFoley, FlashFoley with Block-Autoregressive Sampling (BAR), and FlashFoley trained with the sketch-aware contrastive loss (+ Sketch L_C). For each prompt, we also show the vocal sketch used to condition every method that accepts sketch controls. All methods use the hyperparameters detailed in the appendix.