Model Already Knows the Best Noise: Bayesian Active Noise Selection
via Attention in Video Diffusion Model


Kwanyoung Kim*, Sanghyun Kim
    Samsung Research
    *First and Corresponding Author

   


TL;DR

We introduce ANSE (Active Noise Selection for Generation), a framework that improves video diffusion by selecting optimal initial noise seeds without additional training. At its core, BANSA (Bayesian Active Noise Selection via Attention) estimates uncertainty from attention maps at the first denoising step, enabling efficient, model-aware noise selection. This approach improves generation quality, semantic alignment, and temporal consistency across diverse video diffusion models.



Random Seed
ANSE-selected Seed (Ours)

Text Prompt: "A vibrant, water-filled balloon hangs suspended in mid-air against a dark backdrop, its surface glistening under the spotlight. Suddenly, a pin pierces the balloon, and in extreme slow motion, the rubber bursts apart, creating a mesmerizing cascade of water droplets. The liquid forms intricate, fleeting shapes, each droplet catching the light and sparkling like tiny diamonds. The balloon's remnants peel away, revealing the water's graceful dance as it disperses into the air. The entire scene unfolds with breathtaking clarity, capturing the beauty and chaos of the explosion in exquisite detail."


Random Seed
ANSE-selected Seed (Ours)

Text Prompt: " A majestic brown bear, with its thick fur glistening in the dappled sunlight, begins its ascent up a towering pine tree in a dense forest. The bear's powerful claws grip the rough bark as it climbs higher, its muscles rippling with each movement"







Conceptual comparison of noise initialization.

(a) Prior methods iteratively refine noise using frequency-domain priors through full diffusion sampling, incurring significant computational cost. (b) In contrast, our approach selects optimal noise seeds by estimating attention-based uncertainty at the first denoising step, enabling efficient and model-aware noise selection without additional training.







Overview of our BANSA-based noise selection process.

Inspired by active learning, which selects the most informative samples during training, we introduce Active Noise Selection, which identifies the initial noise seeds with the lowest uncertainty as measured by our proposed BANSA score. BANSA builds on Bayesian Active Learning by Disagreement (BALD), but we define it in attention space specifically for generation tasks. Given a text prompt c, we compute BANSA scores for multiple noise seeds using Bernoulli-masked attention maps from selected layers at an early diffusion step. The seed with the lowest score, indicating confident and consistent attention, is selected for generation.
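The selection rule described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: it assumes the score takes the standard BALD form (entropy of the averaged attention map minus the average entropy of the individual masked maps), that the Bernoulli mask is applied to the attention logits before the softmax, and that the strongest key in each row is always kept so that no row is fully masked. The names `bansa_score`, `select_seed`, `num_passes`, and `keep_prob` are hypothetical, and the toy logits stand in for attention logits that a real video diffusion model would produce for each noise seed.

```python
import numpy as np

def row_entropy(p, eps=1e-12):
    """Shannon entropy of each attention row (last axis), averaged over rows."""
    return float(-(p * np.log(p + eps)).sum(axis=-1).mean())

def masked_softmax(logits, keep_prob, rng):
    """Row-wise softmax after dropping each logit with probability 1 - keep_prob."""
    mask = rng.random(logits.shape) < keep_prob
    # Assumption: always keep the strongest key so no row is fully masked.
    top = logits.argmax(axis=-1)
    np.put_along_axis(mask, top[..., None], True, axis=-1)
    z = np.where(mask, logits, -np.inf)
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def bansa_score(logits, num_passes=8, keep_prob=0.8, seed=0):
    """BALD-style disagreement: H(mean attention) - mean H(attention)."""
    rng = np.random.default_rng(seed)
    maps = np.stack([masked_softmax(logits, keep_prob, rng)
                     for _ in range(num_passes)])
    mean_entropy = float(np.mean([row_entropy(m) for m in maps]))
    return row_entropy(maps.mean(axis=0)) - mean_entropy

def select_seed(logits_per_seed, **kwargs):
    """Return the seed with the lowest score, plus all scores."""
    scores = {s: bansa_score(l, **kwargs) for s, l in logits_per_seed.items()}
    return min(scores, key=scores.get), scores

# Toy stand-ins for attention logits (queries x keys) of two noise seeds.
peaked = np.zeros((4, 16))
peaked[np.arange(4), [1, 5, 9, 13]] = 20.0   # confident attention
flat = np.zeros((4, 16))                     # diffuse attention

best, scores = select_seed({101: peaked, 202: flat})
print(best, scores)
```

In this toy setup the peaked map barely changes under masking, so its masked passes agree and its score stays near zero, while the flat map changes with every mask and scores higher. A real pipeline would replace the toy arrays with attention logits taken from the selected layers at an early diffusion step.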





Comparison on CogVideoX-2B

For comparison, each pair of videos is generated with the same prompt and the same model.



Random Seed
ANSE-selected Seed (Ours)

Text Prompt: " A sophisticated couple, dressed in elegant evening attire, walks down a dimly lit street, their formal wear glistening under the streetlights. The man, in a tailored black tuxedo, and the woman, in a flowing red gown, share a black umbrella as the rain begins to pour heavily. The camera captures their synchronized steps and the smooth, steady movement of their journey. Raindrops bounce off their umbrella, creating a rhythmic pattern. The couple's expressions shift from surprise to laughter as they embrace the unexpected downpour. Their polished shoes splash through puddles, and the streetlights cast a warm glow on the wet pavement, enhancing the romantic ambiance of their shared moment."


Random Seed
ANSE-selected Seed (Ours)

Text Prompt: " A joyful Corgi with a fluffy coat and perky ears frolics in a sunlit park, the golden hues of sunset casting a warm glow on the scene. The camera zooms in on the Corgi's expressive face, capturing its bright eyes and wide, happy grin"

Random Seed
ANSE-selected Seed (Ours)

Text Prompt: " A drone captures a breathtaking aerial view of a festive celebration in a snow-covered town square, centered around a towering, brilliantly lit Christmas tree adorned with twinkling lights and ornaments. The scene is alive with vibrant fireworks bursting in the sky, casting colorful reflections on the snow below. The starry night sky serves as a magical backdrop, enhancing the festive atmosphere. Below, people in warm winter attire gather, their faces illuminated by the glow of the tree and fireworks, creating a heartwarming sense of community and joy. The drone's perspective showcases the entire scene, from the sparkling tree to the dazzling fireworks and the serene, star-filled sky above."


Random Seed
ANSE-selected Seed (Ours)

Text Prompt: "A majestic polar bear, standing on its hind legs, strums an electric guitar with surprising dexterity, set against a backdrop of the Arctic tundra. The bear's white fur contrasts sharply with the vibrant red of the guitar, creating a striking visual. Snowflakes gently fall around, adding a magical touch to the scene. as its large paws expertly navigate the strings. In the background, the Northern Lights dance across the sky, casting an ethereal glow over the icy landscape. The scene captures a whimsical blend of nature and fantasy, where the wild meets the world of music."


Random Seed
ANSE-selected Seed (Ours)

Text Prompt: "A plush teddy bear, with soft brown fur and a red bow tie, stands on a stool in a cozy, vintage kitchen. The bear's tiny paws are submerged in a sink filled with soapy water, bubbles floating around. The kitchen is warmly lit, with checkered curtains and wooden cabinets. The bear carefully scrubs a plate, its expression one of focused determination. Nearby, a drying rack holds a few clean dishes, and a small radio plays a cheerful tune. The scene captures a whimsical moment of domesticity, with the teddy bear embodying a sense of playful responsibility."


Random Seed
ANSE-selected Seed (Ours)

Text Prompt: "In a charming Parisian café, an animated panda sits at a quaint wooden table, sipping coffee from a delicate porcelain cup. The panda, wearing a stylish beret and a striped scarf, gazes out the window at the bustling Paris streets, where the Eiffel Tower looms in the distance. The café's interior is adorned with vintage posters and warm, ambient lighting, creating a cozy atmosphere. The panda's expressive eyes reflect contentment as it enjoys the rich aroma of the coffee. Outside, the cobblestone streets and flower-adorned balconies add to the enchanting Parisian scene, making the moment feel both whimsical and serene."






Comparison on CogVideoX-5B

For comparison, each pair of videos is generated with the same prompt and the same model.



Random Seed
ANSE-selected Seed (Ours)

Text Prompt: "In a magical forest bathed in dappled sunlight, a charming koala bear sits at a grand piano, its furry paws gently pressing the keys. The koala, with its soft grey fur and expressive eyes, wears a tiny bow tie, adding a whimsical touch. Surrounding the piano, vibrant flowers and towering trees create a lush, enchanting backdrop. As the koala plays, the melody seems to harmonize with the rustling leaves and distant bird songs. The scene captures a surreal blend of nature and music, with the koala's serene expression and the forest's tranquil beauty creating a captivating, dreamlike atmosphere."


Random Seed
ANSE-selected Seed (Ours)

Text Prompt: "A majestic elephant strolls gracefully through a lush, verdant forest, its massive feet gently pressing into the soft earth. The sunlight filters through the dense canopy, casting dappled shadows on its wrinkled, grey skin. The elephant's trunk sways rhythmically, occasionally reaching out to touch the vibrant foliage. Birds chirp melodiously in the background, adding to the serene ambiance. As it walks, the elephant pauses to drink from a crystal-clear stream, its reflection shimmering in the water. The scene captures the essence of tranquility and the natural beauty of the elephant's peaceful journey through its habitat."


Random Seed
ANSE-selected Seed (Ours)

Text Prompt: "In a whimsical forest clearing, a raccoon with a mischievous glint in its eye stands on a tree stump, holding an electric guitar. The raccoon, wearing a tiny leather jacket and sunglasses, strums the guitar with surprising skill, its tiny paws moving deftly over the strings. The background features tall, ancient trees with sunlight filtering through the leaves, casting a magical glow. As the raccoon plays, woodland creatures gather around, entranced by the unexpected concert. The scene captures the raccoon's rockstar moment, blending nature's tranquility with the electrifying energy of its performance."


Random Seed
ANSE-selected Seed (Ours)

Text Prompt: "A sleek, metallic robot DJ with glowing blue eyes stands on a neon-lit rooftop in futuristic Tokyo, surrounded by towering skyscrapers adorned with holographic advertisements. The night sky is illuminated by vibrant, pulsating lights, reflecting off the rain-soaked surfaces. The robot, with intricate circuitry and mechanical arms, expertly manipulates the turntables, creating an electrifying mix. Heavy rain pours down, adding a dramatic effect as the droplets sizzle on the robot's exterior. The scene is a blend of sci-fi and fantasy, with the cityscape's cyberpunk aesthetic enhancing the surreal atmosphere. The robot's movements are precise and rhythmic, embodying the fusion of technology and artistry in this captivating, rain-drenched night."


Random Seed
ANSE-selected Seed (Ours)

Text Prompt: "A determined individual in a sleek black tank top and gray athletic shorts performs push-ups on a pristine wooden floor in a minimalist, sunlit room. The camera captures the sweat glistening on their forehead, emphasizing their intense focus and dedication. As they lower themselves, the muscles in their arms and back ripple with effort, showcasing their strength and endurance. The room's large windows allow beams of natural light to highlight their form, casting dynamic shadows that accentuate each movement. The serene ambiance of the space contrasts with the vigorous exercise, creating a powerful visual of discipline and perseverance."


Random Seed
ANSE-selected Seed (Ours)

Text Prompt: "A lone zebra gallops across the vast African savannah, its black and white stripes a striking contrast against the golden grasslands. The sun casts a warm glow, highlighting the dust kicked up by its hooves. In the distance, a herd of zebras grazes peacefully, their ears perking up at the sound of the approaching runner. The lone zebra's muscles ripple with each powerful stride, its eyes focused and determined. As it nears the herd, the zebras lift their heads in unison, welcoming the newcomer. The scene captures the essence of unity and the wild beauty of the savannah, with the herd now complete under the expansive, azure sky."






Impact of BANSA Score

We compare outputs generated from a randomly sampled seed (left), the seed with the highest BANSA score (middle), and the seed with the lowest score (right), using the same prompt and model. ANSE-selected seeds (ours, lowest BANSA score) produce more coherent structure, more stable motion, and stronger semantic alignment than both random and high-uncertainty seeds.


Random Seed
 
BANSA Score (Worst)
 
BANSA Score (Best)
(Ours)

Text Prompt: "A majestic giraffe, its long neck gracefully arching, bends down to drink from a serene river, surrounded by lush greenery and tall grasses. The sun casts a golden glow, highlighting the giraffe's patterned coat and the gentle ripples in the water. Nearby, a family of zebras grazes peacefully, adding to the tranquil scene. Birds flutter above, their reflections dancing on the water's surface. The giraffe's delicate movements create a sense of harmony with nature, as the river flows gently, reflecting the vibrant colors of the surrounding landscape."


Random Seed
 
BANSA Score (Worst)
 
BANSA Score (Best)
(Ours)

Text Prompt: "A lone bicycle, with its sleek frame and black tires, glides effortlessly through a vast, snow-covered field under a pale winter sky. The rider, bundled in a red parka, black gloves, and a woolen hat, pedals steadily, leaving a delicate trail in the pristine snow. The scene captures the quiet serenity of the landscape, with snowflakes gently falling and the distant silhouette of bare trees lining the horizon. The bicycle's tires crunch softly against the snow, creating a rhythmic sound that complements the peaceful ambiance. As the rider continues, the sun begins to set, casting a warm, golden glow over the snowy expanse, highlighting the beauty of the winter journey."


Random Seed
 
BANSA Score (Worst)
 
BANSA Score (Best)
(Ours)

Text Prompt: "A sleek, modern airplane, painted in a striking blue and white livery, taxis down the runway of a bustling airport, engines roaring with power. The camera captures a close-up of the landing gear lifting off the ground, followed by a wide shot of the aircraft ascending against a backdrop of a vibrant sunset, with hues of orange, pink, and purple painting the sky. As the plane climbs higher, the cityscape below becomes a mosaic of twinkling lights, and the horizon stretches infinitely. The final shot shows the airplane soaring gracefully into the clouds, leaving a trail of vapor against the twilight sky, symbolizing the beginning of a new journey."