VidMuse Professional User Guide & FAQ

VidMuse Team


7 min read

1. What types of videos can VidMuse create?

VidMuse is a professional "audio-native" video generation tool. Its core capabilities cover three major areas: Music Videos (MV), narrative short films, and commercial advertisements.

  • Professional MV Production (Core Feature): Supports uploading local audio files or external links. The system automatically analyzes musical rhythm and emotion to generate artistic MVs with a master-director perspective.
  • Diverse Cinematography: Capable of precisely generating complex shots such as singing (lip-syncing), dancing, dramatic performances, and visual effects, achieving perfect synchronization between visuals and sound.
  • Narrative & Commercial Applications: Beyond music videos, VidMuse excels at long-form plot-driven videos and narrative shorts. It also includes built-in TVC (Television Commercial) and Ad modes to meet high-quality commercial delivery requirements.

2. What is "Style"? / How do I choose a style?

Visual style is a crucial factor affecting final video quality. Essentially, different styles generate specific downstream style prompts that influence reference images, the first frame of every shot, and overall video performance.

  • Based on your needs, the system recommends 9 styles from our curated library. You can select one, request a new batch, or, most importantly, describe your desired visuals to create a custom style.
  • Before customizing, understand that Style selection only affects the visual aesthetic; it does not influence editing pace or acting style.
  • Style descriptions should be macro-level (e.g., artistic movements, genres, or specific artist influences) rather than non-consensus, non-systematic terms like "Vidmuse style" or "Sand ai style."
  • Avoid specific shot-level requirements in the style description, as they may negatively impact downstream storyboard design. For example:
  • ❌ Poor Prompt: "I want a Wes Anderson style with lots of centered and symmetrical compositions and a pink/green palette." (This might force every shot to be centered and pink/green).
  • ✅ Professional Example: "I want a style inspired by [Art Movement/Artist]. High contrast, [specific tone], color palette dominated by gradients with [specific] saturation. Use natural lighting and highlight film grain."
  • ✅ Beginner Example: "I want a visual style that expresses [specific feeling], similar to the movie [Movie Name]."

In short: Use this step to define the macro visual atmosphere of the entire film.

3. Why doesn't the model understand my style?

If the model fails to capture your intent, it is usually because the description violates macro principles or hits current technical boundaries. Check for these three common pitfalls:

Pitfall 1: Using private, non-consensus vocabulary

The model cannot access your personal memories; it operates on public internet consensus.

  • ❌ Incorrect: "Vidmuse style," "Sand AI style," "Like the video I made yesterday."
  • Reason: AI cannot understand definitions not recognized by the public or art world, nor can it track your personal history in this context.
  • ✅ Correct: Use universal terms (e.g., Impressionism), famous directors/artists (e.g., Van Gogh, Wong Kar-wai), or specific visual adjectives.

Pitfall 2: Including specific shot instructions in the style description

This is the most common error. Style prompts apply to the entire film, while shot instructions belong in the storyboard.

  • ❌ Incorrect: "Wes Anderson style with lots of centered and symmetrical compositions."
  • Consequence: If "centered composition" is in the global style, the system forces every shot to be centered, leading to repetitive, rigid visuals that break narrative logic.
  • ✅ Correct: Define macro features (e.g., high-saturation candy colors, flat lighting, vintage film texture) and leave "symmetrical composition" for the storyboard script.

Pitfall 3: Over-relying on reference images for "style replication"

  • Current Limitation: VidMuse's style transfer is iterative. The model currently cannot perform deep, pixel-level replication from a single image.
  • Reality: Uploaded images serve as a rough visual guide, not a strict template. Results may not replicate details 100% and may show instability across shots. Precise text descriptions remain the priority.

4. Why is the style inconsistent with my expectations?

Style deviations usually stem from prompt conflicts or misuse of reference images. Check these three areas:

  1. Match between image description and style: If you use a recommended style but the visuals feel "off," it is often because the individual shot description (prompt) isn't specific enough for the style to latch onto objects.
  • Solution: Optimize the storyboard prompt. Instead of "a girl," use "a girl in a red jacket standing under neon lights." Richer details help the style render better.
  2. Optimize the reference image path (crucial): If you upload a reference photo in "Style Configuration" but the video fails to replicate its details:
  • Reason: Global style references are for macro tones.
  • Solution: Use the [Upload Reference Image] feature within the Storyboard Canvas. Uploading images for specific shots allows the model to lock in composition and style more accurately.
  3. Check for "strong style" keyword interference: Look for aggressive texture keywords in your style description (e.g., watercolor, thick oil painting, 3D modeling).
  • Consequence: These keywords apply globally. 3D keywords may make realistic scenes look like "models," and "oil painting" keywords may blur shots intended to be sharp. If you don't want the effect everywhere, remove the keyword or specify its range.

5. There are flaws in my storyboards/videos. Why can't I fix them?

Ineffective edits usually occur because instructions are too vague or the wrong tool is used. Stop using general dialogue and adopt these precise strategies:

  • Specify Shot Number (Core): You must tell the AI exactly what to change. Use the format: Shot Number + Problem + Target.
  • ✅ Correct: "The background of Shot 5 is too dark (Problem); please change it to a sunny outdoor setting (Target)."
  • Use Remix for Local Fixes: If you like most of the frame but have small flaws (e.g., hand structure, facial tweaks), do not regenerate the whole image. Use the Remix (In-painting) tool to mask the flaw and regenerate only that area.
  • Introduce Visual Guidance: If text fails to fix a structure, switch generation models in the canvas or [Upload a Reference Image] with a similar composition.
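The "Shot Number + Problem + Target" format above can be sketched as a tiny helper. This is purely illustrative: the function name is ours, not part of the VidMuse product.

```python
# Illustrative helper (not a VidMuse API) that builds a revision
# instruction in the recommended "Shot Number + Problem + Target" format.
def revision_instruction(shot: int, problem: str, target: str) -> str:
    """Format a precise edit request the AI can act on."""
    return f"Shot {shot}: {problem}; please change it to {target}."

print(revision_instruction(5, "the background is too dark",
                           "a sunny outdoor setting"))
# -> Shot 5: the background is too dark; please change it to a sunny outdoor setting.
```

Keeping the three parts explicit makes it hard to send the AI a vague, unactionable request.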

Inefficient Pitfalls (Avoid)

  • Vague Dialogue: Never send instructions like "I don't like this" or "Make it look better" without a shot number.
  • Consequence: The AI won't know which image to modify and may try random global changes, wasting your credits.

6. How do I maintain character consistency?

Character consistency relies on reference images, while video stability is limited by current generative technology.

Image Generation Stage (Appearance Inconsistency)

  • Enable Reference Constraints: Check if "Reference Character" is enabled. Upload a clear standard photo of the character in the dialogue or canvas.
  • Troubleshoot Style Interference: Highly artistic styles (e.g., abstract, heavy oil) naturally weaken facial features. Try a more realistic style to test consistency.
  • Clean the Prompts: Ensure the prompt isn't overloaded with unnecessary modifiers that distract the model from core features.

Video Generation Stage (Dynamic Inconsistency)

  • Technical Reality: Facial flickering or limb distortion during motion is a current industry-wide bottleneck.
  • Optimization: Simplify shot design. Reduce extreme movements or high-speed pans. Stable camera movements yield more stable visuals.


7. I spent too many credits and didn't get a good result. Can I get a refund?

No.

  1. Policy: Credits are a digital virtual service; once purchased or consumed, they are non-refundable.
  2. Technical Faults: If credits were wasted due to system crashes or server errors, contact technical support for verification.
  3. Contact: Please visit our official Discord community.

8. When does VidMuse consume credits? Consumption feels too fast—can I slow it down?

VidMuse only deducts credits when it performs substantive "Generation Tasks." All conversational dialogue with the AI is free of charge.

Billing Stages (Credits Consumed)

Credits are consumed only during the following five core production stages:

  • Analyze Audio: Uploading music for rhythm and emotional tone analysis.
  • Analyze Video: Uploading reference videos to analyze shot compositions, music, and other metadata.
  • Generate Images: Producing storyboard frames or reference images.
  • Generate Video: Rendering image-to-video transitions.
  • Generate Audio: Utilizing specific audio processing or generation functions.

Free Components (No Credits Consumed)

  • Creative Communication: Any text-based dialogue, discussion of revision suggestions, or script refinement conducted in the chat box is entirely free.
  • Idea Input: Regardless of how complex your creative descriptions are, no credits will be deducted as long as you do not click the "Generate" button.

9. Can I change the model?

Yes. You can switch models based on quality or cost needs:

  1. At Project Creation (Global)
  • Studio Mode: High-level detail. Avg. 10–12 credits/min.
  • Lite Mode: Cost-effective. Avg. 9–32 credits/min.
  • Custom Mode (Members only): Save your preferred default parameters.
  2. While Editing (Local)
  • In the Canvas, you can manually select a different model for a specific shot to regenerate it.

10. The image is almost perfect, but I want to tweak a tiny detail.

Do not regenerate the whole image. Use the "Remix" feature in the Canvas.

  • How it works: It keeps the main subject and composition but allows precise modification of a specific area.
  • Steps: Select image -> Click Remix -> Input modification instruction (e.g., "Change red hat to blue").

11. Can VidMuse add Model X?

Our model library is constantly updating.

  • We periodically evaluate and launch new models.
  • Suggest specific models in our Discord community; we take these suggestions into account for our roadmap.

12. Can I generate HD/High-quality videos?

Yes, in two stages:

  1. Creation Stage: Choose Studio Mode when creating the project for better detail and lighting.
  2. Export Stage: Select your resolution. We support 720p (HD) and 1080p (Full HD).


13. How much video can I make with 2000 or 4000 credits?

  • 2000 Credits: Roughly a standard short video (< 2 minutes).
  • 4000 Credits: Roughly a complete medium-length video or narrative short (3–4 minutes).

Note: This varies based on model choice and re-generation frequency.

14. Can you help me modify music? / Can VidMuse generate music?

VidMuse supports AI music generation but not editing existing music.

  • Generation: Input style/mood (e.g., "Upbeat Pop" or "Sad Cinematic") to generate high-quality BGM.
  • Modification: We currently do not support mixing or audio editing. Please use professional audio software before uploading to VidMuse.

15. Why is character consistency poor? (Deep Dive)

If faces change between shots, follow these steps:

  1. Reference Quality: Avoid complex lighting or difficult angles in your reference photo. Use a clear, front-facing portrait.
  2. Style Interference: Avoid "High Risk" styles (Abstract, Ink Wash, heavy Oil) if facial accuracy is the priority. Use Realistic or 2D Anime styles instead.
  3. Prompt Consistency: Use fixed tags (e.g., "short-haired girl in red jacket") in every shot involving that character.
  4. Accept Technical Boundaries: Minor facial changes in high-motion shots are normal for current AI. Use medium or long shots to hide imperfections.

16. How do I get credits?

  1. Top-up Bonuses: All "Add-on packs" now have a 50% bonus. (e.g., $10 gets you 1200 credits instead of 800).
  2. First-time Subscription Gift
  • Pro Plan: +1,000 credits.
  • Studio Plan: +5,000 credits.
  3. Referrals
  • Friend registers & generates: 500 credits.
  • Friend buys Pro: 2,000 credits.
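As a quick sanity check on the 50% top-up bonus described above, a minimal sketch (the function name is ours, not part of VidMuse):

```python
# Minimal sketch of the add-on pack bonus math; credits_with_bonus is a
# hypothetical name used only for illustration, not a VidMuse function.
def credits_with_bonus(base_credits: int, bonus_rate: float = 0.5) -> int:
    """Credits received after applying the top-up bonus."""
    return int(base_credits * (1 + bonus_rate))

print(credits_with_bonus(800))  # -> 1200, matching the $10 add-on example
```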

17. Can I change the video resolution after creating a project?

No.

  • Locked Parameters: The video generation resolution is locked upon initial project creation. If you initially select 720P, all subsequent model rendering will be based on this parameter; it is technically impossible to upscale the resolution later.
  • Recommendation: If you require a high-definition final product, we suggest selecting your target resolution directly when creating the project. This ensures the generated video results better align with your expectations.

18. Can I change the audio duration or make major structural changes to the video midway?

Strictly Not Recommended.

Once a project is initialized, the core rhythm analysis and timeline structure are established based on your initial input.

  • Risk of Sync Failure: Forcing a change to the total duration of the audio or video during the editing process will disrupt the synchronization between the visual rhythm and the soundtrack, leading to alignment errors or generation failures.
  • Best Practice: If you need to make significant modifications (such as changing the song duration, replacing the audio file, or creating a completely different version), please create a new project. Starting fresh ensures the AI correctly analyzes the new timeline and rhythm from the start.


Written By

VidMuse Team

Product Lead at VidMuse