Timeline and Audio syncing issue

Hello again,

I’m having an issue with the audio getting slightly behind the animation. Specifically, if a word is spoken 4-5 seconds into the animation, the mouth movement for that word comes later than the sound. If I try to sync it up inside Pencil2D, the exported result has the audio continually lagging behind the visual speech instead.

I suspect the cause is processing. I’ve noticed that if you pause the timeline anywhere after 2-3 seconds and play from the point where it stopped, the audio backtracks a little, which to me suggests Pencil2D isn’t processing it very well. Any thoughts?

Here are a couple of videos I threw together demonstrating what’s going on. The first is a screen recording from inside Pencil2D, and the other is an unaltered export of the same file. Before recording, I had adjusted the timing of the animation so the speech would match up in the exported video. :stuck_out_tongue:

Recording of Pencil 2D Studio

Exported Result

Thank you in advance.

Hi @thealextdb, first let me thank you for providing an example of the problem; it’s quite hard to diagnose sound sync problems without proper content to check against.

I’ll add it to our issues on GitHub, and then we’ll look into it.
We currently have someone working on a new video exporter, which should hopefully produce better exports. I don’t know whether it will improve the sound processing, though.

Alright, sweet! It’s not a terrible issue unless you’re trying to lip sync… which in my case makes it terrible, but I’m trying to make it work for now. I just wanted to flag it :stuck_out_tongue: Hope things work out with the new video exporter!

@thealextdb Hey man, cool work. I did a test of my own, and I’m not sure the issue is actually the software. I work in animation, so I’m used to doing a lot of manual lip sync for cut-out animation.

The thing is, since we don’t have waveform visualization or sound scrubbing in Pencil2D, it’s actually very difficult to hit the right syllable at the RIGHT time. You have to do a lot of playback, and even then it will be slightly inaccurate.

Although this might be a bug, and CandyFace has already reported it, here’s a link to a test I did with another audio sample. I didn’t have time to draw mouths because of my job, but I made a text animation synced to the sound, and I got what I wanted by adjusting the timing of the frames accordingly.

When doing lip sync there’s a rule of thumb: you have to draw the keyframe at least 1 frame BEFORE the actual sound. Because light travels faster than sound, your eye registers the visual slightly before the audio arrives, so the drawing needs to lead a little.
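If you want to put a rough number on that lead, here’s a quick back-of-the-envelope sketch (my own illustration, not anything from Pencil2D):

```python
import math

def lead_frames(lead_ms, fps):
    """Convert a desired audio-visual lead in milliseconds into whole
    frames, always leading by at least one frame (the rule of thumb)."""
    return max(1, math.ceil(lead_ms * fps / 1000))

# One frame at 24 fps lasts about 41.7 ms, so a ~40 ms lead fits in a
# single frame; at 60 fps the same lead needs three frames.
print(lead_frames(40, 24))  # -> 1
print(lead_frames(40, 60))  # -> 3
```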

What I’m trying to say is: even though it might be a bug, without a visual cue (waveform) or an auditory cue (scrubbing), it’s also possible that you only have to move the frames a bit earlier to get the lip sync you want.

I would like to ask if you can upload the file so I can review it, though. If you can’t, that’s OK, but it would help rule out that possibility.

It also might be that the exporter isn’t working properly, encoding the audio a few frames late, or that the video bitrate plus CPU usage is degrading the conversion. So my second piece of advice is: once you think the lip sync is fine, export the video, join it with the audio sample in a dedicated video editor, and export it from there.

Thanks for your reply! Yeah, it’s not that big an issue for me; it’s more just tedious, since I have to line things up, check the exported result, and see if it matches. As of now the timeline scrubbing feature isn’t very reliable, but it’s workable at the moment. Processing power isn’t an issue: I built my PC to run 3D animation and other demanding applications, so that’s out of the question.

Sure, I can upload the work file. Does it also need the additional files, i.e. the art/audio, to work properly?

The PCLX file should contain everything needed, so no additional files are required :slight_smile:

Ok sweet, here’s a link then :smiley:


@thealextdb @candyface Ok guys, indeed, what’s happening is part bug and part timing.

Regarding the bug, for CandyFace:
It seems that when playing from the first frame, the sample plays correctly, but once you begin playing from any other frame, the sound gets displaced in time (it’s not a true frame-by-frame sound playback).

On export, the start lags by about 8 frames, so the last frame of the sample does not correspond to the last frame in the editor.

Here’s the PCLX file I edited:

Regarding the timing for Alex:
I had to butcher your inbetweens to work faster since my time is very limited, sorry, but I managed to get most of the timing down. I had to move all the chunks after “hello guys” to the left and retime them a bit to verify the issue. The sound does get cut on export, but most of it is correct except for a few frames.

So, for now, to really get your sound just right, you have to play back from the start. You can play from any frame, but always bear in mind that the editor will be playing mind tricks on you, and you HAVE to move the frames back a bit.

So, to work around these issues for the moment, here’s an idea:

  1. Play back the whole sound first
  2. Draw stuff
  3. Play back from that point
  4. Draw some more
  5. Play back from frame 1
  6. Check for timing inaccuracies and correct them (x2)
  7. Move frames accordingly
  8. Play back from frame 1 again
  9. Repeat ad infinitum.

For now it’s the only way I see to avoid Pencil2D’s problem.

On another note, regarding the animation process itself, here’s some friendly advice: don’t inbetween until you have the main phonemes broken down and the timing nailed down for the whole sample. When supervising animation, I normally tell the animators to do the following:

First, break down the vowels: the OO’s and AA’s and all that. Once you’ve got that rolling, and if it isn’t enough, go ahead and break down only the most important consonants, the ones that flesh out the words. We are creating an illusion.

For example, your first sentence, “hello guys”, is really smooth and well done. As an opener it’s perfect, but if the rest of the animation looks the same, it will lack contrast and the animation will lose focus no matter how smooth it looks. So it’s better to place your emphasis based on the sound sample.

For example: “It’s been a whil(E) (emphasize the E) since I p(O)sted any(thi)ng h(e)re.” Those are your keys; you don’t really have to draw every single vowel and letter, even for Disney quality.

Think of it more like you’re building phonetic bridges, if that makes sense. So you go and draw the main letter drawings like this:

“it’s been a while since…”
“Et-s bE–n A U-AEL SEN–…”

The second line represents the LETTER DRAWINGS, that is, how the words look when they sound. These are called “visemes”, or visual phonemes.
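If it helps to think of the breakdown programmatically, it’s basically a lookup plus de-duplication. Here’s a toy sketch in Python; the phoneme-to-viseme groupings below are made up for illustration, not a standard chart:

```python
# Illustrative phoneme -> viseme groupings (hypothetical; real charts
# vary by studio and drawing style).
VISEMES = {
    "AA": "open",  "AE": "open",
    "EE": "wide",  "IH": "wide",
    "OO": "round", "OH": "round",
    "M": "closed", "B": "closed", "P": "closed",
    "F": "teeth-on-lip", "V": "teeth-on-lip",
}

def breakdown(phonemes):
    """Keep only the phonemes that call for a distinct mouth drawing,
    skipping unknowns and consecutive repeats of the same shape."""
    keys = []
    for p in phonemes:
        v = VISEMES.get(p)
        if v and (not keys or keys[-1][1] != v):
            keys.append((p, v))
    return keys

# A rough pass over "it's been": only the shapes that actually change.
print(breakdown(["IH", "B", "EE", "AA", "AE", "EE"]))
```

The point is the same as the letter-drawing line above: you only draw the shapes that change, and the inbetweens bridge the rest.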

Once you have that structure, you can add the buttery-smooth, sweet-baby-Jesus inbetweens.

Hope this helps and keep up the great work! :slight_smile:

Wow, thank you so much, Jose! I really appreciate the time and advice you’ve put into this for me; it means a lot. I agree, that all makes much more sense, and I can see it working now… I’ve never been trained in animation, so all my knowledge comes from watching animation channels on YouTube. I’ll definitely be saving this thread!

With classes starting back up for me, I’m not going to be able to spend as much time animating… so for now I’m planning to make things a bit simpler, unfortunately. I won’t be doing lip animation just yet, but it’s something I want to start as soon as I can, and I’ll keep at it until I’ve practiced enough… I’ve just got to get something done, lol.

Again, thank you all so much for your help, especially Jose!



I have reworked the way milliseconds are processed per frame, so there should be no more time displacement. I used your project to test this myself, and it seems to work. You can check out an experimental build here: https://drive.google.com/open?id=0B3Jkb3m7pLp4QkRPbF9oWTR3Mlk

This build also includes numerous improvements to sound playback in general, namely:

  • Playback will continue to the end of the soundtrack instead of stopping at the last keyframe.
  • Multiple overlapping tracks will resume correctly after being stopped.
  • Playback will reset when the “Start” arrow is clicked.
  • Milliseconds per frame has been reworked.
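For the curious, here’s a small Python sketch of the kind of drift that per-frame millisecond rounding can cause, and why computing the position directly avoids it. This is an assumed illustration of the bug class, not the actual Pencil2D code:

```python
# Assume a 12 fps timeline. Accumulating a rounded per-frame duration
# loses a fraction of a millisecond every frame; computing the position
# from the frame index does not. (Illustrative only; the real fix in
# Pencil2D may work differently.)
FPS = 12

def drifting_position_ms(frame):
    # 1000 // 12 = 83 ms per frame, dropping ~0.33 ms each frame.
    ms_per_frame = 1000 // FPS
    return frame * ms_per_frame

def exact_position_ms(frame):
    # Derive the position directly from the frame index instead.
    return round(frame * 1000 / FPS)

frame = 120  # ten seconds into the timeline
print(drifting_position_ms(frame))  # -> 9960 (40 ms behind)
print(exact_position_ms(frame))     # -> 10000
```

Over a long soundtrack, a per-frame error like this grows into exactly the sort of “audio falls further behind the longer it plays” symptom described earlier in the thread.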

So if you have time, please test it out and report any problems :slight_smile: