
The Magic of Tesla FSD, part 6 (Robotaxi edition) - (by Phil Beisel)

twinbee | 2025-07-25 10:39 | 65 views

Comments (53)

twinbee 2025-07-25 10:43

For those who don't have access to X, here's an archive of the article: https://archive.is/Qbzvy

Here's a small taster to show the kind of depth the article delves into:

> **Photon Counting**
>
> First we need to understand what vision means to Tesla. It's not what your eyes see; it's what the camera sees, and that's something very different.
>
> A traditional digital camera processes raw input through a series of steps to produce a human-viewable image. This process discards massive amounts of data, compresses dynamic range, and biases the result toward human color perception, especially green.
>
> Tesla flips this on its head. It feeds raw 12-bit Bayer mosaic data directly into its system and biases the sensor toward red, where road-relevant signal lies. Tesla calls this "photon counting," and the name fits: it's raw photon measurements, unfiltered. That raw data goes straight into both training and inference. The images you or I might see? They're never even produced. The neural network sees the world not through a processed image, but through the sensor's raw output, at photon-level fidelity.
>
> Here's the key point: Tesla doesn't let the camera process the image. It lets the FSD neural network process the image. The network itself becomes the filter, deciding what matters, not based on what looks good to a human eye, but based on what leads to safe, confident driving. It's outcome-driven perception. That's a radical shift. And it changes everything.
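A rough sketch of the contrast the quoted passage is drawing. The gamma value and normalization below are illustrative assumptions, not Tesla specifics: a conventional ISP tone-maps 12-bit sensor samples (4096 levels) down to 8 bits (256 levels) for human viewing, while the "raw path" hands the full-precision samples to the network unchanged.

```python
import numpy as np

# Raw 12-bit sensor samples (0..4095). Values are made up for illustration.
raw = np.array([0, 64, 1024, 4095], dtype=np.uint16)

# Typical ISP-style step: gamma-encode and quantize to 8 bits for viewing.
# This is lossy: 4096 input levels collapse into 256 output levels.
isp_8bit = np.round(255 * (raw / 4095.0) ** (1 / 2.2)).astype(np.uint8)

# "Raw path" as the article characterizes it: feed the unprocessed 12-bit
# values to the network, e.g. simply normalized to [0, 1].
nn_input = raw.astype(np.float32) / 4095.0

print(isp_8bit)   # coarse, tone-mapped levels for human eyes
print(nn_input)   # full 12-bit precision preserved for the network
```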

Rollertoaster7 2025-07-25 10:48

Fascinating read

twinbee 2025-07-25 10:52

I loved the last paragraphs of the article:

> All engineering solutions involve tradeoffs. But often, the winning approaches are the ones that embrace simplicity.
>
> Tesla's approach mirrors how humans drive: our senses are imperfect, yet our brains allow us to operate vehicles effectively (or reasonably so). That's the bet Tesla is making too: not that autonomy is simple, but that its complexity is best concentrated in the brain. Strip away the hardware redundancy. Strip away the sensor fusion logic. Let vision be the input, and intelligence do the work. Simple where it can be. Advanced where it must be.

Beastrick 2025-07-25 11:17

They are making a big deal of the brain over hardware, which obviously is important, but I don't understand why we are seemingly okay with an approach that will never be perfect. If you are blind, no amount of brain power will compensate for that. In FSD's case, take thick fog that limits visibility: the system will always be imperfect in those conditions because it can't see clearly. Of course, a system that is better than a human will already be very beneficial to society, but shouldn't the ultimate goal be zero mistakes and deaths? That might take decades to achieve, but shouldn't there at least be a path to it, instead of us settling for "better than human"?

AJHenderson 2025-07-25 11:20

Well, that's a bunch of buzzword bingo for "Tesla does nothing special". This is just how processing this kind of data works, not something special. It's literally just raw image data, and the system does also process it for camera recordings and for the turn-signal and reverse cameras, though it's true that output isn't used by FSD. It would be absurd for a computer vision system not to work on raw data, since generating a human-dynamic-range image would be a completely unnecessary step. This entire block of text is just a flowery way of saying Tesla does the same thing as everyone else, dressed up to sound futuristic to the uninitiated.

twinbee 2025-07-25 11:22

Do other companies go with RCCC over RGGB? He continues:

> Second, Tesla eliminates the ISP entirely. The raw 12-bit RCCC data is fed directly into the FSD training and inference stack: no demosaicing, no color correction, no dynamic range compression.
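For anyone unfamiliar with RCCC: it's a mosaic like Bayer RGGB, but with three "clear" (unfiltered) photosites per 2x2 tile instead of two greens and a blue. A minimal sketch of splitting such a mosaic into its red and clear samples; the tile phase (red in the top-left corner) is an assumption for illustration, since real sensors document their own layout:

```python
import numpy as np

def split_rccc(mosaic: np.ndarray):
    """Separate a raw RCCC mosaic into its red and clear samples."""
    r = mosaic[0::2, 0::2]              # one red sample per 2x2 tile (assumed phase)
    c_mask = np.ones(mosaic.shape, dtype=bool)
    c_mask[0::2, 0::2] = False          # every photosite that isn't red is clear
    return r, mosaic[c_mask]

mosaic = np.arange(16, dtype=np.uint16).reshape(4, 4)  # toy 4x4 raw frame
r, c = split_rccc(mosaic)
print(r.size, c.size)  # 4 red samples, 12 clear samples
```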

AJHenderson 2025-07-25 11:23

Tesla doesn't strip away the hardware redundancy, and doing so in a safety-critical system would be insane. The sections of this article you are highlighting are absolute dog shit. It does get better in other parts, but it leaves out a lot of key detail, such as the fact that only one forward-facing camera is RCCC (otherwise it would not be possible to make color images for the dashcam). There's not really a whole lot of interesting or new information in the write-up. It does present a decent overview, but it's written like a marketing document, not a serious introduction to the technology.

cookingboy 2025-07-25 11:32

Just took a quick look, and unfortunately I was shaking my head quite a bit already. Who is the author? First impression is that this is a very non-technical piece with just enough detail to impress regular joes, but it doesn't have a lot of meat in it. For example:

> Here's the key point: Tesla doesn't let the camera process the image. It lets the FSD neural network process the image. The network itself becomes the filter, deciding what matters, not based on what looks good to a human eye, but based on what leads to safe, confident driving. It's outcome-driven perception. That's a radical shift. And it changes everything.

There is nothing radical about using a NN to directly process data from the camera sensors, and more importantly, the author doesn't explain *why* this is a superior approach or what problem it solves. At the end of the day, visual identification has been a **solved problem** in the industry. What does Tesla bring to the table here?

Then there is the case of the author bragging about novel solutions to **dumb problems created by Tesla in the first place**. Sure, you can use Lidar to help your NN train camera-based localization. But why not just use a cheap Lidar in the first place and save yourself literally a decade in development time? And don't give me that "sensor fusion is difficult" bullshit. Any decent EE student doing their senior design project would laugh their ass off at such an idiotic statement. Again, the reality is that **localization has been a solved problem for over a decade now**.

Then the rest of the article has things like this:

> Tesla's simulator is a highly sophisticated software system. It works from the original sensor data, augmenting it with physically accurate, realistic changes, so realistic, in fact, that a human observer would not be able to distinguish the simulated version from an original recording. It's a generative AI engine purpose-built for Tesla's autonomous driving validation.

Remember when Waymo talked about their simulation system 10 years ago, and all the Tesla fans laughed, saying Tesla doesn't need simulation because it has real-world AP data (I argued so many times on this sub about how absurd that is)? Now the author is talking about Tesla's simulation engine as if it's some sort of novel invention?

I'm sorry, but at the end of the day my take on the FSD program hasn't changed in 10 years: the whole thing was led astray by Elon's ego and his lack of understanding of basic engineering principles. The vision-only approach solves a problem that doesn't even exist in the first place, while their approach to everything else is exactly the same, following in Waymo's footsteps. He's not an engineer and has never worked as an engineer, but he knows just enough about his subjects to fool complete outsiders.

AJHenderson 2025-07-25 11:37

Only one Tesla camera is RCCC (the "black and white" wide-angle front one), and yes, many other industrial computer vision applications use custom filters or no filter at all, and they generally don't put ISPs in front of the computer vision stage. ISPs are just translators from camera to human; they are destructive and not relevant to computer vision.

cookingboy 2025-07-25 11:39

This is one of the most frustrating statements that gets repeated over and over again. Why would you want to mirror how humans drive? Using the human body, the result of millions of years of evolution that has nothing to do with operating a 4000 lb machine at 80 mph, as the **benchmark** for a car's control system is absolutely *beyond* idiotic. That's like saying you want to build an AI fighter to mimic how human pilots fly. No, you want to get rid of the human so you can *far surpass* human capabilities. That's what sensors are for, that's what supercomputers are for. Why limit yourself to vision and imitation of human behavior? Would you get rid of the radar and IR sensors on an F-35 because a human pilot can fly a biplane with just eyes? This is the whole problem, right? Elon chose this approach *intuitively*, out of his ego and his "sense" of what is right, while for an entire decade his engineers within Tesla have been telling him he's wrong (and he fired most of them). That's what happens when you try to solve an engineering problem with pure intuition.

AJHenderson 2025-07-25 11:42

Agreed, though my personal take is that we can push vision-only as far as it can go before introducing a crutch. I'll be severely disappointed if they never go to sensor fusion, but having the best vision system around will make fusion easier and more reliable later.

twinbee 2025-07-25 11:45

I definitely sympathize with your point that we want to advance past human inadequacies, but my takeaway from the quote was more in line with the AlphaZero approach. More of a "don't try to be too clever, let the AI brain do all the work".

[deleted] 2025-07-25 11:46

"Ready to scale exponentially". Not from the footage I've seen. Only 7,000 robotaxi miles, and there are already multiple videos of it almost causing accidents or behaving illegally without the safety monitor's intervention.

cookingboy 2025-07-25 12:00

> let the AI brain do all the work

There is no generic ready-made "AI brain" that can just "do all the work". Tesla has to train its AI to solve vision problems because they are too cheap to give it better input from other forms of sensors. Playing Go doesn't involve any sensing or perception problems, which makes it *very* different from driving.

Btw, the people I know at DeepMind (the guys who made AlphaZero) all feel pretty bad for the engineers under Elon.

twinbee 2025-07-25 12:16

Do you know if any of the guys who made Alpha Zero respect Elon? I think we should let Elon's engineers speak for themselves. FSD head certainly respects Elon.

rawasubas 2025-07-25 12:18

I don't understand how the NN could get the C data: the green and blue filters still need to be physically present in front of the CMOS sensor, because the dashcam footage still has full-color video. So it sounds like the G, G, and B data just get summed into one channel. So essentially the NN has a 12-bit R channel and a 14-bit C channel as input?
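The "14-bit" guess above follows from simple arithmetic, under the (unconfirmed) assumption that three 12-bit G, G, and B samples are summed into one clear-ish channel:

```python
import math

# Each raw sample is 12-bit, i.e. in the range 0..4095.
max_12bit = 2**12 - 1          # 4095
# Summing three such samples (G + G + B) can reach 3 * 4095 = 12285.
max_sum = 3 * max_12bit
# Number of bits needed to represent values 0..12285:
bits_needed = math.ceil(math.log2(max_sum + 1))
print(bits_needed)  # 14
```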

Intrepid_Walk_5150 2025-07-25 12:19

In engineering, sometimes (often) redundancy is simplicity. Having multiple systems that complement each other is usually more robust than pushing one system beyond its limits. That's a big part of how commercial planes got so safe. Let Elon "simplify" redundant safeguards in SpaceX rockets and see what happens.

Ernapistapo 2025-07-25 12:32

In those conditions humans shouldn't be driving either. It's not just about the capabilities of the vehicle, but the vehicles around you as well: you still need to interact with vehicles driven by humans with human limitations. From the footage I've seen, the camera system today is more capable than human vision. Thick fog and heavy rain also affect LiDAR sensors, as the water droplets scatter the laser in many directions. None of the systems are perfect.

AJHenderson 2025-07-25 12:35

It's the one wide-angle front camera that is RCCC; that one only renders out as black and white. The rest obviously can't be RCCC. In fairness, the fact that they had a filter at all on that one was new to me. I had just assumed it was a black-and-white industrial camera, but having a red pixel makes sense for highlighting things that are red in the scene.

rawasubas 2025-07-25 12:37

There's no way the raw data is sent back to the data center to train the AI, right? That would be too much data. Even storing that amount of raw data on the car could be too much: just a minute from the front wide-angle HW3 camera (1280x960) is around 4.8 GB. Surely the data is processed by a few layers of NN onboard before being sent to the data center for further training?
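A back-of-the-envelope check on the data-volume worry above. The bit depth and frame rate here are assumptions for illustration (12 bits per photosite, packed, at 36 fps), not confirmed Tesla figures, and they land in the same ballpark as the quoted number:

```python
# One raw frame from a 1280x960 sensor at 12 bits per photosite, packed:
pixels = 1280 * 960                   # 1,228,800 photosites per frame
bytes_per_frame = pixels * 12 / 8     # 1.5 bytes per photosite

fps = 36                              # assumed frame rate
gb_per_minute = bytes_per_frame * fps * 60 / 1e9
print(round(gb_per_minute, 1))  # ~4.0 GB per minute, per camera
```

At that rate, even a single camera produces hundreds of GB per hour, which is why onboard selection or compression before upload seems unavoidable.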

AJHenderson 2025-07-25 12:39

That's an interesting question. I know they do data selection on what they send back, and it sends a LOT of data; I get hundreds of GB uploaded a month from my Teslas (452 GB this month from my M3P and 168 GB from my wife's MYP). I wonder if they send a compressed version and then the backend requests particular frames back. That's probably how I'd do it.

AJHenderson 2025-07-25 12:46

That's why I personally prefer mm-wave radar rather than lidar as a secondary sensor. More bang for your buck in terms of additional sensing capability. The validation overlap is nice for less mature systems with lidar, but Tesla has a very good vision system, so I'd argue mm-wave radar is the better supplement.

Ernapistapo 2025-07-25 12:49

Ultimately these systems cannot read signs or road markings. They are great at measuring distances, which is important, but that's only one piece of the vision puzzle. You can't drive on radar or LiDAR alone, so the system degrades as soon as the vision cameras can no longer see.

No3047 2025-07-25 12:57

Vision only can be as good as a human. Just add a radar and you'll have superhuman performance. But if it's snowing hard or there's thick fog, the car just can't drive; it's crazy to pretend it will work in any condition.

rawasubas 2025-07-25 12:59

It definitely makes sense to mimic human driving logic. You don't want the car to behave unexpectedly (no matter how optimal it might be), because other drivers on the road might be affected by it.

AJHenderson 2025-07-25 13:05

Sure, but much less so than a system without it. And if they ever give the system memory from other cars, that memory can provide a lot of missing data as long as there's enough to go on visually.

AJHenderson 2025-07-25 13:07

Snow will screw up both lidar and radar. I don't know of anything that can see through heavy snow well. In fog, though, you can absolutely travel faster if you have the general knowledge that nothing is in your path, as long as you have bare-minimum visibility.

djao 2025-07-25 13:10

Are there any other cars on the market today, that regular people can purchase, which use lidar to achieve autonomous driving? I am specifically excluding Waymo; I want a car, not a service for hire. As far as I know, the answer is no. Some manufacturers have capabilities matching Tesla Autopilot (not FSD) using lidar, and many manufacturers have plans for future lidar equipped vehicles, but no one has present day lidar vehicles matching the capability of FSD.

djao 2025-07-25 13:13

Well, clearly the Tesla is not an exact mimic of a human. FSD doesn't use two binocular cameras on a swivel; it uses 8 or 12 stationary cameras pointed in all directions. And it is reasonable to shut down the vehicle when outside conditions are unsafe for human driving, unless you know for a fact that you are the only vehicle on the road at that time and place, and that no other human drivers would be interacting with you if you kept operating the vehicle instead of shutting it down.

fortytwoEA 2025-07-25 13:16

Using their massive offline compute, they can undoubtedly generate reasonably accurate "raw" versions from compressed video, using a small number of raw frames plus the compressed full-length video. It doesn't even have to be an explicit output; it could be handled in the latent space of their offline pipeline. This style of training, leveraging offline networks that are orders of magnitude more powerful because they aren't constrained by the real-time, embedded requirements of in-car inference, has already been used in other parts of their AI pipeline, so I wouldn't be surprised one bit if it's used for the data handling as well.

AJHenderson 2025-07-25 13:21

That's a bit of a catch-22. Generated tween frames could miss key detail, and any problems with the tweening model would be trained into the main model.

iceynyo 2025-07-25 13:28

There's a bunch in China. Hope to see them one day...

djao 2025-07-25 13:36

Chinese cars have very pretty designs, outpacing Western automakers in this respect, but when it comes to core technologies, China's modus operandi has been to copy and plagiarize Western companies, and Chinese companies rarely exhibit leadership in this space.

InertState 2025-07-25 14:09

Is the magic in the room with us now?

InertState 2025-07-25 14:10

What about it do you love? Be specific

yhsong1116 2025-07-25 14:16

They are expanding the service this weekend, one month after Austin, so it's pretty fast. Does that mean exponential? That remains to be seen.

[deleted] 2025-07-25 14:23

They're expanding a service. It's not a robotaxi, because it obviously is not ready to be operated without supervision. This service isn't gonna expand exponentially, let alone make money, until they get the tech right. Which they clearly haven't. It would be one thing if people were only reporting edge cases; some of these are major traffic violations that would be dangerous if the car were fully unsupervised.

yhsong1116 2025-07-25 14:25

Sounds just like Waymo to me

[deleted] 2025-07-25 14:32

Except Waymo has over a million miles fully unsupervised, is in multiple cities already, uses lidar, and doesn't have Musk's history of bombast and exaggeration when it comes to "exponential" growth.

twinbee 2025-07-25 15:15

The AlphaZero approach is what I love. I love simplicity; there's too much complexity in the world where there shouldn't be.

yhsong1116 2025-07-25 15:26

Lol there it is

taw160107 2025-07-25 16:50

> Who is the author?

LOL. The author is Phil Beisel. He worked for 6 years at Rivian and founded the technical team there. He worked 10 years at Apple before that.

SchalaZeal01 2025-07-25 17:08

> Chinese cars have very pretty designs, outpacing western automakers in this respect

Legacy automakers wanted to make electric cars kinda ugly, to discourage buyers, or at least they did for decades. Generally it's the exception that doesn't look weird for no reason. At least the Cybertruck went weird for impact and nostalgia (it looks like a DeLorean), and not because "well, electric cars are weird". See the Stonecutters episode of The Simpsons, where they shit-talk parody electric cars: fugly golf carts with no speed or range. That was long before Tesla existed.

yetiflask 2025-07-25 17:54

WTF is this? Someone has to do the difficult thing that you might think is impossible. Imagine if everyone lived life like you: just use what's easy and works. We'd still be hunter-gatherers living in the jungle. Tesla's vision + AI approach is novel, and I'd back a company betting the farm on trying to make it work. I do have sympathy for people who paid for it, but if you can't take risks, gtfo of a Tesla and buy some Euromobile using ancient Bosch driving hardware. That option's out there. But I want one company - ANY COMPANY - to try and tackle the impossible. I know one Chinese company (IIRC Xiaomi) has also committed to vision + AI only, so I'm REALLY looking forward to what they can cook. They may even beat Tesla, just as DeepSeek is spanking OpenAI.

Magnus_Tesshu 2025-07-25 17:58

Localization has been a solved problem for a decade, so solved that anyone who doesn't use a cheap lidar for it is just creating dumb problems for themselves? Tesla is the state of the art. Everyone else who is not state of the art, saying Tesla is doing something dumb, is confused. I kind of agree that it is a non-technical piece, though you're an idiot.

Magnus_Tesshu 2025-07-25 18:04

> There is no generic ready-made “AI brain” that can just “do all the work”. Tesla has to train its AI to solve vision problems because they are too cheap to give it better input from other form of sensors.

Indeed. Tesla is so dumb! They have only the world's best results! Imagine if they were actually trying, using your proven-to-be-smarter methods!

> Btw the people I know at DeepMind (the guys who made Alpha Zero) all feel pretty bad for the engineers under Elon.

Yeah, it must suck achieving state of the art in multiple different fields. They must be tired of winning so much.

cookingboy 2025-07-25 18:58

Your impression of China is stuck in the 90s. Their core tech in the EV space is *ahead* of ours now; Tesla literally uses CATL batteries in many of its cars worldwide. China absolutely is a tech leader in many spaces, like EVs, consumer drones, and biotech. China's MO has been "do whatever it takes to catch up, including stealing and copying, then do whatever it takes to get ahead". They are in that second part now.

opsers 2025-07-26 04:09

How does this sound just like Waymo?

opsers 2025-07-26 04:14

That's kind of the point though - no single system is perfect. That's why most other manufacturers are using multiple systems. Waymo for example (and I believe Zoox) uses LiDAR, radar, and cameras. They [have a whole blog post](https://waymo.com/blog/2021/11/a-fog-blog) talking about this very topic.

echoingElephant 2025-07-28 21:25

That is simply a lie. "Photon counting" is an actual thing, but it isn't "using raw video input". They don't measure "raw photons"; they measure an unprocessed image signal. That's something very different.

yoshiee 2025-11-20 20:45

I just came across this comment, and I'm sorry the guy saying Phil is not an engineer made me LOL. Once in a while I'm astonished at the over-confidence redditors have. But at the same time, not surprised.

taw160107 2025-11-20 21:26

Yeah, so funny: an armchair Reddit engineer saying Phil Beisel is not an engineer.
