jamesblonde 12 hours ago

We are seeing more and more specialized query engines. This is a query engine specialized for training pipelines. It is not general purpose; it is for providing batches of training data to workers. It uses Ray for parallelization. The kinds of queries you need are random reads (to implement shuffling across epochs), Arrow support (zero-copy to Pandas DataFrames), and efficient checkpointing.
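The shuffling-plus-checkpointing requirement can be sketched in plain Python. This is a minimal illustration, not smallpond's API. It assumes the dataset supports cheap random access by row index (a Python list stands in for that), and the names `epoch_permutation`, `iter_batches` and `Cursor` are hypothetical. Seeding each epoch's permutation from `(seed, epoch)` means a checkpoint only has to record the epoch and the next batch index, not the shuffled order itself.

```python
import random
from typing import Iterator, List, Sequence, Tuple

Cursor = Tuple[int, int]  # hypothetical: (epoch, index of the next batch to read)


def epoch_permutation(num_rows: int, seed: int, epoch: int) -> List[int]:
    """Deterministic shuffle order for one epoch (hypothetical helper)."""
    # An int seed derived from (seed, epoch) gives each epoch its own
    # reproducible order, so the order never has to be saved.
    rng = random.Random(seed * 1_000_003 + epoch)
    order = list(range(num_rows))
    rng.shuffle(order)
    return order


def iter_batches(
    rows: Sequence[dict],
    batch_size: int,
    seed: int,
    num_epochs: int,
    start: Cursor = (0, 0),
) -> Iterator[Tuple[Cursor, List[dict]]]:
    """Yield (cursor, batch); `cursor` is where to resume after this batch."""
    start_epoch, start_batch = start
    num_batches = (len(rows) + batch_size - 1) // batch_size  # ceil division
    for epoch in range(start_epoch, num_epochs):
        order = epoch_permutation(len(rows), seed, epoch)
        first = start_batch if epoch == start_epoch else 0
        for b in range(first, num_batches):
            # Random reads: fetch rows by shuffled index instead of scanning.
            idx = order[b * batch_size : (b + 1) * batch_size]
            yield (epoch, b + 1), [rows[i] for i in idx]
```

After consuming a batch, a trainer would persist the yielded cursor; passing it back as `start=` replays the remaining batches in the same order.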

  • auxten 4 hours ago

    Data operations are increasingly happening close to the GPU to boost efficiency, especially for compute-heavy workflows. Arrow file processing and zero-copy queries on DataFrames are becoming crucial for modern data pipelines. I think another option worth considering is chdb, which supports these features and fits well with this shift. (author of chdb here)

dcreater 32 minutes ago

I'm confused by the example in the repo. What is the use case for this? Is it a replacement for Dask, Ray, etc.? (I'm not a professional SWE.)

rubenvanwyk 13 hours ago

May data engineering content keep hitting the HN front page!

HackerThemAll 14 hours ago

DuckDB itself is cool enough, especially when combined with SQLite and/or PostgreSQL, and now this. Thanks DeepSeek!

  • dcreater 37 minutes ago

    How is DuckDB combined with SQLite? Aren't they alternatives to each other?

shipp02 13 hours ago

Is the code written by the DeepSeek model?

I should probably give up on being a software engineer if it is.

  • cavisne 12 hours ago

    There is a Chinese blog post from 2019 about 3FS, so it predates DeepSeek [1]. It will be interesting to see the benchmarks, but I suspect that without 3FS, smallpond is not that useful (the bottleneck would move to the networked file system).

    None of the big US clouds support InfiniBand broadly (Azure and Oracle have some support), so 3FS itself is not very useful to US companies that want to use public clouds.

    [1] https://www.high-flyer.cn/blog/3fs/

  • breadwinner 13 hours ago

    Give up and become what? Most white-collar jobs will be automated in the coming years. You think doctors' jobs are safe?

    • ezst 13 hours ago

      Not OP, but anything that physically affects the real world for the better? For instance, large infrastructure engineering and construction projects are not going to run themselves any time soon. The world doesn't revolve around ad tech and fintech.

    • didntknowyou 12 hours ago

      You can already google the information; the majority of a doctor's value lies not in their information but in their people and technical skills.

      • rscho 12 hours ago

        Well, googling the info is one thing. But today, medicine is still mostly a know-how profession. Residency is there mostly to transmit know-how.

    • nurettin 4 hours ago

      If your white-collar job consists of simply using software, like copying numbers you see into an Excel sheet, maybe. Otherwise, they are pretty safe. People have been building tools and automation for thousands of years, yet nobody has invented a fully automated cook for your fancy family dinner.

    • rscho 13 hours ago

      Yes, doctors are safe. Because they do things. With their hands. That no one else does.

      • aragonite 12 hours ago

        > Because they do things. With their hands. That no one else does

        That's only true of surgeons :) What if your specialty is nonsurgical (internal medicine, pediatrics, psychiatry, etc.)?

        • rscho 12 hours ago

          Almost all specialties do various technical procedures that only they really know how to do. The extreme is psychoanalytic psychiatrists, who are the only ones really doing nothing with their hands (yes, interventional psychiatry is a thing...). Now, you could argue, 'yes, but most of the time it's done by techs/nurses'. Well, no. When things go south, and in all the places where there is no one else to do the stuff (of which there are many), docs are on their own.

          Regarding surgery, I actually expect it to be one of the easiest procedures to automate (still quite hard, obviously), because surgery is the only case where advanced imaging is always available beforehand and the environment is relatively fixed (the OR).

        • nurettin 4 hours ago

          Psychiatrists do that triangle shape with their hands.

        • downrightmike 12 hours ago

          Not even true of all surgeons; the ones who make the most money use machines to work on things their hands couldn't do.

          • skeeter2020 8 hours ago

            Pathologists are some of the highest-paid doctors, and they are right in the crosshairs of what AI is getting better at.

            • rscho 7 hours ago

              Do you really know what pathologists do? Apparently not...

          • rscho 12 hours ago

            Haha. Have you actually ever seen a surgical robot yourself? Your claim is laughable. There is no automation whatsoever in any robot on the market currently.

        • ghc 12 hours ago

          Uh, pediatricians do a lot with their hands. I don't think my kids (or future grandkids) will be seeing an AI/robot doctor.

      • mdaniel 12 hours ago

        Also, a hallucination for 'SELECT mising_field FROM borgus_tuble' is one thing, hallucinating that taking a dose of Cl Na O along with CH3 CO2 H will cure covid is another thing entirely

        • sramam 12 hours ago

          This is so funny!

          However, it can't even be called hallucinating. Imagine the incident "postmortem":

              But the AI was trained on White House press briefings
          
          Made my day...

        • sharpshadow 3 hours ago

          Is it really true that people drank bleach!? It always felt to me like some idiot did it once and the media repeated it endlessly, probably for clicks, because the story is so dumb. Nonetheless, what people actually take is ClO2.

      • delfinom 12 hours ago

        Nope.

        Healthcare megacorps are buying up independent practices like crazy, all because doctors can't keep up with the bullshit IT required for insurance, state mandates, etc., and that's on top of the insanity of even renting commercial real estate for an office these days.

        These megacorps set quotas and push doctors to nickel and dime like crazy. They sure as shit will spend the money to find robots that can give you a prostate exam with a robot dildo.

        • mdaniel 12 hours ago

          Sounds good; if all these pro-AI folks could get it to complete the insurance paperwork, that'd be swell. Actually, come to think of it, do that for the paperwork on both sides, doctor and patient, and eliminate an entire class of leeches upon humanity.

          I'm going to laugh if DOGE eliminates the IRS, but I might also be thankful.

          • tyre 12 hours ago

            Join us at https://www.camber.health/ if you want to help fix this.

            We build software that automates insurance billing for clinics.

            And yes, the sentiment is correct that the burden of insurance encourages consolidation in healthcare. Abstracting that away (i.e., a Stripe for healthcare financial infrastructure) lowers the barrier to entrepreneurship.

          • rscho 12 hours ago

            Don't laugh too quickly, because what you describe is already happening: models are used to design processes allowing insurance corps to deny claims optimally, while on the other side models write your claims. If I were you, I wouldn't be laughing. If you are laughing, then you don't see where this is going to take us.

        • rscho 12 hours ago

          Except the tech to do that is not there, and we're quite far from it. It's one thing to have a robot write text, it's a whole other thing to have a robot perform at human level in medical procedures. Not happening tomorrow.

  • risyachka 8 hours ago

    Why would you assume that?

lvl155 12 hours ago

Looking forward to the next few years, when we can finally abstract away all the back-end tech.

  • threeseed 10 hours ago

    We've had this for at least a decade now.

    If you use a cloud provider there are managed solutions for data engineering pipelines.

    • m2f2 2 hours ago

      Sure, and it's not cheap.

  • BobbyJo 12 hours ago

    We ain't even solved garbage collection yet, and you think "back end systems" are going to be abstracted away in the next few years?

    • purplerabbit 11 hours ago

      Maybe they just mean for the type of projects they care about.

      • BobbyJo 10 hours ago

        Can't you already just use FaaS and managed persistence?

    • tarruda 12 hours ago

      > We ain't even solved garbage collection yet

      Can you elaborate on that?

      • BobbyJo 12 hours ago

        People still write in languages that force you to manage your own memory.

        Once performance starts to matter (either due to scale or time requirements) abstractions always have tradeoffs you can't accept.

        • pyrolistical 11 hours ago

          So then, how can garbage collection ever be solved if it’s a trade-off?

          • BobbyJo 10 hours ago

            And how can backends be abstracted away if there is a trade off?

            As long as compute is a meaningful percentage of spend, the trade off will matter.

            • pyrolistical 9 hours ago

              Right. So what does it look like for garbage collection to be solved? You’re saying it’s not ever possible.

              • BobbyJo 8 hours ago

                I am saying it's not possible for the foreseeable future, yes. The same way backends becoming an abstraction most developers don't need to worry about is also not going to happen in the foreseeable future.