A good example of parallel processing is the SCSI-type hard disk, which has parallel data lines. In server farms, SCSI is popular because many people access the data at the same time. If it were a serial type (USB or IDE), the server would probably have fallen over. Serial hard disks are only good for personal use.
Actually, ATA (aka PATA / IDE) is parallel. SATA is serial, and SCSI can be either parallel or serial depending on the variant. In fact, the more popular SCSI variants today are serial interfaces (SAS).
Parallel and serial should be equally fast if we're talking about the same data rate. Whether it's 24/192 (or 16/44, or whatever) sent in parallel or serially, the amount of data being consumed is the same; only the feeding method differs. Each approach might have certain advantages and disadvantages in the conversion stage, but those aren't related to speed or data rate.
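To put rough numbers on that, here is a minimal sketch of the arithmetic. It assumes stereo (2 channels) and ignores framing and clocking overhead, and the function names are made up for illustration. The total payload is the same either way; only the rate each data line has to run at changes.

```python
def payload_rate_bps(bits_per_sample, sample_rate_hz, channels=2):
    """Raw audio payload in bits per second (stereo assumed by default)."""
    return bits_per_sample * sample_rate_hz * channels


def per_line_rate_bps(bits_per_sample, sample_rate_hz, channels=2, data_lines=1):
    """Rate each data line must carry if the payload is split evenly.

    data_lines=1 models a serial link; data_lines=bits_per_sample models
    a parallel bus with one line per bit. Overhead is ignored.
    """
    total = payload_rate_bps(bits_per_sample, sample_rate_hz, channels)
    return total / data_lines


# 24-bit / 192 kHz, as in the example above
total = payload_rate_bps(24, 192_000)                      # 9_216_000 bits/s
serial = per_line_rate_bps(24, 192_000, data_lines=1)      # one fast line
parallel = per_line_rate_bps(24, 192_000, data_lines=24)   # 24 slower lines

print(f"total payload : {total} bit/s")
print(f"serial line   : {serial:.0f} bit/s on 1 line")
print(f"parallel lines: {parallel:.0f} bit/s on each of 24 lines")
```

Either way the receiver ends up with 9,216,000 bits every second; the serial link just needs a single line clocked 24 times faster.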
Current CPUs are usually multi-core, since designers are trying to hit a sweet spot for performance versus thermals, form factor and cost (engineering constraints). When you look at the architecture, each pipeline of each core inside a multi-core processor is already very serialized (an instruction is broken down and handled through multiple stages to enable higher clock/data rates). There are plenty of debates about deep pipelines and high clock rates versus shallow ones and lower clock rates (and how branch prediction and prefetch/read-ahead affect the design's performance and the resulting demands on other components). Another factor is which pipelines get parallelized within a core, and whether they are all identical or specialized. Also related is how much multithreading (to take advantage of multicore designs) is worth building into an application, since threads still interfere with each other when they access shared resources. Processors are far more complex than DACs, so I don't think they're even comparable.
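A toy model of the deep-versus-shallow pipeline trade-off can make that debate more concrete. This is only a sketch under stated assumptions: an ideal pipeline retires one instruction per cycle once it is full, and every branch mispredict is assumed to cost a full refill (stages - 1 cycles). The stage counts and clock rates below are made-up numbers, not figures for any real CPU.

```python
def pipeline_time_ns(instructions, stages, clock_ghz, mispredicts=0):
    """Time for an idealized pipeline to finish `instructions`.

    Assumptions (illustrative only):
      - the first instruction needs `stages` cycles to come out,
        after which one instruction completes per cycle
      - each mispredict flushes the pipeline and costs `stages - 1` cycles
    """
    cycles = stages + instructions - 1
    cycles += mispredicts * (stages - 1)
    return cycles / clock_ghz  # cycles / (cycles per ns) = ns


# Hypothetical designs: shallow and slower clock vs deep and faster clock
SHALLOW = {"stages": 10, "clock_ghz": 2.0}
DEEP = {"stages": 20, "clock_ghz": 3.0}

for mispredicts in (0, 100):
    shallow_ns = pipeline_time_ns(1000, mispredicts=mispredicts, **SHALLOW)
    deep_ns = pipeline_time_ns(1000, mispredicts=mispredicts, **DEEP)
    print(f"mispredicts={mispredicts:3d}  shallow={shallow_ns:.1f} ns  deep={deep_ns:.1f} ns")
```

With these made-up numbers the deep pipeline wins when nothing is mispredicted and loses once mispredicts pile up, which is why branch prediction quality matters so much more for deep designs.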
In layman's terms, multicore processors caught on in the mass market because designers started to reach the limits of highly serial designs (which still had some parallel elements internally) with current manufacturing technology. What they did was run those serial designs in parallel to raise maximum IPC (in most cases, work is still largely processed serially per core, distributed to parallel pipelines, and then processed serially within each pipeline in appropriate data units).