A traditional thread pool has a single queue of work items. Every thread in the pool dequeues from that queue whenever it needs work, so all threads, across all cores, compete for one shared resource (the work queue), and this causes contention. The TPL thread pool was designed to minimize this contention. Instead of relying on a single queue, it gives each worker thread its own local queue, and the pool aims for roughly one running thread per logical processor. Since at most one thread per logical processor can actually be executing at any moment, and each thread normally works on its own queue, the pool is virtually contention free.
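The difference between the two queue layouts can be illustrated with a small single-threaded model. This is only a sketch under stated assumptions, not how the TPL is implemented: the function names, the round-robin assignment of items, and the "set of workers that touched each queue" measure are all hypothetical and chosen for illustration.

```python
from collections import deque


def workers_per_queue_shared(items, worker_count):
    """Single shared queue: workers take turns dequeuing from the same queue.

    Returns a dict mapping queue name -> set of worker ids that touched it.
    """
    queue = deque(items)
    touched = {"shared": set()}
    worker_id = 0
    while queue:
        queue.popleft()                      # every worker dequeues here
        touched["shared"].add(worker_id)
        worker_id = (worker_id + 1) % worker_count
    return touched


def workers_per_queue_local(items, worker_count):
    """One local queue per worker (assumed layout for this sketch).

    Items are dealt out round-robin up front, then each worker drains
    only its own queue.
    """
    local_queues = [deque() for _ in range(worker_count)]
    for index, item in enumerate(items):
        local_queues[index % worker_count].append(item)

    touched = {}
    for worker_id, queue in enumerate(local_queues):
        name = f"local-{worker_id}"
        touched[name] = set()
        while queue:
            queue.pop()                      # the owner is the only accessor
            touched[name].add(worker_id)
    return touched


shared = workers_per_queue_shared(range(8), worker_count=4)
local = workers_per_queue_local(range(8), worker_count=4)
print(len(shared["shared"]))                 # workers competing for the one queue
print([len(w) for w in local.values()])      # workers touching each local queue
```

In the shared layout every worker touches the same queue, which in a real pool means every dequeue has to be synchronized against all other threads. In the per-worker layout each queue has a single accessor in the common case.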
To keep threads busy, the design also introduces work stealing. If a thread's own queue is empty, it "steals" work from another thread's queue. Work stealing is expensive, but it should happen only occasionally. A thief takes items from the back end of the victim's queue, that is, the end opposite to the one the owning thread works from, because those items are the least likely to still have their data sitting in the cache of the owner's logical processor.
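The stealing rule can be modeled with a double-ended queue per worker. The following is a minimal single-threaded sketch, not the TPL's actual implementation: the `Worker` class and its `push` and `next_item` methods are hypothetical names, and it assumes the owner adds and removes items at one end while a thief removes from the other. A real pool would also need synchronization on the steal path.

```python
from collections import deque


class Worker:
    """Hypothetical worker that owns a private double-ended queue."""

    def __init__(self, name):
        self.name = name
        self.local = deque()

    def push(self, item):
        # The owner adds new work at the right-hand end of its queue.
        self.local.append(item)

    def next_item(self, others):
        # Normal case: take the newest item from our own queue.
        if self.local:
            return self.local.pop()
        # Own queue is empty: steal the oldest item from the first
        # non-empty queue, using the end its owner does not work from.
        for victim in others:
            if victim.local:
                return victim.local.popleft()
        return None  # no work anywhere


a = Worker("A")
b = Worker("B")
for item in ("oldest", "middle", "newest"):
    a.push(item)

stolen = b.next_item([a])   # B has no work of its own, so it steals from A
own = a.next_item([b])      # A keeps working from its own end
print(stolen, own)          # prints "oldest newest"
```

Because owner and thief operate on opposite ends, they only collide when a queue is nearly empty, and the owner keeps the most recently queued items, whose data is most likely to still be in its cache.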