To give an answer on a fairly concrete level―I don't know a lot of category theory, but one interesting insight I've gained from my limited study of it, which has convinced me that it's worth learning, is about the notion of products of structures.
Most mathematicians probably acquire an informal understanding of the notion of products of structures before they study category theory. In the simplest cases we can say that taking the product of some structures is just the act of going from looking at individual elements to looking at composite elements that are made up of parts, and applying operations to these composites by distributing those operations over the parts. For example, the product of the real line with itself can be thought of as the set of ordered pairs of real numbers, and addition of such pairs can be carried out by making the ordered pair whose first projection is the sum of the first projections and whose second projection is the sum of the second projections.
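To make the componentwise recipe concrete, here is a minimal Python sketch. The names `product_op`, `add_pairs`, `fst` and `snd` are made up for illustration, and pairs of floats stand in for pairs of real numbers.

```python
def fst(p):
    """First projection of an ordered pair."""
    return p[0]

def snd(p):
    """Second projection of an ordered pair."""
    return p[1]

def product_op(op_a, op_b):
    """Lift two binary operations to one that acts componentwise on pairs."""
    def op(x, y):
        return (op_a(fst(x), fst(y)), op_b(snd(x), snd(y)))
    return op

# Addition on R x R, built from addition on each copy of R.
add_pairs = product_op(lambda a, b: a + b, lambda a, b: a + b)

p, q = (1.0, 2.0), (3.0, 5.0)
total = add_pairs(p, q)
print(total)  # (4.0, 7.0)
```

Note that each projection of the sum is the sum of the corresponding projections, which is exactly the description given above.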
This sort of understanding of the notion of product works for algebraic structures, and is formalized in the part of mathematics known as universal algebra (which is basically category theory applied to specifically algebraic structures). But when it's naively applied to other structures it doesn't always work so well. The example that I'm familiar with is topologies. The analogous way to form a product of some topologies is to say that the basic open sets in the product topology are the Cartesian products of open sets in the component topologies (with arbitrary open sets being unions of these). This defines what's known as the box topology. But this notion isn't all that useful, because it lacks many "nice" properties. For example, a box topology on a product of infinitely many components is not guaranteed to be compact even if all of its components are compact.
(Mathematicians talk a lot about "nice"-ness, but I've never seen that much discussion about what it actually means. So I don't know how far mathematicians will agree with the description of what "nice"-ness is that I'm about to give. But as far as I understand, an operation is "nice" to the extent to which interesting properties of the object formed as a result of applying the operation can be determined from the properties of the operands. The "nice"-er an operation is, the more it pays off to think of an object as made up of simpler operands that will yield the object upon applying the operation, because conclusions that can be drawn about the operands can be transformed into conclusions about the result. This is why it matters that the box topology is not "nice".)
It turns out that a much "nice"-er notion of product results if you define the basic open sets in the product topology as Cartesian products of open sets in the component topologies, with the extra proviso that only finitely many of the factors differ from the whole underlying set of the corresponding component topology. (For a product of finitely many components the proviso is vacuous, so this agrees with the box topology; the two only come apart for infinite products.) With this notion of product topology, we have that products of compact topologies are compact (this is Tychonoff's theorem), sequences of points in product topologies converge iff the corresponding sequences of projections all converge, etc.
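The two definitions can be written side by side as follows. The notation is my own assumption rather than anything standard to this answer: $I$ is the index set, $X_i$ are the component spaces, and $\mathcal{B}$ denotes a basis of open sets.

```latex
\mathcal{B}_{\mathrm{box}}
  = \Bigl\{ \prod_{i \in I} U_i \;:\; U_i \text{ open in } X_i \text{ for every } i \in I \Bigr\}

\mathcal{B}_{\mathrm{prod}}
  = \Bigl\{ \prod_{i \in I} U_i \;:\; U_i \text{ open in } X_i \text{ for every } i \in I,
      \text{ and } U_i = X_i \text{ for all but finitely many } i \Bigr\}
```

Every set in $\mathcal{B}_{\mathrm{prod}}$ is also in $\mathcal{B}_{\mathrm{box}}$, so the box topology has at least as many open sets as the product topology, and the two coincide when $I$ is finite.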
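From this, the question naturally arises: is there a general notion of product that specializes to the proper notion of the product topology (instead of the box topology) for topologies, while specializing to the universal-algebraic notion of product for algebraic structures? And the answer is: yes, the category-theoretic notion of product. Basically, products in category theory are characterized by two things: the projections from the product to its components are morphisms, and any family of morphisms from some other object into the components factors through the product in exactly one way, via a single morphism into the product whose composites with the projections give back the original family. The second condition is the one that does the work for topologies: the projections are continuous in the box topology too, but for infinite products a map into the box topology can fail to be continuous even when all of its component maps are continuous.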
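In the category of sets, where every function counts as a morphism, the defining property of a product can be checked directly. For any two maps f and g out of a set Z into the components, there is exactly one map h into the set of pairs whose projections are f and g. The sketch below is only an illustration under that assumption, and the names `pairing`, `f`, `g` and `Z` are hypothetical.

```python
def fst(p):
    """First projection of an ordered pair."""
    return p[0]

def snd(p):
    """Second projection of an ordered pair."""
    return p[1]

def pairing(f, g):
    """Return the map h: Z -> A x B with fst(h(z)) == f(z) and snd(h(z)) == g(z).

    It is the only such map, because an ordered pair is determined
    by its two components.
    """
    return lambda z: (f(z), g(z))

# Two arbitrary maps out of the same set Z, here a finite range of integers.
f = lambda n: n % 3
g = lambda n: n * n
h = pairing(f, g)

Z = range(10)
# Composing h with each projection recovers the original maps.
factors = all(fst(h(z)) == f(z) and snd(h(z)) == g(z) for z in Z)
print(factors)  # True
```

In a category with fewer morphisms, such as topological spaces with continuous maps, the extra demand is that `h` itself be a morphism whenever `f` and `g` are.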
A morphism is just a member of an arbitrary collection of maps between structures that we regard as "structure-preserving" for the category of structures in question. For algebraic structures homomorphisms are normally regarded as morphisms; for topologies continuous maps are normally regarded as morphisms. But the choice of which maps to regard as morphisms is ultimately up to the mathematician; it depends on which properties they want to see preserved.
This is the big conceptual innovation of category theory: attention shifts from the structures themselves to the structures together with the structure-preserving maps between them. A composite object made up of a class of structures together with a class of structure-preserving maps between them is called a category. Hence the name category theory.
So this is where category theory comes from―it's a perfectly natural idea once one starts thinking about the relationships between different mathematical structures and abstracting out general notions of things like products and quotients. We get all the usual benefits of abstraction for insight-production: we can see the whole range of stuff to which our insight applies, rather than just having the insight for one particular thing within that range; and by thinking about things at the appropriate abstraction level we ignore extraneous specificities that would lead us down fruitless paths.
If you don't regard category theory in its full generality as an aspect of reality that's interesting in its own right, you can probably just about always replace any reasoning about specific structures that makes use of general category-theoretic principles with reasoning that stays specific to the structure. However, the reasoning might be a lot more difficult, complex and cognitively inaccessible that way. Again, this is a general observation about the way in which abstractions provide insight into the specifics they abstract over.