Extracting Data from a Column: 4 Solutions 1 Problem


When we’re working with Excel and data
what is the bottom line? Because we are going up against crap data,
something is amiss. So what’s the bottom line? VICTORY. Ending confusion. Ending tihsllub. So, to stay battle-ready it helps to do drills and that’s what we’re gonna do
in this video. I’ve got some crap data and I am gonna show you 4 different
ways to defeat this. That’s the drill: coming up with different ways to do the
same thing. And I invite you to think of other ways to defeat it because we need
a repertoire. Right? When crap data comes out us one way, we’ve gotta have different
ways of dealing with it. The data I’m going to show you represents something I
dealt with yesterday. I got some data off of a webpage and when I pasted it into
Excel it went from being nice and neat to be in one big stack. Are you ready to
look at the challenge? Here’s this tall stack of data and I want to know how
many 1st, 2nd and 3rd connections are in this list. Now, Indigo is a 1st
connection and Indigo is 39. And then has five favorite colors. We get down to
Dovie. Dovie is a 1st connection also. 20 years old. Only has one favorite
color. The list goes all the way down to Gaad Turner. Four favorite colors. 3rd
connection. 55 years old. We don’t want to ages. We don’t care
about the favorite colors. We want to keep the names. But we want a tally of
the first second and third connections. Look at the data. Think about how you
would peel this down. Take a moment. Method 1. The most manual painful way.
But it would get you a victory. Start at Indigo. Put an X. Go down to Flavius. Put
an X. Dovie. X. Zoe Bernie. Gaad. And then we could sort. We’re
in the data tab. I’m gonna sort Z to A. All of our X’s should be at the top.
Method 2. Equals ISNUMBER. Is this number here–to the bottom and then to the left–
is that a number? Close parenthesis. Enter. It is! Okay. Now we’re gonna wrap an IF
statement around that. If that ISNUMBER equals true then, do a RIGHT, open
parentheses, on this cell. Comma grab the rightmost three. Close parentheses
otherwise stay empty. Close parenthesis enter. Double-click. All of our first
second third connections are there. We could copy, paste-as-values, sort, tally
what we want to tally. Method 3. Advanced filter. In order to use
advanced filter we need an area for our criteria. I’m gonna unhide here. I want to
keep anything in column C that has ‘Connection.’ Here we go.
My cursor over here in this data set. Advanced filter. Copy to another
location. My criteria range is F1:F2. Where do I want to copy the list to? I
want to go over here. Okay. There are all of the names. I’m gonna double-click. Use flash fill to dig this out first. Third. flash fill is helping us out. I
want to do this connection okay and then insert a pivot table. Insert pivot table.
New worksheet. I’m gonna put connection in rows connections in values. Method 4. I’m gonna highlight this column.
Conditional formatting. Cell rules. Text that contains. “Connection.” Okay. Next. Data.
Sort. Sort on cell color. Which cell colors do I want on top? Pink. OK. So,
those are four basic ways of doing the same thing. We didn’t even use power
query to do this. You could come up with your own power query solution. Do that!
All right? That’s the challenge. And remember. When you open Excel, you tell Crap Data: “the tihsllub stops today.”

14 Comments

  1. Wow, method 3 problably the most unsused of all! Amazing stuff
    can be with Advanced Filtering! I love the examples on your
    book Guerrilla Data Analysis (2nd Edition"

  2. Hi Oz.. nice challenge! I like your solutions. For those with Dynamic Arrays, I think this would work (on your data set, assuming it does not go beyond row 1000): =FILTER(C1:C1000,ISNUMBER(SEARCH("Connection",C1:C1000))). It produces a spilled list of all the items containing the word "Connection". You can take it from there in terms of digging out 1st, 2nd, 3rd and then using COUNTIF or a Pivot Table to get the count of the various connections. Great stuff! Thanks for the inspiration. Thumbs up for Excel On Fire!!

  3. Hi sir,
    =IFERROR(RIGHT(INDEX($A$2:$A$25,AGGREGATE(15,6,ISNUMBER(SEARCH("Connection",$A$2:$A$25))/ISNUMBER(SEARCH("Connection",$A$2:$A$25))*(ROW($A$2:$A$25)-ROW($A$2)+1),ROWS($B$2:B2))),3),"")
    May be we can use if and small functions to extract the row number,
    even we can try offset with above row extraction methods.

  4. This is the wild one..,,
    {=RIGHT(INDEX($A$2:$A$25,SMALL(MODE.MULT(IFERROR(MATCH(ROW($A$2:$A$25)-ROW($A$2)+1,ISNUMBER(SEARCH("Connection",$A$2:$A$25))*(ROW($A$2:$A$25)-ROW($A$2)+1),{0,0}),"")),ROWS($G$2:G2))),3)}

  5. Tihsllub defeated! This problem screams power query to me. Conditional column if connection. Remove rows. Split column for right 3. Then the world is your oyster – tidy up or group by to get the layout you need. Re-useable solution. Time for a cuppa tea.

  6. Great video!! If just want to tally, could in G5 enter =countifs(C2:C1000,"*"&F5&"*") where F5=1st, F6=2nd, F7=3rd. Then copy countifs formula in G5 to G6 and G7. Bingo. Thanks for the brain teaser.

  7. Just discovered your channel — I'm blown away! I use free and open source spreadsheet software (e.g. Open Office and Libre Office), so some of the functionality differs, but I love your approach regardless. I suspect you'd be deadly with Python — ever play around with it? Seeing you bring light and learning with generous spirit, humour and creative pedagogy is awesome. Respect!!! If you're ever up in Canada, please let me treat you to a chocolate milk.

Leave a Reply

Your email address will not be published. Required fields are marked *