Eugene Khvedchenya | Kaggle, Computer Vision & Best Code Practices | Severstal Steel, 4th Place Solution


Sanyam Bhutani: Hey, this is
Sanyam Bhutani and you’re listening to “Chai Time Data
Science”, a podcast for data science enthusiasts, where I
interview practitioners, researchers, and Kagglers about
their journey, experience, and talk all things about data
science. Hello, and welcome to another
episode of the “Chai Time Data Science” show. In this episode,
I interview Eugene Khvedchenya, a Kaggle competition master
ranked in the top 150 on the platform, and a computer vision
consultant, and talk all about his journey into the field and
Kaggle, and his recent team gold finish on the Severstal: Steel
Defect Detection Kaggle competition, where his team
finished fourth, bringing another gold medal to his
profile. Eugene also shares many great pieces of advice around writing code, good code, and especially good code for Kaggle competitions, along with our discussion on his open source efforts on Albumentations and another framework that he's currently working on, called PyTorch Toolbelt. All of which you can find described in the links section of this podcast. Also, if you would like to know more about Albumentations, there's another interview with Dr. Vladimir Iglovikov on this podcast. So do check that out in case you're interested. For now,
here’s my interview with Eugene, all about Kaggle, his recent
solution and best coding practices. Please enjoy the
show. Along with a quick note and an
apology to the listeners, as you might have noticed, I have not
been releasing any interviews for the past two or three weeks.
And that is because I decided to invest a lot of efforts into
converting all of the previous 27 interviews on this series
into blog posts, along with working on putting proper, data-science-term-checked subtitles for them. I'll also make an announcement: starting now, slowly, the blog posts of the previous 27 interviews will be rolling out in parallel, and you can find the link to the website where these will be released. All of the future video releases will also have checked subtitles to help improve your experience. Instead of working towards new releases, I spent some time putting up support material for the previous ones and ensuring that future releases happen fluidly, both in terms of the video and audio streams along with the blog posts. I hope that
improves your experience a bit. I am of course very open to any
suggestions. This is, this is a suggestion that I always wanted
to work on. So in case you have any other ideas that you’d want
me to incorporate, please do send them my way. And I hope
this improves your experience of tuning into “Chai Time Data
Science” a little bit. Without further ado, here’s the
conversation. Hello, Eugene, thank you so much
for joining me on the “Chai Time Data Science” podcast and
especially for agreeing to the AMA section.
Eugene Khvedchenya: Thanks for
the invitation. It’s a pleasure to be here. Sanyam Bhutani: I’m honored to
have you on the show. Talking about your background, you have
a master's in software engineering and you have been
working on computer vision for a decade now. Could you tell us
how did you get started with data science or machine learning
broadly speaking? Eugene Khvedchenya: Sure. So my
background comes from classical computer vision. For the past decade, I was doing computer vision in C++ and Python. At the start of 2018, I started investing more into deep learning and machine learning, since I realized this is a really hot topic. What convinced me to do this was the understanding that the amount of effort needed to build an image classification or object detection system using deep learning techniques is way smaller than what is required to do the same with classical handcrafted approaches.
Sanyam Bhutani: Talking more
about the classical area, so you actually have a Hall of Fame, as
you mentioned it on your Kaggle profile. And you’re currently a
consultant in the computer vision area. Could you tell us
more about your off Kaggle life and the problems that you’re
working on? Eugene Khvedchenya: Sure. So
yeah, as you’ve seen on my profile, I was into many topics.
And currently, I'm working at a company doing identity verification as a service. And there we have plenty of tasks. But without going into very deep details, which I think I won't be able to share, we're using images to extract all the information from them, be it passport photos or driver's license photos. So it's many topics, all related to computer vision, OCR, or face detection.
Sanyam Bhutani: Got it, and
you’re working on the deep learning stack or even the
classical algorithms?
Eugene Khvedchenya: It's a
combination of both. Sanyam Bhutani: Got it. Okay. Eugene Khvedchenya: So there’s
still a lot of room for the classical computer vision, I
think in many topics, in many areas, not only in our company,
but overall. So it’s good to have knowledge in, in both
domains. Sanyam Bhutani: During my
research, I also found three very interesting things on your
Instagram profile. I’ll have it linked for the listeners if they
want to check it out, which is sports, travel and graphic
cards, lots, lots and lots of them. Could you tell us more
about your life outside of the technical world and what sports
do you compete in and how do you balance this lifestyle? Eugene Khvedchenya: Uh so and
thanks for the question. To balance the work, which is most of the time sitting in a chair, you have to compensate for it with a decent amount of physical activity.
Sanyam Bhutani: Yeah.
Eugene Khvedchenya: So, I enjoy cycling a lot. And recently I started doing open sea swimming, which I also enjoy, and in winter, all kinds of winter activities like snowboarding, that sort of fun. So, to have this work-life balance, or work-sport balance, I try to do this through the year so that I keep myself in good shape, since sitting eight or so hours a day is not something your body will thank you for.
Sanyam Bhutani: Yeah. For the
audience, I'd also like to mention that Eugene doesn't only win medals on Kaggle; he's also won a bunch of medals in cycling competitions, I think.
Eugene Khvedchenya: Yeah, there were some of them in my past, and also in inline skating, yeah.
Sanyam Bhutani: Got it. So you're an active sports competitor outside of Kaggle as well.
Eugene Khvedchenya: Yeah, we can say that.
Sanyam Bhutani: Okay. Coming to Kaggle.
Could you tell us what made you sign up for your first
competition and how did you end up getting started on Kaggle? Eugene Khvedchenya: It actually
was the same time I decided to learn deep learning. And I’m
kind of a practical kind of person who learns by solving
some real challenges and not very good at just learning the
theory. I need to get my, my hands dirty with some tasks. So
that's why I went to Kaggle, and at the time there was the iceberg satellite detection challenge. It was the start of 2018, I think. So that was my first Kaggle competition. I was a total newbie in deep learning. And I tried to make my way into this by going through the learning materials available in great amounts on YouTube, some lectures, and basically building my set of knowledge and my set of skills. And in that competition I finished in about 100-something place.
Sanyam Bhutani: Right.
Eugene Khvedchenya: And it was
the first time and the first lesson that I learned: on Kaggle, look for data leaks first.
Sanyam Bhutani: Hahaha. Okay.
Eugene Khvedchenya: So yeah, in that competition, there was a huge data leak that by itself was able to get you into about, I mean, the top 50 maybe.
Sanyam Bhutani: Usually, ideally, that shouldn't be the case with most competitions.
Eugene Khvedchenya: It's true,
it's true, but if you're doing competitive machine learning, it's something you should always keep in mind. That was the lesson I learned from there, and then I continued to participate in all image-related competitions. And the first gold medal we won as a team was in the Data Science Bowl 2018.
Sanyam Bhutani: We'll talk
more about your recent gold finish, but could you give us
a deeper idea of how you approach learning materials? For example, online courses, any of your favorites? Or, when you mentioned you're experimenting along with learning, any tips on how you approach it?
Eugene Khvedchenya: So, usually
I do the following. When there is, let's say, new material or a new video posted by, let's say, Jeremy, I watch it either fully or in chunks, and then try to immediately follow the idea and implement it, if it's an implementation idea, right. As I said, I'm a practical person, and without implementing it, for me it's useless. So I really like to implement stuff, write code, and by doing this I learn better, I would say. So not only do the vanilla implementation, how it's written in some paper, but make your own small changes to see what's really the key of the article, what's the crucial effect. That's, I think, what works best for me.
Sanyam Bhutani: Got it. For the
audience, I'd like to mention: do check out the course by Jeremy Howard. It's called fast.ai, if you're not familiar with it. It's one of the best courses, in my opinion, as well.
Eugene Khvedchenya: Yeah, I found this course super useful for all the know-how. It's a really great source.
Sanyam Bhutani: For sure. Coming
back to your Kaggle and now professional life, how do you
balance this Kaggle professional and sports lifestyle? How do you
balance your day and night, and Kaggle and work?
Eugene Khvedchenya: So, the answer's simple. I usually have a couple of hours before I go to sleep that I spend on writing some experiment code, and I just put my models, my experiments, into training, so that while I sleep or work during the daytime, the GPUs stay busy.
Sanyam Bhutani: GPUs busy, hehe.
Eugene Khvedchenya: Yes, GPUs
busy, definitely. So this is something that allows me to do work and other activities while the models are training. So I start the training and then I go to the gym, or cycling, whatever.
And it doesn’t require you to be present. Sanyam Bhutani: Yeah. Eugene Khvedchenya: The model,
the model can, can be trained without your attention. And
that’s, that’s what’s helped me. Sanyam Bhutani: Got it. And do
you think after you started competing on Kaggle, and getting
these amazing results that I’m sure many of the audience is
familiar with, did it impact your professional life in any
way? Did you have any takeaway learnings that you apply to the
real world after that? Eugene Khvedchenya: For me, it’s
hard to overestimate how helpful competing on Kaggle was for me.
The amount of experience you get by going deep into a particular problem is just tremendous. You learn new architectures, you learn new approaches, you learn new ways to train your models, you learn new frameworks, and all of this gives you the ground to perform better at your work or on other pet projects.
Sanyam Bhutani: Got it.
Eugene Khvedchenya: From, from
this context, I would say it was really helpful to me to get into
deep learning, since I was able to use the knowledge I got there in other places.
Sanyam Bhutani: Coming to
frameworks. I’ve actually interviewed Dr. Vladimir
Iglovikov about Albumentations, to which you're a core contributor. So he's already talked about his side of the story; for the audience, do check out that interview. But I'd also love to know more about your side of how you got started with the framework and contributing to it.
Eugene Khvedchenya: So just to
give your audience a bit of context, Albumentations is the image augmentation library that is massively used on Kaggle for image augmentations, and I got into the core team by adding new functionality there, which I had actually written for some past competitions. The library itself was released, I think, after the Data Science Bowl 2018. At that time, I had my own, let's call it, mini augmentation library.
Sanyam Bhutani: Okay.
Eugene Khvedchenya: Then I quickly threw it in the garbage in favor of Albumentations, since I found it really aligned with my line of thinking about how image augmentation could be done. And I started using it on a regular basis in competitions, added new features there on demand, and to allow others to use them I made pull requests, and at some point the guys proposed that I become one of the maintainers and also help them improve the library on some sort of regular basis.
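[Editor's note: for listeners who haven't used the library, here is a minimal, illustrative Albumentations pipeline applied to an image and its segmentation mask together. The specific transforms, probabilities, and image sizes are placeholders, not Eugene's actual configuration.]

```python
# Illustrative Albumentations pipeline: declare transforms once, then apply
# them so the image and its defect mask stay spatially in sync.
import albumentations as A
import numpy as np

transform = A.Compose([
    A.HorizontalFlip(p=0.5),             # geometric transform, applied to image and mask
    A.RandomBrightnessContrast(p=0.2),   # photometric transform, applied to the image only
])

image = np.random.randint(0, 256, (256, 1600, 3), dtype=np.uint8)  # dummy steel-sheet-sized image
mask = np.zeros((256, 1600), dtype=np.uint8)                       # dummy defect mask

augmented = transform(image=image, mask=mask)
aug_image, aug_mask = augmented["image"], augmented["mask"]
```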
Sanyam Bhutani: Got it.
Eugene Khvedchenya: This is, this is how I got into it, yeah.
Sanyam Bhutani: So your Kaggle work also led you to becoming a better open source contributor, in a way.
Eugene Khvedchenya: Indeed it
is. Before this, I hadn't considered doing this sort of open source contribution, but I love it now. Yeah, it's good to realize that the library is used and appreciated by the community. It's a great feeling.
Sanyam Bhutani: For sure. Could
you tell us more about another framework that I think you
recently started. It's called PyTorch Toolbelt.
Eugene Khvedchenya: Yeah, it's currently just a project of mine. At some point, I realized that the code I write for Kaggle competitions doesn't change a lot. There are typical building blocks that I tend to reuse over and over again, with small modifications maybe, but the idea remains the same. And so it took me maybe three or four challenges to realize I'm just wasting my time on copy-pasting and recalling where I changed something in the code, where I introduced that feature, or fixed that bug. So essentially, I filtered out all these blocks and built a library just to keep them in one place, so that I can reuse them easily.
Sanyam Bhutani: Got it.
Eugene Khvedchenya: That was the initial idea, and it was tailored for the PyTorch library, which is my library of choice. Then it became more and more mature and got new features, like test-time augmentation that can be done on the GPU, or a flexible way of building models and switching the encoder without having to rewrite your whole model. So all these building blocks help me to quickly iterate over the ideas that I have in competitions, and that was the purpose of this library: to be a toolbelt. It gives you tools.
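[Editor's note: a rough sketch of the flip-based test-time augmentation idea mentioned above, written in plain PyTorch rather than with the actual pytorch-toolbelt API. The model and batch are assumed to be a segmentation network and an image tensor already on the GPU.]

```python
# Flip test-time augmentation, done entirely on the GPU: predict on the original
# batch and on horizontally/vertically flipped copies, un-flip the outputs,
# and average the three probability maps.
import torch

@torch.no_grad()
def flip_tta(model: torch.nn.Module, batch: torch.Tensor) -> torch.Tensor:
    preds = torch.sigmoid(model(batch))
    preds = preds + torch.flip(torch.sigmoid(model(torch.flip(batch, dims=[-1]))), dims=[-1])
    preds = preds + torch.flip(torch.sigmoid(model(torch.flip(batch, dims=[-2]))), dims=[-2])
    return preds / 3.0
```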
Sanyam Bhutani: I think this is, again, a quote from a previous interview that I've done: a smart person back in the day used to have a large library of books; now smart people like you just have open source libraries that they keep contributing to.
Eugene Khvedchenya: I think
it's a really effective way to, first of all, become visible to the community and also to offer what you have. And in return, you get feedback on what the community needs. This also helps you to understand what is actually required, or whether maybe you're missing some part of the puzzle.
Sanyam Bhutani: Got it.
Eugene Khvedchenya: From this
perspective, I think it's a win-win for everyone. And as I said, it's also good to realize that this small piece of open source code that you write is actually used by others.
Sanyam Bhutani: Yep. Do you have any best advice for a newbie who's looking to get started with open source contributions?
Eugene Khvedchenya: I think it's actually really good to try to get into Google Summer of Code or similar events run by other companies. And even for Albumentations, we have plenty of issues for newcomers. One can start with something relatively simple, yet still valuable. So look for a project that has these filler issues; I think that's the easiest way to get into open source.
Sanyam Bhutani: Got it, we will have Albumentations linked in the description of this podcast. Do check that out in case you're interested. Now, before we talk all about your gold position, and the fourth position finish, first of all, congratulations to you on the amazing finish. And also, you just broke into the top 150 rankings of Kaggle competitions.
Eugene Khvedchenya: Yes, that's a good achievement, I think. And I'm not going to stop here.
Sanyam Bhutani: So maybe by the time the audience gets to hear this, you might have a better rank than this.
Eugene Khvedchenya: Yeah.
Sanyam Bhutani: Coming to the
competition, it’s, it’s a tongue twister, but it’s called
the Severstal Steel Defect Detection Kaggle competition. Could you help me set the stage by telling us what the challenge was in this competition?
Eugene Khvedchenya: The
challenge was in the detection of steel defects that appear when the steel gets forged. The problem statement was that we were asked to build a system that would do semantic segmentation of steel sheets, imaged, I think, by a special kind of camera that produced grayscale images 1600 pixels wide and 256 pixels high. And on these images, we had to detect four classes of steel defects and produce a mask indicating where each particular defect class is. The main challenge, I think, for this competition was the extreme class imbalance in the training data: one class had maybe just one percent prior, and another class dominated the training samples heavily. And roughly half of the train data had no defects at all.
Sanyam Bhutani: Got it.
Eugene Khvedchenya: And the
second challenge for this competition was that it was a synchronous kernels-only competition, meaning that you have a limited time budget to do the inference on a test set that is unseen; you cannot see the real test data. And it's actually great, I think, because this puts all the teams in the same conditions. You cannot build the craziest ensemble of 50 models with unreasonable inference time. I think it's really good to have more and more of such challenges on Kaggle. So, these are the two main challenges that the participants faced.
Sanyam Bhutani: Could you tell
us once you got interested in the competition and discovered
the challenges of it, what were your first go to steps and how
did you approach this problem? Eugene Khvedchenya: Mostly most
of the time, my approach is similar: I write some, I call it, fit-predict script that establishes my baseline for whatever kind of problem it is. The idea here is to ensure that the training loop itself and the data preparation are bug-free, so that I can train a reasonable model and I can make a submission in the format that is required. So, kind of, make the baseline first; that's the perspective.
Sanyam Bhutani: Mhm, okay.
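[Editor's note: a minimal sketch of what such a "fit-predict" baseline script might look like, assuming a PyTorch model and generic data loaders. The loaders and the submission-encoding step are placeholders for the competition-specific pieces, not Eugene's actual code.]

```python
# "Fit-predict" baseline skeleton: one script that trains a simple model end to end
# and produces predictions, so the data pipeline and training loop get exercised
# before any fancy modelling happens.
import torch

def fit(model, train_loader, epochs=1, device="cuda"):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = torch.nn.BCEWithLogitsLoss()
    model.to(device).train()
    for _ in range(epochs):
        for images, targets in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images.to(device)), targets.to(device))
            loss.backward()
            optimizer.step()
    return model

@torch.no_grad()
def predict(model, test_loader, device="cuda"):
    model.eval()
    return torch.cat([torch.sigmoid(model(images.to(device))).cpu() for images in test_loader])

# The final step would write the predictions out in the competition's required
# submission format (for example, a CSV of image id plus encoded mask).
```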
Eugene Khvedchenya: Don't try to get silver, bronze, or gold from the first submission. The goal is to ensure you have a solid baseline, that the code itself is clean, so that even if you, let's say, put it on ice for one month, do something else and then come back, you can still understand what's going on. I often hear from others things like: it's a competition, we have no time to make our code clean.
Sanyam Bhutani: Yeah.
Eugene Khvedchenya: I
respectfully disagree with the statement. The quality of the
code that you write is really important for Kaggle as well. It's maybe even more important than the quality of what you're writing at your everyday job, because on Kaggle
you have a very limited amount of time.
Sanyam Bhutani: Yeah.
Eugene Khvedchenya: And the cost of a mistake, if it happens to be in your code, is multiplied, since it's usually the difference between winning and losing.
Sanyam Bhutani: Mhm, that's a
great point. Eugene Khvedchenya: And if you
can avoid making mistakes by using well-structured code, static code checking, auto-formatting, or IDE features to help you spot and fix issues, why not use them?
Sanyam Bhutani: For sure.
Eugene Khvedchenya: Yeah, so that's why my suggestion is to have a strong baseline. Have a clean baseline, disciplined code that you are 100% sure doesn't have issues, and build your way from
there. Sanyam Bhutani: I think it also
leads to amazing libraries eventually, maybe if you keep
doing it consistently for a while, you might end up with
something similar to Albumentations or the Toolbelt.
Eugene Khvedchenya: Sure, sure.
So I've been competing on Kaggle for almost two years now, and I have some sort of template for each kind of problem, be it image segmentation, object detection, or classification. And I think pretty much everyone already has something similar. To me, the key idea I try to follow is to write as little code as possible. So if you can use an existing deep learning framework, there is no reason not to; use the Albumentations library instead of reinventing the wheel, or reuse whatever source you find trustworthy and well maintained. It is worth using, because by writing less code yourself, you reduce the chance of making a mistake.
Sanyam Bhutani: Again, because
you’re under the time constraint as well. So you can invest your
time better if you're using pre-written code.
Eugene Khvedchenya: Definitely, definitely. Even if this code was written by you for some past competition, it's a good idea just to copy-paste and reuse it, instead of writing it from scratch. Again, this comes back to the requirement to have good code.
Sanyam Bhutani: Yeah. Did you
find any of your previous code being useful in this steel
detection competition? Eugene Khvedchenya: Yes, true.
And actually, I made the recent PyTorch Toolbelt release exactly for this competition.
Sanyam Bhutani: Okay.
Eugene Khvedchenya: So the loss functions and metrics there are supporting the steel challenge.
Sanyam Bhutani: Interesting.
Eugene Khvedchenya: I constantly
reuse pieces of the code from my past competitions in the ongoing ones. I think this is how it should be. Most of the time, the problem can be narrowed down to three big classes: it is a classification problem, or it is segmentation, or
object detection. Sanyam Bhutani: Yep. Eugene Khvedchenya: So
essentially, you just need three solid pipelines that you then
can extend to that particular problem. Sanyam Bhutani: For sure. Eugene Khvedchenya: It’s, it
sounds quite easy, right? But without going into very deep
details, it’s, difficult. Sanyam Bhutani: So definitely,
as a takeaway for me and the audience: please do spend some time writing clean code. Can you tell us more about your team for
this particular competition? And how do you in general, maybe
pick your teammates and distribute the workflow? How do
you track your experiments or come up with ideas when working? Eugene Khvedchenya: Sure. So in
this competition, we had a team of three: Dimitri, __ and I were from the same city. The guys had been working on this steel competition from the beginning, and I joined maybe a month or so before the end, since I was busy with the APTOS blindness detection competition.
Sanyam Bhutani: Wait, I think
you won a gold medal in that competition.
Eugene Khvedchenya: Yeah, and uh.
Sanyam Bhutani: A solo gold medal.
Eugene Khvedchenya: So we can speak about it later.
Sanyam Bhutani: Yeah.
Eugene Khvedchenya: So in Steel,
our guys had plenty of resources in the cloud, and they offered me to join and join forces. Initially, when I jumped in, I didn't have much enthusiasm or any kind of goal that had been set beforehand. So I was thinking it would be good if we secured a silver. The other agenda for this competition was that I like image segmentation; it's probably one of my best topics. And I wanted to be able to see what's in the steel data and how it would work. So that's why I joined, and I really quickly made the baseline, and this baseline scored really, really low on the leaderboard, maybe something like 1000th place.
Sanyam Bhutani: I see.
Eugene Khvedchenya: I got
interested, like, what's happening, why am I not able to
get the bronze. Sanyam Bhutani: You took up the
challenge. Eugene Khvedchenya: Yes, the
challenge has been accepted. Sanyam Bhutani: Hahaha. Eugene Khvedchenya: So I started
training more and more models there. At some point I got closer to the bronze, and by this time I was already interested in how far it could go, and then we merged with Dimitri and Maurice and shared our ideas and the code. And regarding your question on how we track experiments, for me what works quite well is a document on Google Docs or in a README, some sort of table where I write, in historical order, the experiment name, what the score was on validation, what the score was on the leaderboard, and some additional comments on what the experiment is about. So this sort of notebook helps me keep in mind what I have actually already tried and what I have not tried yet.
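[Editor's note: as a concrete illustration (not Eugene's actual log), such a tracking table can be as simple as a few columns appended in chronological order:]

```
experiment                | val score | LB score | comment
baseline_resnet34         |    ...    |   ...    | fit-predict baseline, no TTA
baseline_resnet34 + flips |    ...    |   ...    | horizontal/vertical flip TTA added
```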
Sanyam Bhutani: Okay.
Eugene Khvedchenya: So usually at the beginning of the competition, when I look at the data, I make a list of ideas I would like to try after I have a baseline. Let's say I write something like: okay, I want to try pseudo-labeling, or I want to try some interesting loss functions, or there's a new paper I want to try. For me, each new competition is an opportunity to try something new. That's the reason why I participate in every competition: I want to learn, I want to try new things. It's an opportunity for me.
Sanyam Bhutani: There's a
prevailing sentiment as well, that you need quite a bit of
compute power for gold winning solutions. Could you talk to
that a bit, and maybe also share your setup for this competition. Eugene Khvedchenya: So in this
competition, I personally used four GPUs, 1080 Tis, which for someone is maybe a lot, right? But if you take, let's say, a winning solution of the Open Images detection challenge (the second place, I think), they used 120 GPUs.
Sanyam Bhutani: Completely unimaginable, at least for a person like me.
Eugene Khvedchenya: Right? For me as well, and "a lot" is not a very clear thing, right? So I would say, for deep learning the minimum number of GPUs to
have is one. Sanyam Bhutani: For computer
vision at least. Eugene Khvedchenya: It’s
sufficient to start and I started with one GPU, and it’s
perfectly fine to, to learn and to compete with others,
especially since you can use Google Cloud credits or Colaboratory notebooks, which give you a decent GPU, or even Kaggle.
Sanyam Bhutani: Yeah.
Eugene Khvedchenya: There are time limits on Kaggle kernels, but it's still better than nothing.
Sanyam Bhutani: For sure.
Eugene Khvedchenya: So I can
agree that having more GPUs is definitely more helpful than having none or one, and for me the reason for getting four GPUs is, well, there are two reasons actually.
Sanyam Bhutani: Okay.
Eugene Khvedchenya: The first one: it allows me to iterate faster on my ideas and my learning. I'm not speaking right now about Kaggle only; I'm working remotely and I could build my own PC, which is my working tool, too.
Sanyam Bhutani: Okay.
Eugene Khvedchenya: And I want
my working tool to be as fast as possible and as efficient as
possible. Sanyam Bhutani: Yeah. Eugene Khvedchenya: So if we put
this in this context, it totally makes sense to;
Sanyam Bhutani: Definitely.
Eugene Khvedchenya: Invest, invest in your hardware.
Sanyam Bhutani: Got it. Now coming to experimentation: as you know, in machine learning usually 20 to 30 experiments don't work, maybe more for Kaggle
competitions. Can you speak to that a bit? How do you track
your failed experiments and note the takeaways from them? Eugene Khvedchenya: Yeah. So,
even 20 or 30 experiments is, I would say, on the small side; in some past competitions I ran more than a hundred experiments.
Sanyam Bhutani: Okay.
Eugene Khvedchenya: And tracking them becomes really cumbersome, tons of work, too. Making notes on the experiments that went well and which didn't go well is crucial, and what I'm also trying to do is use git history and branches, just to keep each experiment more or less isolated.
Sanyam Bhutani: Okay.
Eugene Khvedchenya: So let's say
I have my main development branch, and if some experiment leads to success, it gets merged into the main branch so that I can
keep the structure of the repository clean, yeah. Sanyam Bhutani: So when an
experiment fails in machine learning, for example in your modeling, how do you decide whether to continue giving it another try
or putting an end to it? Because in machine learning the
experiment doesn’t work until it does. Eugene Khvedchenya: Yeah. So I
can give you some examples from the steel challenge that I tried. I had an idea that if I did a very interesting trick with augmentation, it might help my model to generalize better. The trick was to use the well-known mixup augmentation, but do it slightly differently. There is a technique for seamless image blending that allows you to integrate one portion of an image into another one without any noticeable boundary.
Sanyam Bhutani: Okay.
Eugene Khvedchenya: And my idea was: okay, we have the ground-truth masks, so why not take a random defect from one image and put it into a second image, so that it would look natural. I wrote this implementation, and it produced quite good results; I mean, visually, it was really not easy to tell that the defect was artificially added. Then I ran the training of the model using this augmentation, and I was hoping it would show a rocket-high score on the validation and the leaderboard. And, it did not.
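[Editor's note: a sketch of the "paste a defect from one image into another" idea described above, using OpenCV's seamless (Poisson) cloning. The crop and paste location are illustrative, this is not Eugene's implementation, and the ground-truth mask of the destination image would have to be updated in the same way.]

```python
# Cut the defect region out of a source image and blend it into a destination
# image with Poisson blending, so no visible seam is left at the boundary.
import cv2
import numpy as np

def paste_defect(src_img, src_mask, dst_img):
    ys, xs = np.where(src_mask > 0)
    y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1  # defect bounding box
    patch = src_img[y0:y1, x0:x1]
    patch_mask = (src_mask[y0:y1, x0:x1] > 0).astype(np.uint8) * 255
    center = (dst_img.shape[1] // 2, dst_img.shape[0] // 2)  # paste near the image centre
    # seamlessClone expects 8-bit 3-channel source and destination images
    return cv2.seamlessClone(patch, dst_img, patch_mask, center, cv2.NORMAL_CLONE)
```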
Sanyam Bhutani: Hehe, okay.
Eugene Khvedchenya: It did not at all. So at this point, I could either continue debugging this technique or try something else. It's always a decision you have to make. What I usually do is write unit tests for my augmentations, or for other things that can potentially fail, and the unit tests showed me that the augmentation itself was okay. The visual inspection also didn't reveal any issues. So I ditched that idea, because we had a really limited amount of time, so you have to carefully choose where you spend your time.
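[Editor's note: a minimal, illustrative unit test in the spirit described here, checking that a custom augmentation keeps its output consistent; paste_defect refers to the sketch above, and my_augmentations is a hypothetical module name.]

```python
# Sanity-check the augmentation: output must keep the destination image's shape
# and dtype, and must not blow up on a simple synthetic defect.
import numpy as np
from my_augmentations import paste_defect  # hypothetical module holding the sketch above

def test_paste_defect_keeps_shape_and_dtype():
    src = np.random.randint(0, 256, (256, 1600, 3), dtype=np.uint8)
    dst = np.random.randint(0, 256, (256, 1600, 3), dtype=np.uint8)
    mask = np.zeros((256, 1600), dtype=np.uint8)
    mask[100:120, 800:900] = 1  # synthetic rectangular defect
    blended = paste_defect(src, mask, dst)
    assert blended.shape == dst.shape
    assert blended.dtype == dst.dtype
```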
Sanyam Bhutani: My takeaway is also that since you've written some unit tests, again from your strong software experience, after testing, if the idea still doesn't work, maybe it's time to put an end to it.
Eugene Khvedchenya: Yeah. And
this is how you perform in Kaggle: you have to try as many ideas as possible, and some may work, some will not work, and that's okay. And somewhere you will have a bug, and this is also okay. But you have to keep going and keep trying as many ideas as you can within a limited amount of time.
Sanyam Bhutani: Coming to the
final solution of your competition could you please
maybe give us a high level overview of the gold winning
solution. And I’d also really love to know what led you to the
86-position jump on the private leaderboard; that's also a question by Theo Viel from the AMA section.
Eugene Khvedchenya: Yeah, so our
final solution was an ensemble of five models, which used test-time augmentation with vertical and horizontal flips. All the predictions were then combined with a simple mean average, and with a threshold of 0.55 we end up with binary masks. So it sounds like quite a simple solution; the model architectures we used were DenseNet-201, ResNet-34, SE-ResNet-50, and a plain ResNet-50. And the final inference ran for about 15 minutes. So, so we;
Sanyam Bhutani: Sorry, 15 minutes on kernels?
Eugene Khvedchenya: Yes.
Sanyam Bhutani: Okay.
Eugene Khvedchenya: Yeah. So the total inference time was about 15 minutes, so we were way below the one-hour requirement.
Sanyam Bhutani: Yeah.
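[Editor's note: the mechanics of that final step, averaging the per-model probability maps and binarizing at 0.55, boil down to something like the sketch below; this is an illustration of the description, not the team's actual inference code.]

```python
# Average the probability maps coming from the ensemble (already TTA-averaged
# per model) and turn them into binary defect masks with a fixed threshold.
import torch

def ensemble_masks(prob_maps, threshold=0.55):
    mean_probs = torch.stack(list(prob_maps), dim=0).mean(dim=0)  # simple mean over models
    return (mean_probs > threshold).to(torch.uint8)               # binary masks per defect class
```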
Eugene Khvedchenya: And actually we trained three more models, but the combination of these five had the higher accuracy, a higher score on validation and on the leaderboard, so we ended up with it. Apart from this, we tried much heavier models like HRNet, the high-resolution network, or ResNet-152, and with different kinds of decoders. But unfortunately, they didn't work quite as well in this competition. And I can only speculate why we jumped so high, but my understanding is that, first of all, we didn't really rely on things like simple blending and threshold adjustment.
Sanyam Bhutani: Okay.
Eugene Khvedchenya: So I think
a combination of two facts is what helped us. The Dice metric itself, if you look at the leaderboard, was really close. To the naked eye, the difference between a Dice of 0.91 and 0.92 is nothing.
Sanyam Bhutani: Hehehe.
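[Editor's note: for reference, the Dice coefficient between a predicted mask P and a ground-truth mask G is 2|P ∩ G| / (|P| + |G|), twice the number of overlapping pixels divided by the total number of pixels in both masks, which is why very small pixel-level differences separate neighbouring leaderboard scores.]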
Eugene Khvedchenya: So it's essentially the count of pixels differing between the predictions that puts you higher or lower. I think the fact that we didn't use pseudo-labeling worked for us, and many of the teams were really adjusting thresholds to get high on the public leaderboard. We did not; we tuned against our cross-validation.
Sanyam Bhutani: Could you maybe
speak a bit about cross validation skills that you
picked up over time and any tips for that? Eugene Khvedchenya: Uhm, sure.
So, for this competition, the validation scheme that I personally used was stratified multi-label cross-validation. This means that I tried to balance the folds so that the ratio of the combination of defects is the same between the train and validation splits. In total I had four folds for this, and Dimitri and Maurice had a different but similar validation scheme, so when we averaged our models it also worked well, because we had some divergence in the model predictions there.
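[Editor's note: one possible way to build such stratified multi-label folds is the third-party iterative-stratification package; this is an illustration of the idea, not necessarily the exact tool Eugene used, and the image list and labels below are dummies.]

```python
# Split images into 4 folds while keeping the per-class defect ratios roughly
# equal between the train and validation parts of every fold.
import numpy as np
from iterstrat.ml_stratifiers import MultilabelStratifiedKFold

image_ids = np.array([f"img_{i}.jpg" for i in range(1000)])   # dummy image identifiers
defect_labels = np.random.randint(0, 2, size=(1000, 4))       # presence of each of 4 defect classes

mskf = MultilabelStratifiedKFold(n_splits=4, shuffle=True, random_state=42)
for fold, (train_idx, valid_idx) in enumerate(mskf.split(image_ids, defect_labels)):
    print(fold, defect_labels[train_idx].mean(axis=0), defect_labels[valid_idx].mean(axis=0))
```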
Eugene Khvedchenya: What is funny: when we trained the high-resolution network, which was really heavy, it took maybe almost two days;
Sanyam Bhutani: On your setup?
Eugene Khvedchenya: No, on eight V100s.
Sanyam Bhutani: Okay.
Eugene Khvedchenya: So it was like burning a lot of money, but I had really high hopes for this network since it claims to have state-of-the-art performance on segmentation. And it didn't work anywhere close to what we got with the ResNet-34.
Sanyam Bhutani: Got it!
Eugene Khvedchenya: So, this is
Kaggle and;
Sanyam Bhutani: Yep, it's Kaggle.
Eugene Khvedchenya: So it still can be the case that I have an implementation problem in the code, but this is how it can be.
Sanyam Bhutani: Yeah. So this
has been a great interview. My final question to you would be,
what best advice, maybe a single piece or multiple, would you have for someone who has just started or is just getting started on Kaggle
or machine learning broadly speaking? Eugene Khvedchenya: For me,
using Kaggle for learning is a great, great idea. And I see no
reason why you shouldn’t do Kaggling to practice your
skills. It allows you to get into the community, speak to others, share ideas, get some tips or mentoring, and learn from the public kernels. It's a great way to get into machine learning. So keep competing, through Kaggle or the plenty of other competitive machine learning platforms, not only Kaggle. So I think it's the right place if you have this;
Sanyam Bhutani: Graphic cards?
Eugene Khvedchenya: No, no
graphic cards. They're helpful to have, you know, but I mean, if you are competitive.
Sanyam Bhutani: Okay, got it.
Eugene Khvedchenya: So I can imagine that it's not in everyone's nature to try to push their limits, right?
Sanyam Bhutani: Yeah.
Eugene Khvedchenya: So if that's
something that resonates with your nature, then yeah, go for it. If not, if you don't like it, or if you don't like to work under time-constrained conditions, it's probably not the best way for you to learn, and you have to find your own way, whatever works best for you. But for me, it's definitely the way I like.
Sanyam Bhutani: Awesome, what
would be the best platforms to follow you and follow your work, before we end the interview?
Eugene Khvedchenya: Okay, so I
think you can reach me on the Kaggle forums and also on the Open Data Science Slack. And feel free to reach out to me by
email, you can share it afterwards. So I’m open to any
questions and you can ask me whatever question you have
regarding computer vision or machine learning. Happy to
answer. Sanyam Bhutani: That’s really
kind of you. Thank you so much, Eugene, for joining me on the
podcast and thank you so much for all of your open source and
Kaggle contributions. Eugene Khvedchenya: Thank you
very much for the invitation and see you on Kaggle. Sanyam Bhutani: Thank you so
much for listening to this episode. If you enjoyed the
show, please be sure to give it a review, or feel free to shoot me a message; you can find all of the social media links in the
description. If you like the show, please subscribe and tune
in each week to “Chai Time Data Science.”
